Re: [Htmlparser-user] Storing modified web pages to hard disk
From: Somik R. <so...@ya...> - 2002-07-04 00:34:22
Hi Chris,

You can try this code:

    HTMLParser parser = new HTMLParser("http://...");
    parser.registerScanners();
    HTMLNode node;
    for (Enumeration e = parser.elements(); e.hasMoreElements();) {
        node = (HTMLNode) e.nextElement();
        if (node instanceof HTMLLinkTag) {
            HTMLLinkTag linkTag = (HTMLLinkTag) node;
            String link = linkTag.getLink();
            // Now that you have the absolute link, change it the way you want
            String modifiedLink = modifyLink(link);
            // Output the link tag
        } else if (node instanceof HTMLImageTag) {
            HTMLImageTag imageTag = (HTMLImageTag) node;
            String loc = imageTag.getImageLocation();
            // Now that you have the absolute image location, change it the way you want
            String modifiedImageLoc = modifyImageLoc(loc);
            // Output the image tag
        } else {
            // This prints the HTML reconstruction of the node
            System.out.println(node.toHTML());
        }
    }

Note: When you are outputting the link and image tags, you will have to keep a few things in mind.

[1] You will need to run through the params table in order to accurately reconstruct the rest of the HTML. This is easy: the parameters of a tag are in a hashtable that can be retrieved with HTMLTag.getParsed() (all tags derive from HTMLTag).

[2] When you are outputting the link tag, remember that links can contain other HTML elements within them. Getting all the nodes contained in a link is easy: you can get an enumeration of the link's elements with HTMLLinkTag.nodeData().

[3] You might want to consider a second approach for uniform rendering of all data. Since you have all the source code and are fairly sure how you want to render it, modify the toHTML methods of HTMLLinkTag and HTMLImageTag yourself to change the output the way you want.
Then your application code becomes:

    HTMLParser parser = new HTMLParser("http://...");
    parser.registerScanners();
    HTMLNode node;
    for (Enumeration e = parser.elements(); e.hasMoreElements();) {
        node = (HTMLNode) e.nextElement();
        System.out.println(node.toHTML());
    }

[4] I am strongly considering allowing folks to add static rendering handlers for the link and image tags, so that you can replace the default toHTML with your own code without touching the original source. But you will have to wait for a later release...

Cheers,
Somik

----- Original Message -----
From: Chris Carey
To: htm...@li...
Sent: Thursday, July 04, 2002 3:10 AM
Subject: [Htmlparser-user] Storing modified web pages to hard disk

I was looking for the new and improved way to do the following:

a) Read in a page from disk or URL
b) Modify every <A href=""> in the page
c) Output the page to disk or to screen

For example, I would just like to modify all of the <A> links or <IMG> links in a particular manner, but leave *most* of the page fairly untouched
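
As a rough illustration of the modifyLink(...) call and of note [1] in the reply above, here is a minimal, self-contained sketch. Nothing in it comes from the htmlparser source: the class name LinkRewriteSketch, the renderTag helper, the "mirror/" directory layout, and the idea that a tag's parameter table can be treated as a plain attribute-name to value map are all assumptions made for the example. It uses only the JDK and present-day Java syntax, so it would need adapting to whatever HTMLTag.getParsed() actually returns.

```java
import java.net.URI;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helpers; not part of the htmlparser library.
class LinkRewriteSketch {

    // One possible modifyLink: map an absolute URL to a relative path
    // under a local "mirror/" directory (an assumed layout).
    // Query strings and fragments are dropped.
    static String modifyLink(String absoluteLink) {
        URI uri;
        try {
            uri = URI.create(absoluteLink);
        } catch (IllegalArgumentException ex) {
            return absoluteLink; // malformed link: leave it untouched
        }
        String host = uri.getHost();
        if (host == null) {
            return absoluteLink; // e.g. mailto: links have no host
        }
        String path = uri.getPath();
        if (path == null || path.isEmpty()) {
            path = "/";
        }
        if (path.endsWith("/")) {
            path += "index.html"; // directory URL: assume an index page
        }
        return "mirror/" + host + path;
    }

    // Rebuild an opening tag from a name and an attribute table.
    // The table is assumed to hold plain attribute names and values.
    static String renderTag(String tagName, Map<String, String> attributes) {
        StringBuilder sb = new StringBuilder("<").append(tagName);
        for (Map.Entry<String, String> entry : attributes.entrySet()) {
            sb.append(' ').append(entry.getKey()).append("=\"")
              .append(entry.getValue().replace("\"", "&quot;"))
              .append('"');
        }
        return sb.append('>').toString();
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps the attributes in insertion order.
        Map<String, String> attrs = new LinkedHashMap<>();
        attrs.put("HREF", modifyLink("http://example.com/docs/"));
        attrs.put("TARGET", "_blank");
        System.out.println(renderTag("A", attrs));
        // prints: <A HREF="mirror/example.com/docs/index.html" TARGET="_blank">
    }
}
```

In the loop from the reply, the link branch would then write out the rebuilt opening tag, followed by the nodes contained in the link (note [2]) and the closing </A>. An image tag could be handled the same way with its own location attribute.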