Hi Chris,
You can try this code:
HTMLParser parser = new HTMLParser("http://...");
parser.registerScanners();
HTMLNode node;
for (Enumeration e = parser.elements(); e.hasMoreElements();) {
    node = (HTMLNode) e.nextElement();
    if (node instanceof HTMLLinkTag) {
        HTMLLinkTag linkTag = (HTMLLinkTag) node;
        String link = linkTag.getLink();
        // Now that you have the absolute link, change it the way you want
        String modifiedLink = modifyLink(link);
        // Output the link tag
    }
    else if (node instanceof HTMLImageTag) {
        HTMLImageTag imageTag = (HTMLImageTag) node;
        String loc = imageTag.getImageLocation();
        // Now that you have the absolute link, change it the way you want
        String modifiedImageLoc = modifyImageLoc(loc);
        // Output the image tag
    }
    else {
        // This prints the html reconstruction of the node
        System.out.println(node.toHTML());
    }
}
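The modifyLink() and modifyImageLoc() calls above are left for you to write. As one possible illustration only, here is a minimal sketch that assumes the goal is to turn an absolute URL into a relative path suitable for a copy saved on disk. The class name LinkRewriter and the "index.html" default are hypothetical choices, not part of the parser.

```java
import java.net.URI;

class LinkRewriter {
    // Hypothetical stand-in for modifyLink(): maps an absolute URL
    // to a relative local path of the form "host/path".
    static String modifyLink(String absoluteLink) {
        // URI.create throws IllegalArgumentException on malformed input
        URI uri = URI.create(absoluteLink);
        String host = uri.getHost();
        if (host == null) {
            // No host (e.g. mailto: links): leave the link untouched
            return absoluteLink;
        }
        String path = uri.getPath();
        if (path == null || path.isEmpty()) {
            path = "/";
        }
        if (path.endsWith("/")) {
            // Assumption: directory-style URLs are saved as index.html
            path += "index.html";
        }
        // Query string and fragment are dropped in this sketch
        return host + path;
    }

    public static void main(String[] args) {
        System.out.println(modifyLink("http://example.com/docs/"));
    }
}
```

The same method could serve as modifyImageLoc(), since image locations are rewritten the same way in this sketch.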
Note: When you are outputting the link and image tags, you will have to
keep a few things in mind.
[1] You will need to run through the params table in order to accurately
reconstruct the rest of the HTML. This is easy: the parameters in the tags
are in a hashtable that can be retrieved with HTMLTag.getParsed() (all
tags derive from HTMLTag).
[2] When you are outputting the link tag, remember that links can
contain other HTML elements within them. Getting all the nodes contained
in them is easy: you can get an enumeration of link elements with
HTMLLinkTag.nodeData().
[3] You might want to consider a second approach for uniform rendering
of all data. Since you have all the source code and are fairly sure how
you want to render it, modify the toHTML methods of HTMLLinkTag and
HTMLImageTag yourself to change the output the way you want. Then your
application code becomes:
HTMLParser parser = new HTMLParser("http://...");
parser.registerScanners();
HTMLNode node;
for (Enumeration e = parser.elements(); e.hasMoreElements();) {
    node = (HTMLNode) e.nextElement();
    System.out.println(node.toHTML());
}
[4] I am strongly considering allowing folks to add static rendering
handlers for the link and image tags, so that you can replace the default
toHTML with your own code without touching the original source. But you
will have to wait for a later release...
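To make note [1] concrete, here is a rough sketch of rebuilding an opening tag from an attribute table while swapping in the modified link. It is deliberately independent of the parser: it assumes the table from getParsed() maps attribute names to string values, which you should check against your version (any non-attribute entries in the table would need to be skipped). The class name TagRebuilder and its method are hypothetical.

```java
import java.util.Hashtable;
import java.util.Map;
import java.util.TreeMap;

class TagRebuilder {
    // Rebuilds <NAME attr="value" ...> from an attribute table,
    // replacing the value of one attribute (e.g. HREF or SRC).
    static String rebuild(String tagName, Map<String, String> attributes,
                          String attrToReplace, String newValue) {
        // A hashtable has no defined order, so the original attribute
        // order is lost; sort by name to get stable output.
        Map<String, String> sorted = new TreeMap<>(attributes);
        // The key must match the table's spelling exactly, case included
        sorted.put(attrToReplace, newValue);
        StringBuilder html = new StringBuilder("<").append(tagName);
        for (Map.Entry<String, String> attr : sorted.entrySet()) {
            // Values are written as-is; embedded quotes are not escaped
            html.append(' ').append(attr.getKey())
                .append("=\"").append(attr.getValue()).append('"');
        }
        return html.append('>').toString();
    }

    public static void main(String[] args) {
        Hashtable<String, String> params = new Hashtable<>();
        params.put("HREF", "http://example.com/page.html");
        params.put("TARGET", "_blank");
        // prints <A HREF="page.html" TARGET="_blank">
        System.out.println(rebuild("A", params, "HREF", "page.html"));
    }
}
```

For a link tag you would still have to print the contained nodes and the closing </A> afterwards, as described in note [2].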
Cheers,
Somik
----- Original Message -----
From: Chris Carey
To: htm...@li...
Sent: Thursday, July 04, 2002 3:10 AM
Subject: [Htmlparser-user] Storing modified web pages to hard disk
I was looking for the new and improved way to do the following:
a) Read in a page from disk or URL
b) Modify every <A href=""> in the page
c) Output the page to disk or to screen
For example, I would just like to modify all of the <A> links or <IMG>
links in a particular manner, but leave *most* of the page fairly
untouched.