Re: [Htmlparser-user] Not all image tags are returned [Not a Bug]
From: Somik R. <so...@ya...> - 2002-04-26 03:43:54
Hi Annette,
I just figured out what is happening...
Sorry for the previous mail: this is not a bug in the parser. The tags
which weren't getting reported as image tags were sandwiched between link
tags, <A HREF="..."><IMG ..></A>. Hence, in your application, you will
also need to watch out for link tags, and pick up the images from within
them should there be any.
Now, if this causes you additional headaches, then don't register all
the scanners. Register only the image scanner, so the link scanner will not
interfere, and you will get only the image tags.
To prove that this analysis is correct, I added one more test case to
HTMLImageScannerTest.java:
testImageTagsFromYahooWithAllScannersRegistered()
This test case extracts the link and checks that the image is found within it.
The number of tags found is also verified. You can check out this code from
CVS; it might help you if you are interested in getting image tags out of
link tags. Correspondingly, there is also testImageTagsFromYahoo(), which
passes (with only the HTML image scanner registered).
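To illustrate the idea of looking inside link tags for images, here is a minimal sketch. It does not use the HTMLParser classes: Node, ImageNode, LinkNode and collectImageLocations are hypothetical stand-ins invented for this example, and the real tag classes may expose the tags nested in a link differently.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ImagesInsideLinks {
    // Hypothetical stand-ins for parser nodes (NOT the HTMLParser API).
    interface Node {}

    static final class ImageNode implements Node {
        final String location;
        ImageNode(String location) { this.location = location; }
    }

    static final class LinkNode implements Node {
        final String href;
        final List<Node> children; // nodes found between <A ...> and </A>
        LinkNode(String href, List<Node> children) {
            this.href = href;
            this.children = children;
        }
    }

    // Collects image locations from top-level nodes and from inside links.
    static List<String> collectImageLocations(List<Node> nodes) {
        List<String> found = new ArrayList<>();
        for (Node node : nodes) {
            if (node instanceof ImageNode) {
                found.add(((ImageNode) node).location);
            } else if (node instanceof LinkNode) {
                // The image is not a top-level node here, so look within the link.
                found.addAll(collectImageLocations(((LinkNode) node).children));
            }
        }
        return found;
    }

    public static void main(String[] args) {
        List<Node> linked = Arrays.<Node>asList(new ImageNode("banner.gif"));
        List<Node> nodes = Arrays.<Node>asList(
                new ImageNode("logo.gif"),
                new LinkNode("http://www.yahoo.com/", linked));
        System.out.println(collectImageLocations(nodes)); // prints [logo.gif, banner.gif]
    }
}
```

A loop that only checks for image nodes at the top level would report logo.gif and miss banner.gif, which matches the symptom described in the original message.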
Let me know if you need further help.
Regards,
Somik
----- Original Message -----
From: Doyle, Annette
To: htm...@li...
Sent: Friday, April 26, 2002 1:32 AM
Subject: [Htmlparser-user] Not all image tags are returned
Is there any known problem with not all image tags being returned? I used
the following code:
    HTMLParser parser = new HTMLParser(htmlOriginalFileLoc);
    // Registering all the common scanners
    parser.registerScanners();
    for (Enumeration e = parser.elements(); e.hasMoreElements();) {
        HTMLNode node = (HTMLNode)e.nextElement();
        if (node instanceof HTMLImageTag)
        {
            System.out.println();
            System.out.println(((HTMLImageTag)node).getTagLine());
            System.out.println();
            //imageTagsUrl.addElement(((HTMLImageTag)node).getImageLocation());
        }
    }
I was testing with another HTML parser and it found all the image tags.
Attached is the source from www.yahoo.com that I used when I ran the code above.