[Htmlparser-developer] Re: [Htmlparser-user] Not all image tags are returned [Not a Bug]
From: Somik R. <so...@ya...> - 2002-04-26 03:43:51
Hi Annette,

I just figured out what is happening. Sorry for the previous mail - this is not a bug in the parser.

The tags that weren't getting reported as image tags were sandwiched between link tags: <A HREF="..."><IMG ..></A>. Hence, in your application, you will also need to watch out for link tags, and pick up the images from within them, should there be any.

If this causes you additional headaches, then don't register all the scanners. The link scanner will then not interfere, and you will only get the image tags.

To prove that this analysis is correct, I added one more test case to HTMLImageScannerTest.java: testImageTagsFromYahooWithAllScannersRegistered(). This test case extracts the link and checks that the image is found within it. The number of tags found is also verified. You can check out this code from CVS; it might help you if you are interested in getting image tags out of link tags. Correspondingly, there is also testImageTagsFromYahoo(), which passes with only the HTML image scanner registered.

Let me know if you need further help.

Regards,
Somik

----- Original Message -----
From: Doyle, Annette
To: htm...@li...
Sent: Friday, April 26, 2002 1:32 AM
Subject: [Htmlparser-user] Not all image tags are returned

Is there any known problem about not all image tags being returned? I wrote the following code:

    HTMLParser parser = new HTMLParser(htmlOriginalFileLoc);
    // Registering all the common scanners
    parser.registerScanners();
    for (Enumeration e = parser.elements(); e.hasMoreElements();) {
        HTMLNode node = (HTMLNode) e.nextElement();
        if (node instanceof HTMLImageTag) {
            System.out.println();
            System.out.println(((HTMLImageTag) node).getTagLine());
            System.out.println();
            //imageTagsUrl.addElement(((HTMLImageTag)node).getImageLocation());
        }
    }

I was testing with another html parser and it found all the image tags. Attached is the source from www.yahoo.com when I ran the code above.
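
For reference, the advice in the reply (check link tags as well, and pick up any images nested inside them) can be sketched roughly as below. This is only an illustration of the traversal, not the parser's API: Node, ImageNode, LinkNode and collectImageLocations are hypothetical stand-ins invented for this sketch, and the real HTMLParser tag classes and their accessors may look different.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins for the parser's node types. These are NOT the
// library's classes; they only model "a link node can contain image nodes".
final class NestedImageSketch {

    interface Node {}

    // Models an <IMG> tag and the location it points to.
    static final class ImageNode implements Node {
        final String location;
        ImageNode(String location) { this.location = location; }
    }

    // Models an <A HREF="..."> tag together with the nodes found inside it.
    static final class LinkNode implements Node {
        final String href;
        final List<Node> children = new ArrayList<>();
        LinkNode(String href) { this.href = href; }
    }

    // Collects image locations from top-level image nodes and from image
    // nodes nested inside link nodes, in document order.
    static List<String> collectImageLocations(List<Node> nodes) {
        List<String> found = new ArrayList<>();
        for (Node node : nodes) {
            if (node instanceof ImageNode) {
                found.add(((ImageNode) node).location);
            } else if (node instanceof LinkNode) {
                // The image is not reported at the top level; look inside the link.
                found.addAll(collectImageLocations(((LinkNode) node).children));
            }
        }
        return found;
    }

    public static void main(String[] args) {
        LinkNode link = new LinkNode("http://www.yahoo.com/");
        link.children.add(new ImageNode("inside-link.gif"));

        List<Node> topLevel = new ArrayList<>();
        topLevel.add(new ImageNode("top-level.gif"));
        topLevel.add(link);

        System.out.println(collectImageLocations(topLevel));
    }
}
```

In the loop from the original message, the equivalent change would be an extra branch for link tags next to the existing instanceof HTMLImageTag check, using whatever accessor the link tag class provides for its contents.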