Re: [Htmlparser-user] Doubt Questions.
From: Derrick O. <der...@ro...> - 2007-04-07 14:21:32
An IMG tag does not contain the text you refer to (above or below the image), which sounds like a caption placed by the HTML authoring software. There is no hard and fast rule that will get that text. Sorry.

If you have a number of similar pages, you can use heuristics to create code that will find the text, but only for that particular class of pages. For example, when you have an IMG tag, either from filtering or from examining every node, you can check for text or other tags that may be related to it: find the enclosing tag with the getParent() method, then examine the parent's children with getChildren() to find the siblings of the IMG tag. Some of these siblings may be the text you want, or perhaps a tag containing the text. You might want to use the FilterBuilder tool to see if you can build a heuristic easily.

I don't understand your second question at all.

----- Original Message ----
From: Gaurav Pranay <gau...@gm...>
To: htm...@li...
Sent: Saturday, April 7, 2007 1:43:38 AM
Subject: [Htmlparser-user] Doubt Questions.

Hello Sir,

Thanks for the quick reply and help. But I have some more doubts related to the HTML Parser.

Q1) How can I use this parser to get the text associated with an image, i.e. with the <img> tag, such as bold text above or below the image in an HTML dump, and keep track of the text around the image in a web page?

Q2) Do I need to clean the HTML page with the help of Html-Cleaner so that I don't get the advertisement images in a page, and if yes, how do I implement the Html-Cleaner in a Java program?

It would be of immense help to me if I could get some relevant code related to the above doubts, and some information about the relevant classes of HTML Parser through which I can attain the goals of my program. Your good self is therefore requested to please provide me with some guidelines.

Regards
Gaurav Pranay
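The sibling heuristic from the reply (take the IMG tag, go up with getParent(), then walk the parent's getChildren()) might look roughly like the sketch below. To keep it self-contained it does not use the HTML Parser classes: SimpleNode, plainText() and siblingText() are hypothetical stand-ins invented for illustration, and only the method names getParent() and getChildren() are borrowed from the reply. With the real library the same loop would run over the nodes the parser returns. Like any heuristic of this kind, it only fits pages that share this layout.

```java
import java.util.ArrayList;
import java.util.List;

public class ImageCaptionSketch {

    /** Hypothetical stand-in for a parsed node; NOT an HTML Parser class. */
    static class SimpleNode {
        final String tagName;  // e.g. "IMG", "B"; null for a text node
        final String text;     // text content; null for a tag node
        SimpleNode parent;
        final List<SimpleNode> children = new ArrayList<>();

        private SimpleNode(String tagName, String text) {
            this.tagName = tagName;
            this.text = text;
        }

        /** Builds a tag node and attaches the given children to it. */
        static SimpleNode tag(String name, SimpleNode... kids) {
            SimpleNode n = new SimpleNode(name, null);
            for (SimpleNode k : kids) {
                k.parent = n;
                n.children.add(k);
            }
            return n;
        }

        static SimpleNode text(String t) {
            return new SimpleNode(null, t);
        }

        SimpleNode getParent() { return parent; }

        List<SimpleNode> getChildren() { return children; }

        /** Concatenates the text of this node and all of its descendants. */
        String plainText() {
            if (text != null) return text;
            StringBuilder sb = new StringBuilder();
            for (SimpleNode c : children) sb.append(c.plainText());
            return sb.toString();
        }
    }

    /** Collects the non-blank text of every sibling of the given image node. */
    static List<String> siblingText(SimpleNode img) {
        List<String> found = new ArrayList<>();
        SimpleNode parent = img.getParent();
        if (parent == null) return found;       // no enclosing tag, nothing to inspect
        for (SimpleNode sibling : parent.getChildren()) {
            if (sibling == img) continue;       // skip the image itself
            String t = sibling.plainText().trim();
            if (!t.isEmpty()) found.add(t);     // text node, or tag containing text
        }
        return found;
    }

    public static void main(String[] args) {
        // Models: <td><b>Sunset over the bay</b><img><br> Photo: staff </td>
        SimpleNode img = SimpleNode.tag("IMG");
        SimpleNode.tag("TD",
                SimpleNode.tag("B", SimpleNode.text("Sunset over the bay")),
                img,
                SimpleNode.tag("BR"),
                SimpleNode.text(" Photo: staff "));
        System.out.println(siblingText(img)); // prints [Sunset over the bay, Photo: staff]
    }
}
```

If the caption sits one level further out (for example in the next table row), the same loop can be repeated on the parent's parent. How far up to look is a per-site judgment call.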