Hi Drew,
> I have a question. I am trying to extract only the text from the
> HTML pages, but I could not get it to work. I have seen
> HTMLStringFilter.java, but I could not add it to the existing
> ones and run it, because there the HTML parser has only one argument
> passed, whereas the others also take the feedback. And if it
> should work, what feedback do we give? I mean a (T or i or s or l).
> And I guess the jar file does not have the code for extracting the
> text.
Sorry about that - the web page hasn't been updated for a while. You will need
to create a feedback object and pass it as the second argument to the
HTMLParser constructor. If you don't need feedback from the parser, use the
default one, DefaultHTMLParserFeedback, that we've provided in the
com.kizna.html.util package.
Try this:
Below is some sample code to parse Yahoo.com and print only the text
information. The parse will run faster, as there are no scanners
registered here.
HTMLParser parser = new HTMLParser("http://www.yahoo.com", new DefaultHTMLParserFeedback());
// In this example, none of the scanners need to be registered,
// as a string node is not a tag to be scanned for.
for (Enumeration e = parser.elements(); e.hasMoreElements();) {
    HTMLNode node = (HTMLNode) e.nextElement();
    if (node instanceof HTMLStringNode) {
        HTMLStringNode stringNode = (HTMLStringNode) node;
        System.out.println(stringNode.getText());
    }
}
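If the jar keeps giving you trouble, the same idea (walk the document, keep
only the text, ignore the tags) can be tried out with nothing but the JDK.
The rough sketch below uses the Swing HTML parser that ships with Java, which
is a different parser from ours and not part of com.kizna.html. The class name
TextOnly and the method extractText are made up for illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

// Hypothetical helper, not part of com.kizna.html: it relies on the
// JDK's own Swing HTML parser instead of HTMLParser.
public class TextOnly {

    // Returns the non-empty text chunks found in the given HTML, in order.
    public static List<String> extractText(String html) throws IOException {
        final List<String> chunks = new ArrayList<>();
        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            @Override
            public void handleText(char[] data, int pos) {
                // Only text reaches this method; the tag callbacks are
                // left at their default no-op implementations.
                String text = new String(data).trim();
                if (!text.isEmpty()) {
                    chunks.add(text);
                }
            }
        };
        // The third argument tells the parser to ignore charset directives.
        new ParserDelegator().parse(new StringReader(html), callback, true);
        return chunks;
    }

    public static void main(String[] args) throws IOException {
        String html = "<html><body><h1>Hello</h1><p>Plain <b>text</b> only</p></body></html>";
        for (String chunk : extractText(html)) {
            System.out.println(chunk);
        }
    }
}
```

As in the HTMLParser loop above, nothing is done for the tags themselves;
only the text callback collects anything.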
Let us know if you still face problems.
Regards,
Somik