Re: [Htmlparser-user] Hi All

Hi Drew,
> I have a question. I am trying to extract only the text from
> HTML pages, but I could not get it to work. I have seen
> HTMLStringFilter.java, but I could not add it to the existing
> ones and run it, because there the HTML parser is passed only one
> argument, whereas the others also take the feedback. Also, if it
> should work, what feedback do we give? I mean a (T or i or s or l).
> And I guess the jar file does not have code for extracting the
> text.

Sorry about that; the web page hasn't been updated for a while. You will need
to create a feedback object. If you don't need feedback from the parser, use
the default one (DefaultHTMLParserFeedback) that we've provided in the
com.kizna.html.util package.

Try this:

// Below is some sample code to parse Yahoo.com and print only the text
// information. Scanning this way runs faster, as there are no scanners
// registered here.
HTMLParser parser = new HTMLParser("http://www.yahoo.com",
        new DefaultHTMLParserFeedback());
// In this example, none of the scanners need to be registered
// as a string node is not a tag to be scanned for.
for (Enumeration e = parser.elements(); e.hasMoreElements();) {
    HTMLNode node = (HTMLNode) e.nextElement();
    if (node instanceof HTMLStringNode) {
        HTMLStringNode stringNode = (HTMLStringNode) node;
        System.out.println(stringNode.getText());
    }
}
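
If you would rather collect the text into one string than print it node by
node, the same loop can append to a StringBuilder. Below is a minimal sketch
of that idea. Note that Node, StringNode and TagNode are hypothetical
stand-ins made up for this example so that it compiles without the parser
jar; they are not the library's classes. With the real parser you would loop
over parser.elements() and test for HTMLStringNode, as in the loop above.

```java
import java.util.Enumeration;
import java.util.Vector;

public class TextCollector {
    // Hypothetical stand-ins for the parser's node types (assumption:
    // they only mimic the "string node vs. anything else" distinction).
    interface Node {}

    static final class StringNode implements Node {
        private final String text;
        StringNode(String text) { this.text = text; }
        String getText() { return text; }
    }

    static final class TagNode implements Node {}

    // Walk the enumeration, keep string nodes, skip everything else,
    // and return the text with one line per string node.
    static String collectText(Enumeration<? extends Node> nodes) {
        StringBuilder out = new StringBuilder();
        while (nodes.hasMoreElements()) {
            Node node = nodes.nextElement();
            if (node instanceof StringNode) {
                out.append(((StringNode) node).getText()).append('\n');
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        Vector<Node> nodes = new Vector<>();
        nodes.add(new TagNode());
        nodes.add(new StringNode("Hello"));
        nodes.add(new TagNode());
        nodes.add(new StringNode("world"));
        // Prints the two string nodes, one per line.
        System.out.print(collectText(nodes.elements()));
    }
}
```

Returning a string instead of printing makes it easy to write the text to a
file or pass it on to other code.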

Let us know if you still face problems.

Regards,
Somik