Re: [Htmlparser-user] Efficient parsing - help needed

Hmm. Check the sample programs on the website
http://htmlparser.sourceforge.net. That should help
you understand the parser and get started. The sample
programs are also in the download bundle of the
parser.

> 1) read the stream and save the entire response to
> disk.

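You can use toHTML() to do this:
// Iterate over the parsed nodes and write each one back out as HTML.
HTMLNode node;
for (HTMLEnumeration e = parser.elements(); e.hasMoreNodes();) {
   node = e.nextHTMLNode();
   // writeToDisk() is your own helper (e.g. appending to a FileWriter)
   writeToDisk(node.toHTML());
}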
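
If you want the raw response on disk exactly as it arrived (toHTML() rebuilds the HTML from the parsed nodes, so the output may not match the server's bytes exactly), and you also want to avoid fetching the page twice, one option is to read the stream once and copy it into both a file and an in-memory buffer. This is a rough sketch using only the JDK, not the HTMLParser API; the class and method names are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;

public class SaveAndKeep {
    /**
     * Reads the whole response once, writing it to the given file and also
     * returning the same bytes so they can be decoded and parsed afterwards.
     */
    public static byte[] saveAndKeep(InputStream in, Path file) throws IOException {
        ByteArrayOutputStream copy = new ByteArrayOutputStream();
        try (OutputStream out = Files.newOutputStream(file)) {
            byte[] buf = new byte[8192];
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);  // raw bytes to disk, untouched
                copy.write(buf, 0, n); // same bytes kept in memory
            }
        }
        return copy.toByteArray();
    }
}
```

The returned bytes can then be turned into a String with the right charset and searched or handed to whatever parser you use, so the network is only read once.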

> 2) While reading, parse the contents and extract
> links
> and form elements from it.

For links, check the sample programs. Form elements work
much the same way, since there is an HTMLFormTag, so all
you need to do is:
if (node instanceof HTMLFormTag) {
   HTMLFormTag formTag = (HTMLFormTag)node;
   // work with the form tag here
}
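
To try the same "look at each tag and act on its type" idea without any extra jars, here is a sketch using the JDK's own Swing HTML parser (javax.swing.text.html.parser.ParserDelegator with an HTMLEditorKit.ParserCallback). This is a stand-in, not the HTMLParser API, and the class name LinkFormExtractor is hypothetical. It collects the href of each <a> tag, the action of each <form>, and the name of each <input>. Note that the Swing parser is built on an old HTML 3.2 DTD, so it can be less forgiving with modern markup.

```java
import java.io.IOException;
import java.io.Reader;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

public class LinkFormExtractor {
    /** Returns entries such as "link: /a.html", "form: /login", "input: user". */
    public static List<String> extract(Reader html) throws IOException {
        List<String> found = new ArrayList<>();
        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            @Override
            public void handleStartTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
                if (tag == HTML.Tag.A) {
                    Object href = attrs.getAttribute(HTML.Attribute.HREF);
                    if (href != null) {
                        found.add("link: " + href);
                    }
                } else if (tag == HTML.Tag.FORM) {
                    found.add("form: " + attrs.getAttribute(HTML.Attribute.ACTION));
                }
            }

            @Override
            public void handleSimpleTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
                // Empty elements such as <input> are reported here, not in handleStartTag.
                if (tag == HTML.Tag.INPUT) {
                    found.add("input: " + attrs.getAttribute(HTML.Attribute.NAME));
                }
            }
        };
        // true = ignore any <meta> charset; the Reader already supplies decoded text.
        new ParserDelegator().parse(html, callback, true);
        return found;
    }
}
```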

> 3) search the response for a particular string.
> Also,
> suppose that the response is not html, it is plain
> text maybe, would it still be possible to search the
> response for some string?

Sure. Nodes have methods like toPlainTextString(),
which return the text content of a node without the
markup. There is also HTMLStringNode, which represents
a pure string (text) node.
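
Once the text is in a Java String (from toPlainTextString() on each node, or simply the whole response if it is plain text), finding a string is ordinary String work and does not need the parser at all. A minimal sketch that reports every position where a search term occurs; the names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class TextSearch {
    /** Returns the start index of every (possibly overlapping) occurrence of needle in text. */
    public static List<Integer> findAll(String text, String needle) {
        List<Integer> hits = new ArrayList<>();
        if (needle.isEmpty()) {
            return hits; // an empty needle would match everywhere; treat as no hits
        }
        int from = 0;
        int at;
        while ((at = text.indexOf(needle, from)) != -1) {
            hits.add(at);
            from = at + 1; // advance one char so overlapping matches are found too
        }
        return hits;
    }
}
```

For example, findAll("abc abc", "abc") gives [0, 4]. The same call works whether the text came from HTML nodes or from a plain-text response.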

> What is the most efficient way to do this? I am
> looking for an all-in-one-step approach. I went
> through the docs but am not sure whether I need to
> write custom HtmlRenderers and scanners.

You don't. Start with the sample programs, and post
here if you need help.

> Another question is that has HtmlParser been tested
> with Unicode content, for example Korean or Chinese
> characters? In other words, does HtmlParser support
> Unicode? This question is in relation to the point
> no.3 mentioned above.

Yes, I don't think you should have a problem. There was
an earlier thread on this:
http://sourceforge.net/mailarchive/message.php?msg_id=2507341
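
On the Unicode side, Java Strings are UTF-16 internally, so searching Korean or Chinese text works the same way as ASCII. What usually goes wrong is decoding the response bytes with the wrong charset. The sketch below (JDK only, hypothetical names) decodes with an explicit charset before searching; it assumes you already know the charset, e.g. from the Content-Type header or a <meta> tag, and does not detect it itself.

```java
import java.nio.charset.Charset;

public class CharsetAwareSearch {
    /** Decodes the raw response bytes with the given charset, then looks for needle. */
    public static boolean contains(byte[] body, Charset charset, String needle) {
        String text = new String(body, charset);
        return text.contains(needle);
    }
}
```

With UTF-8 bytes for "<p>\uC548\uB155\uD558\uC138\uC694</p>" (Korean "hello", written with \u escapes so the source compiles the same under any file encoding), searching for "\uC548\uB155" succeeds when decoded as UTF-8 and fails when the same bytes are wrongly decoded as ISO-8859-1.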

Regards,
Somik