[Htmlparser-user] Efficient parsing - help needed


Hi All,

I am new to HtmlParser. I am working on a project
that needs quite a bit of HTML parsing. I have an
InputStream, and this is what I need to do with it:
1) Read the stream and save the entire response to
disk.
2) While reading, parse the contents and extract links
and form elements from it.
3) Search the response for a particular string. Also,
if the response is not HTML (plain text, for example),
would it still be possible to search it for a string?
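
To make the three steps concrete, here is a rough sketch of the
behaviour I am after. It uses only the standard Java library, not
HtmlParser, and the names OnePassSketch, Result and process are made
up for illustration. The regular expressions are a crude stand-in for
a real parser, the whole response is buffered in memory, and the
caller is assumed to know the character set of the response.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.Charset;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of HtmlParser.
public class OnePassSketch {

    /** What one pass over the response yields. */
    public static final class Result {
        public final List<String> links = new ArrayList<>();
        public final List<String> formTags = new ArrayList<>();
        public boolean found;
    }

    // Crude patterns: quoted href values of <a> tags, and opening form-related tags.
    private static final Pattern HREF = Pattern.compile(
            "<a\\b[^>]*?\\bhref\\s*=\\s*[\"']([^\"']*)[\"']", Pattern.CASE_INSENSITIVE);
    private static final Pattern FORM = Pattern.compile(
            "<(?:form|input|select|textarea)\\b[^>]*>", Pattern.CASE_INSENSITIVE);

    public static Result process(InputStream in, OutputStream disk,
                                 Charset charset, String needle) throws IOException {
        // Step 1: read the stream once, writing each chunk to "disk"
        // and keeping an in-memory copy for steps 2 and 3.
        ByteArrayOutputStream copy = new ByteArrayOutputStream();
        byte[] buf = new byte[8192];
        int n;
        while ((n = in.read(buf)) != -1) {
            disk.write(buf, 0, n);
            copy.write(buf, 0, n);
        }
        disk.flush();

        // Decode the bytes with the caller-supplied charset.
        String text = new String(copy.toByteArray(), charset);
        Result result = new Result();

        // Step 3: a plain substring search works for HTML and plain text alike.
        result.found = text.contains(needle);

        // Step 2: collect link targets and form-related tags.
        Matcher m = HREF.matcher(text);
        while (m.find()) {
            result.links.add(m.group(1));
        }
        m = FORM.matcher(text);
        while (m.find()) {
            result.formTags.add(m.group());
        }
        return result;
    }
}
```

In this sketch the "disk" argument would be a FileOutputStream in
practice. What I would like to know is whether HtmlParser can do the
extraction part of this properly during the same read.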

What is the most efficient way to do this? I am
looking for an all-in-one-step approach. I went
through the docs but am not sure whether I need to
write custom HtmlRenderers and scanners.

Please help.

Another question: has HtmlParser been tested with
Unicode content, for example Korean or Chinese
characters? In other words, does HtmlParser support
Unicode? This relates to point 3 above.

Regards,
Ash.

P.S. A VERY HAPPY NEW YEAR TO EVERYBODY.
