[Htmlparser-user] Efficient parsing - help needed
From: <jt...@ya...> - 2003-01-03 06:45:42
Hi All,

I am new to HtmlParser. I am working on a project that needs quite a bit of HTML parsing. I have an InputStream, and this is what I need to do with it:

1) Read the stream and save the entire response to disk.
2) While reading, parse the contents and extract the links and form elements.
3) Search the response for a particular string.

Also, if the response is not HTML (plain text, say), would it still be possible to search it for a string?

What is the most efficient way to do this? I am looking for an all-in-one-step approach. I went through the docs but am not sure whether I need to write custom HtmlRenderers and scanners. Please help.

Another question: has HtmlParser been tested with Unicode content, for example Korean or Chinese characters? In other words, does HtmlParser support Unicode? This relates to point 3 above.

Regards,
Ash.

P.S. A VERY HAPPY NEW YEAR TO EVERYBODY.
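
For illustration, the three steps above can be sketched with the JDK alone, without relying on any HtmlParser API. This is a minimal sketch under stated assumptions: the class name `OnePassResponse` and its helper methods are hypothetical, the response is assumed to be UTF-8 and small enough to hold in memory, and a naive regular expression stands in for real link extraction (a proper HTML parser would replace `extractLinks`, and form elements are not handled here). The stream is read once; each chunk is written to disk and also kept in memory, and the bytes are decoded only after the whole stream has been read so that multi-byte characters are never split at a buffer boundary.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class; not part of HtmlParser.
public class OnePassResponse {

    // Naive matcher for quoted href attributes. An assumption for illustration only:
    // it ignores unquoted values, comments, scripts, and malformed markup.
    private static final Pattern HREF = Pattern.compile(
            "href\\s*=\\s*[\"']([^\"']*)[\"']", Pattern.CASE_INSENSITIVE);

    /** Reads the stream once, writing every chunk to disk, and returns the decoded text. */
    static String saveAndDecode(InputStream in, Path target, Charset charset) throws IOException {
        ByteArrayOutputStream memory = new ByteArrayOutputStream();
        try (OutputStream disk = Files.newOutputStream(target)) {
            byte[] buffer = new byte[8192];
            int n;
            while ((n = in.read(buffer)) != -1) {
                disk.write(buffer, 0, n);   // step 1: save to disk
                memory.write(buffer, 0, n); // keep a copy for steps 2 and 3
            }
        }
        // Decode after the full read so multi-byte sequences stay intact.
        return new String(memory.toByteArray(), charset);
    }

    /** Step 2 (links only): collects the values of quoted href attributes. */
    static List<String> extractLinks(String text) {
        List<String> links = new ArrayList<>();
        Matcher m = HREF.matcher(text);
        while (m.find()) {
            links.add(m.group(1));
        }
        return links;
    }

    public static void main(String[] args) throws IOException {
        // Sample body with Korean text, written as Unicode escapes.
        String html = "<html><body><a href=\"http://example.com/\">"
                + "\uD55C\uAD6D\uC5B4</a></body></html>";
        InputStream in = new ByteArrayInputStream(html.getBytes(StandardCharsets.UTF_8));
        Path target = Files.createTempFile("response", ".html");

        String text = saveAndDecode(in, target, StandardCharsets.UTF_8);
        System.out.println(extractLinks(text));                  // step 2
        System.out.println(text.contains("\uD55C\uAD6D\uC5B4")); // step 3
    }
}
```

Because step 3 is an ordinary `String.contains` on the decoded text, it works the same way for a plain-text response. Whether non-ASCII searches succeed depends on decoding with the charset the server actually used, which this sketch simply assumes to be UTF-8.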