Re: [Htmlparser-user] Efficient parsing - help needed
From: Somik R. <so...@ya...> - 2003-01-03 18:08:32
Hmm. Check the sample programs on the website, http://htmlparser.sourceforge.net. They should help you understand the parser and get started. The sample programs are also in the download bundle of the parser.

> 1) read the stream and save the entire response to
> disk.

You can use toHTML() to do this:

    HTMLNode node;
    for (HTMLEnumeration e = parser.elements(); e.hasMoreNodes();) {
        node = e.nextHTMLNode();
        writeToDisk(node.toHTML());
    }

> 2) While reading, parse the contents and extract
> links
> and form elements from it.

Check the sample programs for links. Form elements work much the same way, as there is an HTMLFormTag. So all you need to do is:

    if (node instanceof HTMLFormTag) {
        HTMLFormTag formTag = (HTMLFormTag) node;
    }

> 3) search the response for a particular string.
> Also,
> suppose that the response is not html, it is plain
> text maybe, would it still be possible to search the
> response for some string?

Sure. There are methods like toPlainTextString(), which return the string output of a node. Then there is HTMLStringNode, which represents a pure string node.

> What is the most efficient way to do this? I am
> looking for an all-in-one-step approach. I went
> through the docs but am not sure whether I need to
> write custom HtmlRenderers and scanners.

You don't. Start with the sample programs, and post here if you need help.

> Another question is that has HtmlParser been tested
> with Unicode content, for example Korean or Chinese
> characters? In other words, does HtmlParser support
> Unicode? This question is in relation to the point
> no.3 mentioned above.

Yes - I think you should not have a problem. There was an earlier thread on this:
http://sourceforge.net/mailarchive/message.php?msg_id=2507341

Regards,
Somik
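
For point 3 in the plain-text case, here is a minimal sketch that does not go through the parser at all. It uses only the standard Java library: read the response bytes, decode them with an explicit charset, and search the resulting string. The class name ResponseSearch and its methods are hypothetical helpers, not part of HTMLParser, and UTF-8 is an assumption; use whatever charset the server actually declares. It also reads the whole response into memory, which is only reasonable for small responses.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration only; not part of HTMLParser.
class ResponseSearch {

    // Read the whole stream into a byte array (assumes a small response).
    static byte[] readAll(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[8192];
        int n;
        while ((n = in.read(buf)) != -1) {
            out.write(buf, 0, n);
        }
        return out.toByteArray();
    }

    // Decode the bytes with the given charset, then look for the search string.
    // Decoding first is what makes non-ASCII (e.g. Korean) searches work.
    static boolean contains(InputStream in, Charset charset, String needle)
            throws IOException {
        String text = new String(readAll(in), charset);
        return text.contains(needle);
    }

    public static void main(String[] args) throws IOException {
        // "\uD55C\uAD6D\uC5B4" is the Korean word for "Korean language".
        String korean = "\uD55C\uAD6D\uC5B4";
        byte[] body = ("plain text with " + korean).getBytes(StandardCharsets.UTF_8);

        // prints true
        System.out.println(contains(
                new ByteArrayInputStream(body), StandardCharsets.UTF_8, korean));
        // prints false
        System.out.println(contains(
                new ByteArrayInputStream(body), StandardCharsets.UTF_8, "missing"));
    }
}
```

The same readAll() bytes could also be written to disk for point 1, so the response is read from the network only once.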