RE: [Htmlparser-user] Efficient parsing - help needed
From: <dha...@or...> - 2003-01-06 11:49:23
Ash,

For your requirement of reading the entire HTML and storing it on disk in an identical format, I suggest that you not use the HTMLParser. For the present, I suggest that you do it on your own using readers and writers.

The changes suggested by you are quite good. However, as far as the toHTML() method is concerned, it does not exactly replicate the input HTML. So if you are using it to do that, you are better off with the approach given above.

For parsing HTML, however, this parser is great, not only because it works beautifully and is so easy to use, as described by Somik below, but also because you can switch the parsers off and on as required.

Regards,
Dhaval Udani
Senior Analyst
M-Line, QPEG
OrbiTech Solutions Ltd.
+91-22-28290019 Extn. 1457

-----Original Message-----
From: jtrek4 [mailto:jt...@ya...]
Sent: Monday, January 06, 2003 5:07 PM
To: htmlparser-user
Cc: jtrek4
Subject: Re: [Htmlparser-user] Efficient parsing - help needed

Hi Somik,

Thanks for the help.

> You can use toHTML() to do this..
> HTMLNode node;
> for (HTMLEnumeration e = parser.elements(); e.hasMoreNodes();) {
>     node = e.nextHTMLNode();
>     writeToDisk(node.toHTML());
> }

I tried this, but toHTML() modifies the contents, wrongly in some cases. I have posted a bug regarding this:
http://sourceforge.net/tracker/index.php?func=detail&aid=663038&group_id=24399&atid=381399

I have one suggestion to make: overloaded constructors in HTMLParser with the following signatures:

    public HTMLParser(java.lang.String resourceLocn, HTMLParserFeedback feedback, Writer writer)
    public HTMLParser(java.lang.String resourceLocn, Writer writer)

with corresponding overloaded constructors in HTMLReader:

    public HTMLReader(java.io.Reader in, int len, Writer writer)
    public HTMLReader(java.io.Reader in, java.lang.String url, Writer writer)

This will give the users a way to save the response to disk as it is received.
Of course, there is another option of taking a String file name argument, but the user may want to specify the file encoding as well (as is the case with me). So the java.io.Writer is a better option.

This should not take much time to implement: you just need to check whether the writer has been supplied and, once you read a line using the readLine() method in HTMLReader, write that string to the writer using the println method and call flush(). This gives the user the added advantage of preserving line breaks at the original points.

What do you think? Also, when can we expect the next release?

Warm Regards,
Ash
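
The idea in the quoted message (copy each line to a caller-supplied Writer as soon as readLine() returns it) might be sketched roughly as below. This is only an illustration under assumptions: TeeLineReader is a hypothetical class built on java.io.BufferedReader, not part of HTMLParser, and it does not show how HTMLReader is actually implemented.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.PrintWriter;
import java.io.Reader;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.Writer;

/**
 * Hypothetical helper (not an HTMLParser class): a line reader that echoes
 * every line it hands out to a caller-supplied Writer.
 */
class TeeLineReader extends BufferedReader {
    // PrintWriter supplies println(); note that it swallows IOExceptions,
    // which callers can detect afterwards via checkError().
    private final PrintWriter copy;

    TeeLineReader(Reader in, Writer copyTo) {
        super(in);
        this.copy = new PrintWriter(copyTo);
    }

    @Override
    public String readLine() throws IOException {
        String line = super.readLine();
        if (line != null) {
            // Line breaks stay at the original points, but println() writes
            // the platform line separator, so terminators are normalized.
            copy.println(line);
            copy.flush();
        }
        return line;
    }

    public static void main(String[] args) throws IOException {
        // A StringWriter stands in for the file; for a real file, a Writer
        // with an explicit encoding would be passed instead, e.g.
        // new OutputStreamWriter(new FileOutputStream("page.html"), "UTF-8").
        StringWriter saved = new StringWriter();
        String html = "<html>\n<body>hi</body>\n</html>";
        try (TeeLineReader in = new TeeLineReader(new StringReader(html), saved)) {
            while (in.readLine() != null) {
                // a parser would consume each line here
            }
        }
        System.out.print(saved);
    }
}
```

Because the caller constructs the Writer, the choice of file encoding stays with the caller, which is the reason given above for preferring a Writer over a file name argument. The copy is line-accurate but not necessarily byte-identical, since the original line terminators are replaced by the platform's.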