Re: [Htmlparser-user] StringExtractor
From: David P. C. <dav...@gm...> - 2010-02-12 22:00:22
Great, thanks for the answer! (I just saw it now.)
The library seems great!

One question: it seems that it does not handle div elements correctly.
A div element is a block element <http://www.webdesignfromscratch.com/html-css/css-block-and-inline.php> (by default), and thus it should render a new line.

For example, with this HTML file:
++++++++++++++++++
<html>
<body>
test1 test2 <div>test3</div> test4 <span>test5</span> <span>test6</span>
</body>
</html>
++++++++++++++++++

it should produce:
++++++++++++++++++
test1 test2 test3
test4 test5 test6
++++++++++++++++++
Note the new line between test3 and test4.

However, StringBean produces the following:
++++++++++++++++++
test1 test2 test3 test4 test5 test6
++++++++++++++++++

It correctly handles the new lines for text and span nodes, but not for divs.
Is that the intended effect?
If so, is it possible to override this (add a new line for block elements)?

Regards,
David Portabella

On Sat, Dec 12, 2009 at 10:02 AM, Derrick Oswald <der...@gm...> wrote:

> This has been replaced by the main program in
> org.htmlparser.beans.StringBean.
>
> Sorry for the misdirection
>
> On Wed, Dec 9, 2009 at 11:18 PM, David Portabella Clotet <
> dav...@gm...> wrote:
>
>> Hello,
>>
>> On the website: http://htmlparser.sourceforge.net/samples.html
>> there is info about the "StringExtractor" example:
>> ++++++++++++++++++
>> String Extractor
>> Extract text from a web page.
>> org.htmlparser.parserapplications.StringExtractor
>> bin/stringextractor http://website_url
>> ++++++++++++++++++
>>
>> However, I did not find this example in either of these two downloads:
>> HTMLParser-2.0-SNAPSHOT-src.zip
>> HTMLParser-2.0-SNAPSHOT-bin.zip
>>
>> Can you please tell me where to find the StringExtractor example?
>>
>>
>> Best regards,
>> David Portabella
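
On the question of adding a new line for block elements: the following is a minimal sketch of that behavior in plain Java, using only the JDK. It is not StringBean's implementation and does not use the HTMLParser API. The class name BlockAwareText, the method extract, and the short list of block element names are all hypothetical, chosen only for illustration. It tokenizes tags with a regular expression, which is fine for the small sample above but ignores comments, scripts, and character entities.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of HTMLParser: extracts text from simple
// HTML and starts a new line around block-level elements.
final class BlockAwareText {
    // Assumption: a deliberately short list of block-level element names.
    private static final Set<String> BLOCK = new HashSet<>(Arrays.asList(
            "div", "p", "ul", "ol", "li", "table", "tr", "blockquote", "pre",
            "h1", "h2", "h3", "h4", "h5", "h6"));

    // Matches an opening or closing tag and captures its element name.
    private static final Pattern TAG =
            Pattern.compile("<\\s*/?\\s*([a-zA-Z][a-zA-Z0-9]*)[^>]*>");

    static String extract(String html) {
        StringBuilder out = new StringBuilder();
        Matcher m = TAG.matcher(html);
        int pos = 0;
        while (m.find()) {
            // Emit the text that precedes this tag.
            appendText(out, html.substring(pos, m.start()));
            // Both the start and the end of a block element break the line.
            if (BLOCK.contains(m.group(1).toLowerCase())) {
                breakLine(out);
            }
            pos = m.end();
        }
        appendText(out, html.substring(pos));
        return out.toString().trim();
    }

    // Collapses whitespace runs and joins fragments on one line with a space.
    private static void appendText(StringBuilder out, String raw) {
        String text = raw.replaceAll("\\s+", " ").trim();
        if (text.isEmpty()) {
            return;
        }
        if (out.length() > 0 && out.charAt(out.length() - 1) != '\n') {
            out.append(' ');
        }
        out.append(text);
    }

    // Adds a single newline unless the output is empty or already at one.
    private static void breakLine(StringBuilder out) {
        if (out.length() > 0 && out.charAt(out.length() - 1) != '\n') {
            out.append('\n');
        }
    }
}
```

Because this sketch breaks the line both before and after a block element, calling BlockAwareText.extract on the sample file above puts test3 on a line of its own, with "test1 test2" before it and "test4 test5 test6" after it. One known limitation: adjacent text fragments are always separated by a space, even when the source has no whitespace between two inline elements.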