Great, thanks for the answer! (I just saw it now)
The library seems great!
One question: it does not seem to handle div elements correctly.
A div element is a block element by default
<http://www.webdesignfromscratch.com/html-css/css-block-and-inline.php>,
and thus its content should be rendered on its own line.
For example, with this HTML file:
++++++++++++++++++
<html>
<body>
test1
test2
<div>test3</div>
test4
<span>test5</span>
<span>test6</span>
</body>
</html>
++++++++++++++++++
it should produce:
++++++++++++++++++
test1 test2
test3
test4 test5 test6
++++++++++++++++++
Note the line break between test3 and test4.
However, StringBean produces the following:
++++++++++++++++++
test1 test2
test3 test4 test5 test6
++++++++++++++++++
It correctly handles the new lines for text and span nodes, but not for
divs.
Is that the intended behavior? If so, is it possible to override it (that is,
to add a new line for block elements)?
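
To make the expected behavior concrete, here is a minimal sketch in plain Java. It does not use HTMLParser at all: the class name BlockAwareText, the extract method, and the short list of block tags are my own assumptions for illustration, and the regex-based tag matching is only adequate for simple input like the file above.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of HTMLParser.
final class BlockAwareText {
    // Assumption: a small hand-picked subset of block-level tags, not a complete list.
    private static final Pattern BLOCK_TAG = Pattern.compile(
        "(?i)</?(?:div|p|br|li|ul|ol|table|tr|h[1-6])\\b[^>]*>");
    private static final Pattern ANY_TAG = Pattern.compile("<[^>]*>");

    static String extract(String html) {
        // Newlines in the source are ordinary whitespace in HTML, so fold them to spaces.
        String s = html.replaceAll("\\s+", " ");
        // Turn every opening or closing block-level tag into a line break.
        s = BLOCK_TAG.matcher(s).replaceAll("\n");
        // Drop the remaining tags (inline ones such as <span>, plus <html> and <body>).
        s = ANY_TAG.matcher(s).replaceAll("");

        StringBuilder out = new StringBuilder();
        for (String line : s.split("\n")) {
            // Collapse runs of spaces and skip lines that end up empty.
            String t = line.trim().replaceAll(" +", " ");
            if (!t.isEmpty()) {
                if (out.length() > 0) {
                    out.append('\n');
                }
                out.append(t);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String html = "<html>\n<body>\ntest1\ntest2\n<div>test3</div>\ntest4\n"
            + "<span>test5</span>\n<span>test6</span>\n</body>\n</html>\n";
        System.out.println(extract(html));
    }
}
```

For the file above, this prints the three lines I would expect, with test3 on a line of its own. Something equivalent inside StringBean, breaking the line at the closing tag of a block element as well as at the opening tag, is what I am hoping for.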
Regards,
David Portabella
On Sat, Dec 12, 2009 at 10:02 AM, Derrick Oswald
<der...@gm...> wrote:
> This has been replaced by the main program in
> org.htmlparser.beans.StringBean.
>
> Sorry for the misdirection
>
> On Wed, Dec 9, 2009 at 11:18 PM, David Portabella Clotet <
> dav...@gm...> wrote:
>
>> Hello,
>>
>> In the website: http://htmlparser.sourceforge.net/samples.html
>> there is info about the "StringExtractor" example:
>> ++++++++++++++++++
>> String Extractor
>> Extract text from a web page.
>> org.htmlparser.parserapplications.StringExtractor
>> bin/stringextractor http://website_url
>> ++++++++++++++++++
>>
>> However, I did not find this example in either of these two downloads:
>> HTMLParser-2.0-SNAPSHOT-src.zip
>> HTMLParser-2.0-SNAPSHOT-bin.zip
>>
>> Can you please tell me where to find the StringExtractor example?
>>
>>
>> Best regards,
>> David Portabella
>>
>>