Re: [Htmlparser-user] How to extract untagged text


The parser turns that into a number of TextNodes separated by TagNodes named BR.
You'll need to handle this kind of partial extraction of mixed text and HTML yourself. One option is to define and register a BrTag that prints <BR> even from its toText() method, and then call toText() on the whole section.
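A minimal, self-contained Java sketch of the BrTag idea, assuming the body's children arrive as a flat list of sibling nodes and each OBJECT element is collapsed into a single tag node. The Node, TextNode, TagNode, BrTag and createTag names are hypothetical stand-ins, not HTML Parser classes. createTag plays the role of registering BrTag with the parser's node factory, and the BR tag emits a lowercase <br> plus a newline so the result matches the desired output.

```java
import java.util.List;
import java.util.Locale;

// Hypothetical, simplified model of the parser's output. These classes are
// NOT the HTML Parser API; they only illustrate the "BrTag prints <BR>" idea.
public class ExtractUntagged {

    public interface Node {
        String toText();
    }

    // Text between tags; surrounding indentation is trimmed away.
    public static class TextNode implements Node {
        private final String text;
        public TextNode(String text) { this.text = text; }
        @Override public String toText() { return text.trim(); }
    }

    // Ordinary tag: contributes nothing to the extracted text.
    public static class TagNode implements Node {
        private final String name;
        public TagNode(String name) { this.name = name.toUpperCase(Locale.ROOT); }
        public String getName() { return name; }
        @Override public String toText() { return ""; }
    }

    // Custom BR tag that still renders as markup in its text output.
    public static class BrTag extends TagNode {
        public BrTag() { super("BR"); }
        @Override public String toText() { return "<br>\n"; }
    }

    // Stand-in for registering BrTag with a node factory (hypothetical):
    // BR gets the custom class, every other name gets a plain TagNode.
    public static TagNode createTag(String name) {
        return name.equalsIgnoreCase("BR") ? new BrTag() : new TagNode(name);
    }

    private static boolean isObject(Node n) {
        return n instanceof TagNode && ((TagNode) n).getName().equals("OBJECT");
    }

    // Concatenates toText() of the siblings between the first and second
    // OBJECT tags.
    public static String extractBetweenObjects(List<Node> bodyChildren) {
        StringBuilder out = new StringBuilder();
        boolean inside = false;
        for (Node n : bodyChildren) {
            if (isObject(n)) {
                if (inside) break;   // the second OBJECT ends the section
                inside = true;       // the first OBJECT starts it
                continue;
            }
            if (inside) out.append(n.toText());
        }
        return out.toString().trim();
    }

    public static void main(String[] args) {
        List<Node> body = List.of(
            new TextNode("\n    "),
            createTag("object"),
            new TextNode("\n    Lorem ipsum dolor sit amet,"),
            createTag("br"),
            new TextNode("\n    consectetur adipisicing elit,"),
            createTag("br"),
            new TextNode("\n    sed do eiusmod tempor incididunt ut labore et dolore magna aliqua.\n    "),
            createTag("object"),
            new TextNode("\n"));
        System.out.println(extractBetweenObjects(body));
    }
}
```

Because only BrTag overrides toText() to return markup, every other tag drops out of the concatenation, so the BR separators survive while the surrounding OBJECT tags do not. Trimming each text node discards the source indentation. With the real library you would need equivalent whitespace handling.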

----- Original Message ----
From: Nagahiro Daiki <e27...@gm...>
To: htm...@li...
Sent: Tuesday, April 29, 2008 11:00:13 PM
Subject: [Htmlparser-user] How to extract untagged text

Hello. I'm new to HTML Parser.

For example,

-----
<html>
<body>
    <object id="aaa">
        ...
    </object>
    Lorem ipsum dolor sit amet,<br>
    consectetur adipisicing elit,<br>
    sed do eiusmod tempor incididunt ut labore et dolore magna aliqua.
    <object id="xxx">
        ...
    </object>
</body>
</html>
-----

My question: how can I extract the following?

-----
Lorem ipsum dolor sit amet,<br>
consectetur adipisicing elit,<br>
sed do eiusmod tempor incididunt ut labore et dolore magna aliqua.
-----

I tried the toHtml() method of TextNode, but it seems to ignore the <br> tags.

Thanks for help!

Anonymous Otaku

_______________________________________________
Htmlparser-user mailing list
Htm...@li...
https://lists.sourceforge.net/lists/listinfo/htmlparser-user