Re: [Htmlparser-user] How to extract untagged text
From: Derrick O. <der...@ro...> - 2008-04-30 13:53:28
The parser would make a number of TextNodes out of that separated by TagNodes with BR names. You'll need to handle this sort of partial extraction of mixed text and HTML yourself, possibly by just defining and registering a BrTag that prints <BR> even for the toText() method, then toText() the whole section.

----- Original Message ----
From: Nagahiro Daiki <e27...@gm...>
To: htm...@li...
Sent: Tuesday, April 29, 2008 11:00:13 PM
Subject: [Htmlparser-user] How to extract untagged text

Hello.
I'm new to HTML Parser.

For example,
-----
<html>
<body>
<object id="aaa"> ... </object>
Lorem ipsum dolor sit amet,<br>
consectetur adipisicing elit,<br>
sed do eiusmod tempor incididunt ut labore et dolore magna aliqua.
<object id="xxx"> ... </object>
</body>
</html>
-----

My question: How to extract
-----
Lorem ipsum dolor sit amet,<br>
consectetur adipisicing elit,<br>
sed do eiusmod tempor incididunt ut labore et dolore magna aliqua.
-----
?

I tried toHtml method of TextNode, but it seems to ignore the <br> tag.

Thanks for help!

Anonymous Otaku

_______________________________________________
Htmlparser-user mailing list
Htm...@li...
https://lists.sourceforge.net/lists/listinfo/htmlparser-user
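
For anyone reading the archive later, here is a rough sketch of the idea in the reply: walk the nodes in order, append the text nodes, and emit a literal <BR> for each BR tag. It deliberately does not use the HTML Parser classes. Node, text(), tag(), render() and sampleNodes() are all hypothetical stand-ins made up for illustration, and the node list is flattened (a real OBJECT element would carry its own children), so treat it as a model of the approach rather than the library's API.

```java
import java.util.ArrayList;
import java.util.List;

// Stand-in model only: none of these types come from the HTML Parser library.
class MixedTextSketch {

    /** Hypothetical minimal node: either plain text or a tag identified by name. */
    static final class Node {
        final String tagName; // null means this is a text node
        final String text;    // text content; empty for tag nodes

        Node(String tagName, String text) {
            this.tagName = tagName;
            this.text = text;
        }

        boolean isText() {
            return tagName == null;
        }
    }

    static Node text(String s) {
        return new Node(null, s);
    }

    static Node tag(String name) {
        return new Node(name, "");
    }

    /** Concatenates text nodes, prints BR tags literally, and skips every other tag. */
    static String render(List<Node> nodes) {
        StringBuilder out = new StringBuilder();
        for (Node n : nodes) {
            if (n.isText()) {
                out.append(n.text);
            } else if ("BR".equalsIgnoreCase(n.tagName)) {
                out.append("<BR>");
            }
            // Any other tag (for example OBJECT) contributes nothing.
        }
        return out.toString().trim();
    }

    /** Flattened approximation of the <body> content from the question. */
    static List<Node> sampleNodes() {
        List<Node> nodes = new ArrayList<>();
        nodes.add(tag("OBJECT"));
        nodes.add(text("Lorem ipsum dolor sit amet,"));
        nodes.add(tag("BR"));
        nodes.add(text(" consectetur adipisicing elit,"));
        nodes.add(tag("BR"));
        nodes.add(text(" sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. "));
        nodes.add(tag("OBJECT"));
        return nodes;
    }

    public static void main(String[] args) {
        System.out.println(render(sampleNodes()));
    }
}
```

With the real library the same loop would presumably run over the children of the BODY node, with the BR handling moved into the custom tag class that the reply suggests registering.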