Re: [Htmlparser-user] parser help
From: Ernest C. <ern...@gm...> - 2012-08-24 19:14:07
Hi,

I use the parser a lot for work. One thing I've noticed is that many news articles have comment sections containing plain text, but the parser doesn't pick that text up. What is it about the comment sections that makes them unreadable to the parser? Is there a different class I should be using?

Thank you,
Ernest

On Wed, Aug 17, 2011 at 4:25 PM, ernest cronin <ern...@gm...> wrote:
> Hi,
>
> I have been trying to use the parser for some time and I have been unable
> to get it to do exactly what I want, which is to gather only the plain text
> without JavaScript or style content. Here is the code I've been running:
>
> import org.htmlparser.Parser;
> import org.htmlparser.util.ParserException;
> import org.htmlparser.visitors.TextExtractingVisitor;
>
> public class Test
> {
>     public static void main (String[] args)
>     {
>         try
>         {
>             // args[0] is the URL or file to parse
>             Parser parser = new Parser (args[0]);
>             // Visit every node and collect the text content
>             TextExtractingVisitor visitor = new TextExtractingVisitor();
>             parser.visitAllNodesWith(visitor);
>             String textInPage = visitor.getExtractedText();
>             System.out.println(textInPage);
>         }
>         catch (ParserException pe)
>         {
>             pe.printStackTrace ();
>         }
>     }
> }
>
> I could really use some help with this!
>
> Thanks,
> Ernest