Re: [Htmlparser-user] parser help
From: Derrick O. <der...@gm...> - 2011-08-18 18:40:42
Did you try the StringBean? Same code, except:

    StringBean visitor = new StringBean ();
    parser.visitAllNodesWith(visitor);
    String textInPage = visitor.getStrings ();

Or you can use some of its other facilities. For example, it will make its own parser if you don't want to create one, as shown in the mainline:

    StringBean sb = new StringBean ();
    sb.setLinks (false);
    sb.setReplaceNonBreakingSpaces (true);
    sb.setCollapse (true);
    sb.setURL (args[0]);
    System.out.println (sb.getStrings ());

On Wed, Aug 17, 2011 at 10:25 PM, ernest cronin <ern...@gm...> wrote:
> Hi,
>
> I have been trying to use the parser for some time and I have been unable
> to get it to do exactly what I want, which is to gather only the plaintext
> without javascript or style stuff. Here is the code I've been running:
>
> public class Test
> {
>     public static void main (String[] args)
>     {
>         try
>         {
>             Parser parser = new Parser (args[0]);
>             TextExtractingVisitor visitor = new TextExtractingVisitor();
>             parser.visitAllNodesWith(visitor);
>             String textInPage = visitor.getExtractedText();
>             System.out.println(textInPage);
>         }
>         catch (ParserException pe)
>         {
>             pe.printStackTrace ();
>         }
>     }
> }
>
> I could really use some help with this!
>
> Thanks,
> Ernest
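For readers who want to see what the requested cleanup amounts to, here is a rough, standard-library-only sketch. It is not how StringBean is implemented, and `PlainTextCleanup` and `toPlainText` are hypothetical names invented for this illustration. It assumes the option names in the reply mean what they suggest: non-breaking spaces become ordinary spaces and runs of whitespace collapse to one. The regex-based tag stripping is a stand-in for real parsing and will not cope with malformed markup or character entities such as `&nbsp;`.

```java
import java.util.regex.Pattern;

// Hypothetical helper, NOT part of HTMLParser. It only illustrates the kind of
// output the thread is after: visible text without script or style content.
final class PlainTextCleanup {
    // Matches a whole <script>...</script> or <style>...</style> element.
    private static final Pattern SCRIPT_OR_STYLE =
        Pattern.compile("(?is)<(script|style)\\b[^>]*>.*?</\\1\\s*>");
    // Matches any remaining tag.
    private static final Pattern TAG = Pattern.compile("(?s)<[^>]*>");
    private static final Pattern WHITESPACE = Pattern.compile("\\s+");

    static String toPlainText(String html) {
        // Drop script and style elements together with their contents.
        String text = SCRIPT_OR_STYLE.matcher(html).replaceAll(" ");
        // Replace the remaining tags with a space so adjacent words stay apart.
        text = TAG.matcher(text).replaceAll(" ");
        // \s does not match U+00A0 by default, so convert it explicitly first.
        text = text.replace('\u00a0', ' ');
        // Collapse whitespace runs to a single space and trim the ends.
        return WHITESPACE.matcher(text).replaceAll(" ").trim();
    }

    public static void main(String[] args) {
        String html = "<html><head><style>p { color: red; }</style>"
            + "<script>var x = 1;</script></head>"
            + "<body><p>Hello\u00a0 world</p>\n\n<p>Bye</p></body></html>";
        System.out.println(toPlainText(html)); // prints "Hello world Bye"
    }
}
```

For real pages, the StringBean approach from the reply is the better choice, since it works on the parsed node tree rather than on pattern matches.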