Re: [Htmlparser-user] parser help

Did you try the StringBean?
Same code except:
            StringBean visitor = new StringBean ();
            parser.visitAllNodesWith(visitor);
            String textInPage = visitor.getStrings ();

Or you can use some of its other facilities. For example, it will create its
own parser if you don't want to make one yourself, as shown in its mainline:
            StringBean sb = new StringBean ();
            sb.setLinks (false);
            sb.setReplaceNonBreakingSpaces (true);
            sb.setCollapse (true);
            sb.setURL (args[0]);
            System.out.println (sb.getStrings ());
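
For anyone wondering what those settings roughly amount to, here is a small
standalone sketch of the same idea using only the JDK. It is not how StringBean
is implemented and it does not use the HTML Parser library at all; it swaps in
plain regular expressions, which is far less robust than a real parser. The
class name PlainTextSketch and the method toPlainText are made up for this
illustration.

```java
import java.util.regex.Pattern;

// Hypothetical illustration only: a regex-based approximation, NOT the
// HTML Parser library and NOT StringBean's actual implementation.
public class PlainTextSketch {
    // (?is): case-insensitive, and '.' also matches line breaks.
    private static final Pattern SCRIPT_OR_STYLE =
        Pattern.compile("(?is)<(script|style)\\b[^>]*>.*?</\\1\\s*>");
    private static final Pattern COMMENT = Pattern.compile("(?s)<!--.*?-->");
    private static final Pattern TAG = Pattern.compile("<[^>]+>");
    private static final Pattern WHITESPACE = Pattern.compile("\\s+");

    public static String toPlainText(String html) {
        // Drop script and style elements together with their contents.
        String text = SCRIPT_OR_STYLE.matcher(html).replaceAll(" ");
        // Drop comments, then any remaining tags (link targets are discarded).
        text = COMMENT.matcher(text).replaceAll(" ");
        text = TAG.matcher(text).replaceAll(" ");
        // Treat non-breaking spaces as ordinary spaces.
        text = text.replace("&nbsp;", " ").replace('\u00a0', ' ');
        // Collapse runs of whitespace into a single space.
        text = WHITESPACE.matcher(text).replaceAll(" ");
        return text.trim();
    }

    public static void main(String[] args) {
        String html = "<html><head><style>p { color: red; }</style>"
            + "<script>var x = 1;</script></head>"
            + "<body><p>Hello,&nbsp;world</p>\n<p>second   line</p></body></html>";
        System.out.println(toPlainText(html));
    }
}
```

For real pages the StringBean route above is the better choice, since regular
expressions will trip over malformed markup and other character entities.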


On Wed, Aug 17, 2011 at 10:25 PM, ernest cronin <ern...@gm...> wrote:

> Hi,
>
> I have been trying to use the parser for some time and I have been unable
> to get it to do exactly what I want, which is to gather only the plaintext
> without javascript or style stuff. Here is the code I've been running:
>
>   public class Test
>   {
>      public static void main (String[] args)
>      {
>         try
>         {
>            Parser parser = new Parser (args[0]);
>            TextExtractingVisitor visitor = new TextExtractingVisitor();
>            parser.visitAllNodesWith(visitor);
>            String textInPage = visitor.getExtractedText();
>            System.out.println(textInPage);
>         }
>         catch (ParserException pe)
>         {
>            pe.printStackTrace ();
>         }
>      }
>   }
>
> I could really use some help with this!
>
> Thanks,
> Ernest