Re: [Htmlparser-user] parser help
From: Derrick O. <der...@gm...> - 2011-08-18 18:40:42
Did you try the StringBean?
Same code except:
    StringBean visitor = new StringBean ();
    parser.visitAllNodesWith(visitor);
    String textInPage = visitor.getStrings ();

Or you can use some of its other facilities. For example, it will create its
own parser if you don't want to make one, as shown in its mainline:

    StringBean sb = new StringBean ();
    sb.setLinks (false);
    sb.setReplaceNonBreakingSpaces (true);
    sb.setCollapse (true);
    sb.setURL (args[0]);
    System.out.println (sb.getStrings ());
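
As a rough illustration of what the setReplaceNonBreakingSpaces and setCollapse
options are meant to do to the extracted text, here is a minimal sketch in
plain Java. It does not use HTMLParser at all: the class WhitespaceSketch and
its normalize method are hypothetical names made up for this example, and the
real StringBean may well handle line breaks and edge cases differently.

```java
class WhitespaceSketch {
    // Hypothetical helper, NOT part of HTMLParser: approximates the idea of
    // replacing non-breaking spaces and collapsing whitespace runs.
    static String normalize(String text) {
        // Turn non-breaking spaces (U+00A0) into ordinary spaces first,
        // because Java's \s does not match U+00A0 by default.
        String noNbsp = text.replace('\u00a0', ' ');
        // Collapse every run of whitespace into a single space and trim the ends.
        return noNbsp.replaceAll("\\s+", " ").trim();
    }

    public static void main(String[] args) {
        // prints "Hello, world again"
        System.out.println(normalize("Hello,\u00a0\u00a0world \n\t again"));
    }
}
```

With options like these turned on, the output should be easier to read than raw
page text, since runs of spaces and stray non-breaking spaces from the markup
no longer show up in the result.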
On Wed, Aug 17, 2011 at 10:25 PM, ernest cronin <ern...@gm...> wrote:
> Hi,
>
> I have been trying to use the parser for some time and I have been unable
> to get it to do exactly what I want, which is to gather only the plaintext
> without javascript or style stuff. Here is the code I've been running:
>
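> import org.htmlparser.Parser;
> import org.htmlparser.util.ParserException;
> import org.htmlparser.visitors.TextExtractingVisitor;
>
> public class Test
> {
>     public static void main (String[] args)
>     {
>         try
>         {
>             // args[0] is the URL or file name of the page to parse
>             Parser parser = new Parser (args[0]);
>             TextExtractingVisitor visitor = new TextExtractingVisitor();
>             // walk every node and collect its text
>             parser.visitAllNodesWith(visitor);
>             String textInPage = visitor.getExtractedText();
>             System.out.println(textInPage);
>         }
>         catch (ParserException pe)
>         {
>             pe.printStackTrace ();
>         }
>     }
> }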
>
> I could really use some help with this!
>
> Thanks,
> Ernest