Re: [Htmlparser-user] Only extract text from div tag with specific attribute
From: Jumbo P. <jum...@gm...> - 2008-04-02 19:31:49
I figured it out. It's actually pretty simple. Here is the code. Thanks
anyway.
Parser p = new Parser(url);
NodeList list = p.extractAllNodesThatMatch(
    new AndFilter(new TagNameFilter("div"), new HasAttributeFilter("class", "body")));
StringBean sb = new StringBean();
list.visitAllNodesWith(sb);
System.out.println(sb.getStrings());
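
For readers without htmlparser on the classpath, the selection logic above can be sketched in plain Java. This is only an illustration under stated assumptions: the Node class, the helper methods (tagName, hasAttribute, extractAll, textOf), and the sample tree are hypothetical stand-ins invented for this sketch, not htmlparser classes. It shows the idea of AND-ing a tag-name test with an attribute test, walking the tree, and concatenating the text of the matches.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

public class DivFilterSketch {

    /** Hypothetical stand-in for a parsed tag; NOT an htmlparser class. */
    static final class Node {
        final String name;
        final Map<String, String> attributes;
        final String text;
        final List<Node> children;

        Node(String name, Map<String, String> attributes, String text, List<Node> children) {
            this.name = name;
            this.attributes = attributes;
            this.text = text;
            this.children = children;
        }
    }

    /** Matches nodes by tag name, ignoring case (plays the role of a tag-name filter). */
    static Predicate<Node> tagName(String name) {
        return n -> n.name.equalsIgnoreCase(name);
    }

    /** Matches nodes whose attribute 'key' equals 'value' (plays the role of an attribute filter). */
    static Predicate<Node> hasAttribute(String key, String value) {
        return n -> value.equals(n.attributes.get(key));
    }

    /** Depth-first walk that collects every node accepted by the filter. */
    static List<Node> extractAll(Node root, Predicate<Node> filter) {
        List<Node> out = new ArrayList<>();
        collect(root, filter, out);
        return out;
    }

    private static void collect(Node node, Predicate<Node> filter, List<Node> out) {
        if (filter.test(node)) {
            out.add(node);
        }
        for (Node child : node.children) {
            collect(child, filter, out);
        }
    }

    /** Concatenates the text of a node and all of its descendants. */
    static String textOf(Node node) {
        StringBuilder sb = new StringBuilder(node.text);
        for (Node child : node.children) {
            sb.append(textOf(child));
        }
        return sb.toString();
    }

    /** Builds a tiny sample tree and returns the text of every div with class="body". */
    static String demo() {
        Node bold = new Node("b", Map.of(), "world", List.of());
        Node body = new Node("div", Map.of("class", "body"), "Hello ", List.of(bold));
        Node sidebar = new Node("div", Map.of("class", "sidebar"), "ignored", List.of());
        Node root = new Node("html", Map.of(), "", List.of(sidebar, body));

        // Both conditions must hold, like combining two filters with AND.
        Predicate<Node> filter = tagName("div").and(hasAttribute("class", "body"));

        StringBuilder sb = new StringBuilder();
        for (Node match : extractAll(root, filter)) {
            sb.append(textOf(match));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints "Hello world"
    }
}
```

The div with class="sidebar" fails the attribute test and is skipped, while the text nested inside the matching div is kept. The visitor approach suggested in the quoted reply below would put the same two checks inside visitTag instead of a filter.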
On Tue, Apr 1, 2008 at 7:38 PM, Joshua Kerievsky <jo...@in...>
wrote:
> You'll want to write your very own Visitor.
>
> Something like this (I'm using an older version of htmlparser for this
> example):
>
> public class DivVisitor extends NodeVisitor {
>
> public void visitTag(Tag tag) {
> // see if the tag is a div tag here and then check its attributes
> // if it matches what you want, collect it into something that this
> // visitor can return via some getter method
> }
> }
>
> Pass your DivVisitor to the parser as you were doing with the
> ObjectFindingVisitor.
>
> Hope that helps,
> jk
>
> On Tue, Apr 1, 2008 at 3:06 PM, Jumbo Pongo <jum...@gm...> wrote:
>
> > Thanks for the reply, Joshua. I think that's what I'm trying to do.
> > The part I'm stuck on is how to select only the div tag that has the
> > attribute class="body". Here is my code:
> >
> > String contents = null;
> >
> > Parser parser = new Parser(url);
> > ObjectFindingVisitor visitor = new ObjectFindingVisitor(Div.class);
> > parser.visitAllNodesWith(visitor);
> >
> > Node[] nodes = visitor.getTags(); // do I really want to use getTags() here?
> > for (int i = 0; i < nodes.length; i++)
> > {
> > // if nodes[i] has attribute class="body", then get the page text
> > // enclosed in the div tags
> > // what to do here?
> > }
> >
> > return contents;
> >
> >
> > Obviously I am new to htmlparser, so many thanks in advance.
> >
> >
> >
> > -------------------------------------------------------------------------
> > Check out the new SourceForge.net Marketplace.
> > It's the best place to buy or sell services for
> > just about anything Open Source.
> >
> > http://ad.doubleclick.net/clk;164216239;13503038;w?http://sf.net/marketplace
> > _______________________________________________
> > Htmlparser-user mailing list
> > Htm...@li...
> > https://lists.sourceforge.net/lists/listinfo/htmlparser-user