Re: [Htmlparser-user] Only extract text from div tag with specific attribute

You'll want to write your very own Visitor.

Something like this (I'm using an older version of htmlparser for this
example):

public class DivVisitor extends NodeVisitor {

    public void visitTag(Tag tag) {
        // Check whether the tag is a div tag, then check its attributes.
        // If it matches what you want, collect it into something that this
        // visitor can return via a getter method.
    }
}

Pass your DivVisitor to the parser the same way you were passing the
ObjectFindingVisitor.
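
To make the idea concrete without tying it to a particular htmlparser
version, here is a minimal self-contained sketch of the same pattern. The
types SimpleTag, SimpleVisitor, BodyDivVisitor and the helper
collectBodyText are hypothetical stand-ins invented for this example; they
are not htmlparser classes. In the real library the visitor would extend
NodeVisitor and read the attribute from the Tag it is handed, but the exact
calls depend on the version you have, so treat this only as an outline of
the filter-and-collect logic.

```java
import java.util.ArrayList;
import java.util.List;

// Stand-in types for illustration only; these are NOT the htmlparser classes.
class DivVisitorSketch {

    /** Hypothetical minimal tag: a name, a class attribute, and its text. */
    static final class SimpleTag {
        final String name;
        final String cssClass; // null when the tag has no class attribute
        final String text;

        SimpleTag(String name, String cssClass, String text) {
            this.name = name;
            this.cssClass = cssClass;
            this.text = text;
        }
    }

    /** Hypothetical visitor base with a visitTag callback. */
    abstract static class SimpleVisitor {
        abstract void visitTag(SimpleTag tag);
    }

    /** Collects the text of every div whose class attribute is "body". */
    static final class BodyDivVisitor extends SimpleVisitor {
        private final List<String> texts = new ArrayList<>();

        @Override
        void visitTag(SimpleTag tag) {
            // Tag names are compared case-insensitively, attribute values exactly.
            if ("div".equalsIgnoreCase(tag.name) && "body".equals(tag.cssClass)) {
                texts.add(tag.text);
            }
        }

        /** Getter the caller uses once the traversal has finished. */
        List<String> getTexts() {
            return texts;
        }
    }

    /** Stand-in for the parser walking all tags and calling the visitor. */
    static List<String> collectBodyText(List<SimpleTag> tags) {
        BodyDivVisitor visitor = new BodyDivVisitor();
        for (SimpleTag tag : tags) {
            visitor.visitTag(tag);
        }
        return visitor.getTexts();
    }

    public static void main(String[] args) {
        List<SimpleTag> tags = new ArrayList<>();
        tags.add(new SimpleTag("div", "header", "Site title"));
        tags.add(new SimpleTag("div", "body", "Article text"));
        tags.add(new SimpleTag("span", "body", "Not a div"));
        System.out.println(collectBodyText(tags));
    }
}
```

The point of the design is that the filtering happens inside visitTag, so
the caller only asks the visitor for its results afterwards instead of
looping over every div itself.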

Hope that helps,
jk

On Tue, Apr 1, 2008 at 3:06 PM, Jumbo Pongo <jum...@gm...> wrote:

> Thanks for the reply, Joshua.  I think that's what I'm trying to do.  The
> part I'm stuck on is where to distinguish that I only want the div tag that
> has the attribute class="body".  Here is my code:
>
> String contents = null;
>
> Parser parser = new Parser(url);
> ObjectFindingVisitor visitor = new ObjectFindingVisitor(Div.class);
> parser.visitAllNodesWith(visitor);
>
> Node[] nodes = visitor.getTags(); // do I really want to use getTags() here?
> for (int i = 0; i < nodes.length; i++)
> {
> // if nodes[i] has attribute class="body", then get the page text enclosed
> // in the div tags
> // what to do here?
> }
>
> return contents;
>
>
> Obviously I am new to htmlparser, so many thanks in advance.
>
>