Re: [Htmlparser-user] Only extract text from div tag with specific attribute
From: Joshua K. <jo...@in...> - 2008-04-01 23:38:51
You'll want to write your very own Visitor.
Something like this (I'm using an older version of htmlparser for this
example):
public class DivVisitor extends NodeVisitor {
    public void visitTag(Tag tag) {
        // See if the tag is a div tag here and then check its attributes.
        // If it matches what you want, collect it into something that this
        // visitor can return via some getter method.
    }
}
Pass your DivVisitor to parser.visitAllNodesWith(), just as you were doing
with the ObjectFindingVisitor.
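
To make the collecting part concrete, here is a rough, self-contained sketch of the pattern. SimpleTag and SimpleNodeVisitor are hypothetical stand-ins I made up so this compiles without the library on the classpath; they only mimic the few methods the visitor touches (getTagName, getAttribute, toPlainTextString). In real code you would extend the library's NodeVisitor and take its Tag type instead, and the exact package names depend on your htmlparser version. The getTexts() getter name is also just my choice.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the library's tag type: only the methods used below.
interface SimpleTag {
    String getTagName();
    String getAttribute(String name);
    String toPlainTextString();
}

// Hypothetical stand-in for the library's visitor base class.
abstract class SimpleNodeVisitor {
    public void visitTag(SimpleTag tag) {}
}

// Collects the plain text of every div tag whose class attribute is "body".
class DivVisitor extends SimpleNodeVisitor {
    private final List<String> texts = new ArrayList<>();

    @Override
    public void visitTag(SimpleTag tag) {
        // Compare the tag name case-insensitively, then check the attribute.
        if ("div".equalsIgnoreCase(tag.getTagName())
                && "body".equals(tag.getAttribute("class"))) {
            texts.add(tag.toPlainTextString());
        }
    }

    // Getter the caller uses once the parser has finished visiting.
    public List<String> getTexts() {
        return texts;
    }
}

class DivVisitorDemo {
    // Builds a fake tag from a name, its attributes, and its enclosed text.
    static SimpleTag tag(String name, Map<String, String> attrs, String text) {
        return new SimpleTag() {
            public String getTagName() { return name; }
            public String getAttribute(String key) { return attrs.get(key); }
            public String toPlainTextString() { return text; }
        };
    }

    // Feeds three fake tags to the visitor; only the second one should match.
    static List<String> run() {
        DivVisitor visitor = new DivVisitor();
        visitor.visitTag(tag("DIV", Map.of("class", "header"), "Site title"));
        visitor.visitTag(tag("DIV", Map.of("class", "body"), "Article text"));
        visitor.visitTag(tag("P", Map.of("class", "body"), "Not a div"));
        return visitor.getTexts();
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

With the real library the parser calls visitTag for you while it walks the page, so afterwards you only need to read the collected text back from the getter.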
Hope that helps,
jk
On Tue, Apr 1, 2008 at 3:06 PM, Jumbo Pongo <jum...@gm...> wrote:
> Thanks for the reply, Joshua. I think that's what I'm trying to do. The
> part I'm stuck on is where to distinguish that I only want the div tag that
> has the attribute class="body". Here is my code:
>
> String contents = null;
>
> Parser parser = new Parser(url);
> ObjectFindingVisitor visitor = new ObjectFindingVisitor(Div.class);
> parser.visitAllNodesWith(visitor);
>
> Node[] nodes = visitor.getTags(); // do I really want to use getTags() here?
> for (int i = 0; i < nodes.length; i++)
> {
>     // if nodes[i] has attribute class="body", then get the page text
>     // enclosed in the div tags
>     // what to do here?
> }
>
> return contents;
>
>
> Obviously I am new to htmlparser, so many thanks in advance.