Re: [Htmlparser-user] Only extract text from div tag with specific attribute
From: Jumbo P. <jum...@gm...> - 2008-04-02 19:31:49
I figured it out. It's actually pretty simple. Here is the code. Thanks anyway.

Parser p = new Parser(url);
NodeList list = p.extractAllNodesThatMatch(
    new AndFilter(
        new TagNameFilter("div"),
        new HasAttributeFilter("class", "body")));
StringBean sb = new StringBean();
list.visitAllNodesWith(sb);
System.out.println(sb.getStrings());

On Tue, Apr 1, 2008 at 7:38 PM, Joshua Kerievsky <jo...@in...> wrote:

> You'll want to write your very own Visitor.
>
> Something like this (I'm using an older version of htmlparser for this
> example):
>
> public class DivVisitor extends NodeVisitor {
>
>     public void visitTag(Tag tag) {
>         // see if the tag is a div tag here and then check its attributes
>         // if it matches what you want, collect it into something that this
>         // visitor can return via some getter method
>     }
> }
>
> Send your DivVisitor into the parser as you were doing with the
> ObjectFindingVisitor.
>
> Hope that helps,
> jk
>
> On Tue, Apr 1, 2008 at 3:06 PM, Jumbo Pongo <jum...@gm...> wrote:
>
> > Thanks for the reply, Joshua. I think that's what I'm trying to do.
> > The part I'm stuck on is where to distinguish that I only want the div tag
> > that has the attribute class="body". Here is my code:
> >
> > String contents = null;
> >
> > Parser parser = new Parser(url);
> > ObjectFindingVisitor visitor = new ObjectFindingVisitor(Div.class);
> > parser.visitAllNodesWith(visitor);
> >
> > Node[] nodes = visitor.getTags(); // do I really want to use getTags() here?
> > for (int i = 0; i < nodes.length; i++)
> > {
> >     // if nodes[i] has attribute class="body", then get the page text
> >     // enclosed in the div tags
> >     // what to do here?
> > }
> >
> > return contents;
> >
> > Obviously I am new to htmlparser, so many thanks in advance.
> >
> > _______________________________________________
> > Htmlparser-user mailing list
> > Htm...@li...
> > https://lists.sourceforge.net/lists/listinfo/htmlparser-user
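
For anyone reading the archive later: the solution at the top of this message works because the AndFilter only accepts a node when both the tag-name test and the attribute test pass. The sketch below illustrates that idea with the Java standard library only. It is not htmlparser code: the `Tag` class, `tagNamed`, `hasAttribute`, `matchingText`, and `sampleDocument` are all hypothetical names made up for this illustration, and `Predicate.and` stands in for the role AndFilter plays in the real snippet.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

public class DivFilterSketch {

    /** Toy stand-in for a parsed tag. Hypothetical; NOT an htmlparser class. */
    public static final class Tag {
        public final String name;
        public final Map<String, String> attributes;
        public final String text;
        public final List<Tag> children;

        public Tag(String name, Map<String, String> attributes,
                   String text, List<Tag> children) {
            this.name = name;
            this.attributes = attributes;
            this.text = text;
            this.children = children;
        }
    }

    /** Accepts tags with the given name (plays the role of a tag-name filter). */
    public static Predicate<Tag> tagNamed(String name) {
        return tag -> tag.name.equalsIgnoreCase(name);
    }

    /** Accepts tags whose attribute has exactly the given value. */
    public static Predicate<Tag> hasAttribute(String key, String value) {
        return tag -> value.equals(tag.attributes.get(key));
    }

    /** Walks the tree depth-first and collects every tag the filter accepts. */
    private static void collect(Tag node, Predicate<Tag> filter, List<Tag> out) {
        if (filter.test(node)) {
            out.add(node);
        }
        for (Tag child : node.children) {
            collect(child, filter, out);
        }
    }

    /** Returns the text of every matching tag, in document order. */
    public static List<String> matchingText(Tag root, Predicate<Tag> filter) {
        List<Tag> matches = new ArrayList<>();
        collect(root, filter, matches);
        List<String> texts = new ArrayList<>();
        for (Tag tag : matches) {
            texts.add(tag.text);
        }
        return texts;
    }

    /** A tiny hand-built document: two divs and one span. */
    public static Tag sampleDocument() {
        Tag nav = new Tag("div", Map.of("class", "nav"), "Home | About", List.of());
        Tag body = new Tag("div", Map.of("class", "body"), "Article text", List.of());
        Tag span = new Tag("span", Map.of("class", "body"), "Not a div", List.of());
        return new Tag("html", Map.of(), "", List.of(nav, body, span));
    }

    public static void main(String[] args) {
        // Both conditions must hold, just as with the combined filter in the post.
        Predicate<Tag> filter = tagNamed("div").and(hasAttribute("class", "body"));
        System.out.println(matchingText(sampleDocument(), filter)); // prints [Article text]
    }
}
```

The span with class="body" and the div with class="nav" are both rejected, because each satisfies only one of the two conditions. The same composition could be written inside a custom visitor's visitTag method, as Joshua suggested, but a combined filter keeps the matching logic out of the traversal code.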