Re: [Htmlparser-user] Only extract text from div tag with specific attribute
From: Joshua K. <jo...@in...> - 2008-04-01 23:38:51
You'll want to write your own Visitor. Something like this (I'm using an older version of htmlparser for this example):

    public class DivVisitor extends NodeVisitor {
        public void visitTag(Tag tag) {
            // see if the tag is a div tag here and then check its attributes
            // if it matches what you want, collect it into something that
            // this visitor can return via some getter method
        }
    }

Pass your DivVisitor to the parser the same way you were passing the ObjectFindingVisitor.

Hope that helps,
jk

On Tue, Apr 1, 2008 at 3:06 PM, Jumbo Pongo <jum...@gm...> wrote:
> Thanks for the reply, Joshua. I think that's what I'm trying to do. The
> part I'm stuck on is how to pick out only the div tag that has the
> attribute class="body". Here is my code:
>
> String contents = null;
>
> Parser parser = new Parser(url);
> ObjectFindingVisitor visitor = new ObjectFindingVisitor(Div.class);
> parser.visitAllNodesWith(visitor);
>
> Node[] nodes = visitor.getTags(); // do I really want to use getTags() here?
> for (int i = 0; i < nodes.length; i++)
> {
>     // if nodes[i] has attribute class="body", then get the page text
>     // enclosed in the div tags
>     // what to do here?
> }
>
> return contents;
>
> Obviously I am new to htmlparser, so many thanks in advance.
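
For anyone who wants to see the filtering logic of the DivVisitor suggested above spelled out, here is a rough, library-free sketch. SimpleTag and SimpleVisitor are hypothetical stand-ins written only for this example; they are not htmlparser classes. Their method names (getTagName, getAttribute, toPlainTextString) are assumed to mirror what the library's Tag offers, so check them against the version you have installed before copying the visitor body over.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the library's tag type (NOT a real htmlparser class).
class SimpleTag {
    private final String name;
    private final String text;
    private final Map<String, String> attributes = new HashMap<>();

    SimpleTag(String name, String text) {
        this.name = name;
        this.text = text;
    }

    // Fluent helper so the demo can attach attributes inline.
    SimpleTag with(String key, String value) {
        attributes.put(key, value);
        return this;
    }

    String getTagName() { return name; }
    String getAttribute(String key) { return attributes.get(key); } // null if absent
    String toPlainTextString() { return text; }
}

// Hypothetical stand-in for a visitor base class with a no-op callback.
class SimpleVisitor {
    void visitTag(SimpleTag tag) { }
}

// Collects the text of every div whose class attribute equals the wanted value.
class DivVisitor extends SimpleVisitor {
    private final String wantedClass;
    private final List<String> matches = new ArrayList<>();

    DivVisitor(String wantedClass) {
        this.wantedClass = wantedClass;
    }

    @Override
    void visitTag(SimpleTag tag) {
        // Compare the tag name case-insensitively, since parsers differ on casing.
        boolean isDiv = "div".equalsIgnoreCase(tag.getTagName());
        // equals() on the wanted value is null-safe when the attribute is missing.
        if (isDiv && wantedClass.equals(tag.getAttribute("class"))) {
            matches.add(tag.toPlainTextString());
        }
    }

    // The "getter method" from the reply: hands back whatever was collected.
    List<String> getMatches() {
        return matches;
    }
}

class DivVisitorDemo {
    public static void main(String[] args) {
        List<SimpleTag> page = Arrays.asList(
            new SimpleTag("div", "navigation links").with("class", "nav"),
            new SimpleTag("DIV", "article text").with("class", "body"),
            new SimpleTag("p", "stray paragraph").with("class", "body"));

        DivVisitor visitor = new DivVisitor("body");
        for (SimpleTag tag : page) {
            visitor.visitTag(tag); // in real code the parser would drive these calls
        }
        System.out.println(visitor.getMatches());
    }
}
```

With the real library, the loop in main would be replaced by handing the visitor to the parser, as in the parser.visitAllNodesWith(visitor) call from the quoted code, and the collected text would be read back through the getter afterwards.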