Re: [Htmlparser-user] Only extract text from div tag with specific attribute
From: Joshua K. <jo...@in...> - 2008-04-01 21:39:10
You could write your own NodeVisitor for this.

--jk

On Tue, Apr 1, 2008 at 11:54 AM, Jumbo Pongo <jum...@gm...> wrote:
> Hello,
>
> I'm trying to extract only the page text inside div tags with the
> attribute class="body". Inside the div-body tags are other tags, e.g. h1,
> h2, p, etc., which themselves should be ignored but their enclosed text
> should be included with the rest of the body text.
>
> I'm using extractAllNodesThatMatch but I don't see where I can limit it
> only to the div tag with the attribute class="body".
>
> Can anyone figure this out?
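
A minimal sketch of what such a visitor might look like, assuming HTML Parser 1.6 or 2.0 (where text nodes are of type org.htmlparser.Text) with the library jar on the classpath. The class name DivBodyTextVisitor and the extract helper are made up for illustration and are not part of the library. The visitor tracks div nesting so that text is collected only while inside a div whose class attribute is exactly "body"; tags such as h1 or p are skipped, but their enclosed text is kept.

```java
import org.htmlparser.Parser;
import org.htmlparser.Tag;
import org.htmlparser.Text;
import org.htmlparser.util.ParserException;
import org.htmlparser.visitors.NodeVisitor;

// Hypothetical visitor: collects text found inside <div class="body">.
public class DivBodyTextVisitor extends NodeVisitor {
    private final StringBuilder text = new StringBuilder();

    // Div nesting depth counted from the matching div; 0 means "outside".
    private int depth = 0;

    @Override
    public void visitTag(Tag tag) {
        if (!"DIV".equalsIgnoreCase(tag.getTagName())) {
            return; // h1, h2, p, ... are ignored; only their text is used
        }
        if (depth > 0) {
            depth++; // a nested div inside the matching div
        } else if ("body".equals(tag.getAttribute("class"))) {
            depth = 1; // entering a matching div
        }
    }

    @Override
    public void visitEndTag(Tag tag) {
        if (depth > 0 && "DIV".equalsIgnoreCase(tag.getTagName())) {
            depth--; // leaving a div that was counted in visitTag
        }
    }

    @Override
    public void visitStringNode(Text string) {
        if (depth > 0) {
            // Raw text as it appears in the page; entities are not decoded.
            text.append(string.getText());
        }
    }

    public String getText() {
        return text.toString();
    }

    // Hypothetical convenience helper: parse an HTML string and return the text.
    public static String extract(String html) throws ParserException {
        Parser parser = Parser.createParser(html, "UTF-8");
        DivBodyTextVisitor visitor = new DivBodyTextVisitor();
        parser.visitAllNodesWith(visitor);
        return visitor.getText();
    }
}
```

To stay with extractAllNodesThatMatch instead, combining TagNameFilter("div") and HasAttributeFilter("class", "body") in an AndFilter, then calling toPlainTextString() on each matched node, should give a similar result.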