Re: [Htmlparser-user] Only extract text from div tag with specific attribute
From: Jumbo P. <jum...@gm...> - 2008-04-01 22:06:27
Thanks for the reply, Joshua. I think that's what I'm trying to do. The part I'm stuck on is how to specify that I only want the div tag whose class attribute is "body". Here is my code:

    String contents = null;
    Parser parser = new Parser(url);
    ObjectFindingVisitor visitor = new ObjectFindingVisitor(Div.class);
    parser.visitAllNodesWith(visitor);
    Node[] nodes = visitor.getTags(); // do I really want to use getTags() here?
    for (int i = 0; i < nodes.length; i++) {
        // if nodes[i] has attribute class="body", then get the page text enclosed in the div tags
        // what to do here?
    }
    return contents;

As you can probably tell, I am new to htmlparser, so many thanks in advance.
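One possible approach, offered as a sketch rather than the definitive answer: htmlparser's filter classes in org.htmlparser.filters (AndFilter, TagNameFilter, HasAttributeFilter) can select only the div elements whose class attribute equals a given value, and Parser.extractAllNodesThatMatch returns them as a NodeList, so neither the visitor nor a manual attribute check is needed. This assumes the htmlparser 1.x jar is on the classpath; the class name DivTextExtractor and the method extractDivText are hypothetical names chosen for illustration.

```java
import org.htmlparser.Node;
import org.htmlparser.NodeFilter;
import org.htmlparser.Parser;
import org.htmlparser.filters.AndFilter;
import org.htmlparser.filters.HasAttributeFilter;
import org.htmlparser.filters.TagNameFilter;
import org.htmlparser.util.NodeList;
import org.htmlparser.util.ParserException;

// Hypothetical helper class for illustration; assumes htmlparser 1.x is available.
public class DivTextExtractor {

    // Returns the plain text of every <div> whose class attribute is exactly
    // cssClass, concatenated in document order. The match is on the whole
    // attribute value, so class="body main" would NOT match "body".
    public static String extractDivText(Parser parser, String cssClass)
            throws ParserException {
        // Keep only nodes that are <div> tags AND carry class="<cssClass>".
        NodeFilter filter = new AndFilter(
                new TagNameFilter("div"),
                new HasAttributeFilter("class", cssClass));

        // Walks the whole document (including nested tags) and collects matches.
        NodeList divs = parser.extractAllNodesThatMatch(filter);

        StringBuilder text = new StringBuilder();
        for (int i = 0; i < divs.size(); i++) {
            Node div = divs.elementAt(i);
            // toPlainTextString() returns the text enclosed by the tag, without markup.
            text.append(div.toPlainTextString());
        }
        return text.toString();
    }

    public static void main(String[] args) throws ParserException {
        // Inline sample HTML; for a live page, use new Parser(url) instead.
        String html = "<html><body>"
                + "<div class=\"nav\">skip me</div>"
                + "<div class=\"body\">keep me</div>"
                + "</body></html>";
        Parser parser = Parser.createParser(html, "UTF-8");
        System.out.println(extractDivText(parser, "body"));
    }
}
```

To stay with the visitor code in the original message instead, the loop body could cast nodes[i] to Div (safe, because the visitor was built with Div.class), compare div.getAttribute("class") with "body", and assign div.toPlainTextString() to contents on a match. Both variants compare the entire attribute value, and if a matching div is nested inside another matching div, the filter version will collect its text twice.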