[Htmlparser-user] Only extract text from div tag with specific attribute
From: Jumbo P. <jum...@gm...> - 2008-04-01 18:54:09
Hello, I'm trying to extract only the page text inside div tags that have the attribute class="body". These divs contain other tags (e.g. h1, h2, p); the tags themselves should be ignored, but the text they enclose should be included with the rest of the body text. I'm using extractAllNodesThatMatch, but I don't see how to restrict it to div tags with the attribute class="body". Can anyone help me figure this out?
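
One possible approach, sketched here rather than taken from the thread: combine a TagNameFilter with a HasAttributeFilter through an AndFilter, pass that to extractAllNodesThatMatch, and call toPlainTextString() on each matching div to get its enclosed text without the inner tags. The class name DivBodyText, the helper extractBodyText, and the sample HTML are made up for illustration, and the sketch assumes the HTMLParser jar is on the classpath.

```java
import org.htmlparser.NodeFilter;
import org.htmlparser.Parser;
import org.htmlparser.filters.AndFilter;
import org.htmlparser.filters.HasAttributeFilter;
import org.htmlparser.filters.TagNameFilter;
import org.htmlparser.util.NodeList;
import org.htmlparser.util.ParserException;

// Hypothetical example class; not part of HTMLParser itself.
public class DivBodyText {

    // Returns the concatenated plain text of every <div class="body"> in the HTML.
    static String extractBodyText(String html) throws ParserException {
        Parser parser = Parser.createParser(html, "UTF-8");

        // Match nodes that are div tags AND carry class="body".
        NodeFilter filter = new AndFilter(
                new TagNameFilter("div"),
                new HasAttributeFilter("class", "body"));

        NodeList divs = parser.extractAllNodesThatMatch(filter);

        StringBuilder text = new StringBuilder();
        for (int i = 0; i < divs.size(); i++) {
            // toPlainTextString() drops the h1/h2/p markup and keeps the text inside it.
            text.append(divs.elementAt(i).toPlainTextString());
        }
        return text.toString();
    }

    public static void main(String[] args) throws ParserException {
        // Made-up sample page: one div to skip, one div to extract.
        String html = "<html><body><div class=\"nav\">Menu</div>"
                + "<div class=\"body\"><h1>Title</h1><p>First paragraph.</p></div>"
                + "</body></html>";
        System.out.println(extractBodyText(html));
    }
}
```

Two caveats with this sketch: as far as I know, HasAttributeFilter compares the whole attribute value, so a div with class="body main" would probably not match; and because extractAllNodesThatMatch searches recursively, a div class="body" nested inside another one would contribute its text twice.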