Parsing file for Text

Brought to you by: derrickoswald

Forum: Help

Created: 2005-11-28

Updated: 2013-04-27

Ray12 - 2005-11-28

I am trying to parse a simple HTML file to get the text values. This is the format of my file:

<p class=g>hello</p>
<p class=g>there</p>

I want to extract the values "hello" and "there". This is the code I am using:

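import org.htmlparser.Node;
import org.htmlparser.NodeFilter;
import org.htmlparser.Parser;
import org.htmlparser.filters.HasAttributeFilter;
import org.htmlparser.util.NodeList;

Parser parser = new Parser("sample.html");
// Match every tag whose class attribute is "g"
NodeFilter filter = new HasAttributeFilter("class", "g");
NodeList list = parser.extractAllNodesThatMatch(filter);
for (int i = 0; i < list.size(); i++)
{
    Node nl = list.elementAt(i);
    System.out.println(nl.toHtml());
}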

I just keep getting <p class=g> as the output. How do I get the text between the tags? Thanks for any help.

- Derrick Oswald - 2005-11-28
  
  In the latest Integration Release
  http://sourceforge.net/forum/forum.php?forum_id=510668
  the paragraph tag is handled as a CompositeTag by default, so the text between <p> and </p> becomes a child node of the tag. Printing the first child:
  System.out.println (nl.getChildren ().elementAt (0).toHtml());
  should work.
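  For comparison, here is a minimal sketch that collects the text of every class="g" element. It assumes an HTMLParser release in which <p> is a CompositeTag (as described in the reply above), and it builds the parser from an in-memory string with Parser.createParser instead of reading sample.html. The class name ClassGText and the method textOfClassG are hypothetical. It calls toPlainTextString(), which gathers the text of all of a tag's children, so it does not depend on the text being the first child.

```java
import org.htmlparser.Node;
import org.htmlparser.Parser;
import org.htmlparser.filters.HasAttributeFilter;
import org.htmlparser.util.NodeList;
import org.htmlparser.util.ParserException;

import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class; assumes <p> is parsed as a CompositeTag.
public class ClassGText
{
    // Returns the plain text inside every element whose class attribute is "g".
    static List<String> textOfClassG(String html) throws ParserException
    {
        // Parse from a string rather than a file or URL
        Parser parser = Parser.createParser(html, "UTF-8");
        NodeList list = parser.extractAllNodesThatMatch(new HasAttributeFilter("class", "g"));
        List<String> result = new ArrayList<String>();
        for (int i = 0; i < list.size(); i++)
        {
            Node node = list.elementAt(i);
            // For a composite tag this concatenates the text of all its children;
            // a non-composite tag would yield an empty string here.
            result.add(node.toPlainTextString().trim());
        }
        return result;
    }

    public static void main(String[] args) throws ParserException
    {
        String html = "<p class=g>hello</p>\n<p class=g>there</p>";
        System.out.println(textOfClassG(html));
    }
}
```

  With the sample file's markup, this should print the two paragraph texts, hello and there, provided the paragraph tag is composite in the release you are using.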
  
- Ray12 - 2005-11-30
  
  The code works as you suggested. Thanks for your help.
  