I am trying to parse a simple HTML file to extract its text values. This is the format of my file:
<p class=g>hello</p>
<p class=g>there</p>
I want to extract the values "hello" and "there". This is the code I am using:
Parser parser = new Parser ("sample.html");
NodeList list = new NodeList();
NodeFilter filter = new HasAttributeFilter("class","g");
list = parser.extractAllNodesThatMatch(filter);
for(int i=0; i<list.size();i++)
{
    Node nl = list.elementAt(i);
    System.out.println(nl.toHtml());
}
I just keep getting <p class=g> as the output. How do I get the text between the opening and closing tags? Thanks for any help.
In the latest Integration Release
http://sourceforge.net/forum/forum.php?forum_id=510668
the paragraph tag was added as a CompositeTag by default, so the text between the tags becomes a child of the paragraph node, and:
System.out.println (nl.getChildren ().elementAt (0).toHtml());
should work.
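For reference, here is a minimal sketch of the whole loop with that change applied. It assumes the HTML Parser jar from that release is on the classpath. The class name ParagraphText and the helper extractText are made up for illustration, and Parser.createParser is used on an in-memory string instead of "sample.html" only so the sample needs no file. The null check is a precaution for tags that have no children.

```java
import org.htmlparser.Node;
import org.htmlparser.NodeFilter;
import org.htmlparser.Parser;
import org.htmlparser.filters.HasAttributeFilter;
import org.htmlparser.util.NodeList;
import org.htmlparser.util.ParserException;

import java.util.ArrayList;
import java.util.List;

public class ParagraphText {

    // Hypothetical helper: returns the first child of every tag with class="g".
    static List<String> extractText(String html) throws ParserException {
        Parser parser = Parser.createParser(html, "UTF-8");
        NodeFilter filter = new HasAttributeFilter("class", "g");
        NodeList matches = parser.extractAllNodesThatMatch(filter);

        List<String> texts = new ArrayList<String>();
        for (int i = 0; i < matches.size(); i++) {
            Node tag = matches.elementAt(i);
            // With <p> treated as a composite tag, its text is a child node.
            NodeList children = tag.getChildren();
            // Guard against tags without children (assumed to yield null).
            if (children != null && children.size() > 0) {
                texts.add(children.elementAt(0).toHtml());
            }
        }
        return texts;
    }

    public static void main(String[] args) throws ParserException {
        // Same markup as the sample file in the question.
        String html = "<p class=g>hello</p>\n<p class=g>there</p>";
        for (String text : extractText(html)) {
            System.out.println(text);
        }
    }
}
```

If a paragraph can contain nested markup such as <b> or <a>, the first child is not the whole text; calling toPlainTextString() on the paragraph node itself is one alternative worth trying in that case.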
The code works as you suggested. Thanks for your help.