
Parsing file for Text

Help
Ray12
2005-11-28
2013-04-27
  • Ray12

    Ray12 - 2005-11-28

    I am trying to parse a simple HTML file to get the text values. This is the format of my file:

    <p class=g>hello</p>
    <p class=g>there</p>

    I want to extract the values "hello" and "there". This is the code I am using:

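    import org.htmlparser.Node;
    import org.htmlparser.NodeFilter;
    import org.htmlparser.Parser;
    import org.htmlparser.filters.HasAttributeFilter;
    import org.htmlparser.util.NodeList;

    // Parse the file and keep only the nodes that carry class="g"
    // (the enclosing method must handle or declare org.htmlparser.util.ParserException)
    Parser parser = new Parser("sample.html");
    NodeFilter filter = new HasAttributeFilter("class", "g");
    NodeList list = parser.extractAllNodesThatMatch(filter);
    for (int i = 0; i < list.size(); i++)
    {
        Node nl = list.elementAt(i);
        System.out.println(nl.toHtml()); // prints only the matching tag
    }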

    I just keep getting <p class=g> as the output. How do I get the text between the opening and closing tags? Thanks for any help.

     
    • Derrick Oswald

      Derrick Oswald - 2005-11-28

      In the latest Integration Release
        http://sourceforge.net/forum/forum.php?forum_id=510668
      the paragraph tag is registered as a CompositeTag by default, so the
      text between <p class=g> and </p> becomes a child node of the tag, and
        System.out.println (nl.getChildren ().elementAt (0).toHtml());
      should print it.
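A fuller sketch of the same idea follows. It assumes an HTMLParser (org.htmlparser) jar on the classpath from a release in which <p> is registered as a CompositeTag. It parses the sample markup from a string with Parser.createParser instead of reading sample.html, and collects each paragraph's text with toPlainTextString(). The class name ParagraphText and the method extractText are hypothetical. The null check on getChildren() is there for releases where <p> is still a plain tag; in that case nothing is collected for the node.

```java
import java.util.ArrayList;
import java.util.List;

import org.htmlparser.Node;
import org.htmlparser.Parser;
import org.htmlparser.filters.HasAttributeFilter;
import org.htmlparser.util.NodeList;
import org.htmlparser.util.ParserException;

// Hypothetical helper class; assumes an HTMLParser jar where <p> is a CompositeTag.
public class ParagraphText {

    // Returns the plain text of every element that has class="g".
    public static List<String> extractText(String html) throws ParserException {
        // Parse from an in-memory string instead of a file.
        Parser parser = Parser.createParser(html, "UTF-8");
        NodeList matches = parser.extractAllNodesThatMatch(new HasAttributeFilter("class", "g"));

        List<String> texts = new ArrayList<String>();
        for (int i = 0; i < matches.size(); i++) {
            Node node = matches.elementAt(i);
            // A composite <p> holds its text as child nodes; a plain tag has none (null).
            if (node.getChildren() != null) {
                texts.add(node.toPlainTextString().trim());
            }
        }
        return texts;
    }

    public static void main(String[] args) throws ParserException {
        String html = "<p class=g>hello</p>\n<p class=g>there</p>";
        for (String text : extractText(html)) {
            System.out.println(text);
        }
    }
}
```

Using toPlainTextString() rather than getChildren().elementAt(0) also copes with paragraphs whose text is split across several child nodes, such as <p class=g>hello <b>there</b></p>.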

       
    • Ray12

      Ray12 - 2005-11-30

      The code works as you suggested. Thanks for your help.

       
