Parse Columns help needed

Help
Anonymous
2004-09-12
2004-09-13
  • Anonymous

    Anonymous - 2004-09-12

    I am new to both Java and htmlparser. I need help parsing an HTML page so that I can extract text from specific columns within a table. I need something that takes HTML like the following:

    <tr>
        <td><b>First Column:&nbsp;</b>
        <ul>
            <li><a href="...">First</a></li>
            <li><a href="...">Second</a></li>
        </ul>&nbsp;
        </td>
         
        <td><b>Second Column:&nbsp;</b>
            <ul>
            <li><a href="...">extra1</a></li>
        </ul>&nbsp;
        </td>

        <td><b>Third Column:&nbsp;</b>
            <ul>
            <li><a href="...">Extra2</a></li>
        </ul>&nbsp;
        </td>
    </tr>

    ...and produces only the text from the lists in the TD cells labeled "First Column" and "Third Column":

    First
    Second

    Extra2

    How can this be achieved?

    Thank you.

     
    • Derrick Oswald

      Derrick Oswald - 2004-09-13

      You can get all tables using a filter:
          NodeList list = parser.extractAllNodesThatMatch (new TagNameFilter ("TABLE"));
          for (int i = 0; i < list.size (); i++)
          {
              TableTag table = (TableTag)list.elementAt (i);
              // work with the table here; the row loop that follows goes inside these braces
          }

      Once you have the table tag, you can get at the data by rows:
          TableRow[] rows = table.getRows ();
          for (int i = 0; i < rows.length; i++)
          {
               TableColumn[] columns = rows[i].getColumns ();
               for (int j = 0; j < columns.length; j++)
                   System.out.println (columns[j].toPlainTextString ());
          }
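
      The loop above prints every column. To keep only the lists under "First Column" and "Third Column", one option is to filter on the plain text of each cell. The following is a minimal sketch using only the standard Java library, not part of htmlparser: the class ColumnTextFilter and the method itemsIfWanted are hypothetical names. It assumes the cell text starts with the label, keeps one list item per line as in the sample HTML, and may still contain "&nbsp;" entities.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ColumnTextFilter {

    /**
     * Returns the list items found in one cell's plain text if the cell's
     * label is one of the wanted labels; otherwise returns an empty list.
     * Assumption: the label is the first non-blank line of the cell text
     * and every following non-blank line is one list item.
     */
    static List<String> itemsIfWanted(String cellText, List<String> wantedLabels) {
        // Treat non-breaking spaces (entity or U+00A0) as ordinary spaces.
        String normalized = cellText.replace("&nbsp;", " ").replace('\u00A0', ' ');

        // Collect the non-blank lines, trimmed.
        List<String> lines = new ArrayList<>();
        for (String line : normalized.split("\\R")) {
            String trimmed = line.trim();
            if (!trimmed.isEmpty()) {
                lines.add(trimmed);
            }
        }
        if (lines.isEmpty()) {
            return new ArrayList<>();
        }

        // The first line is the label; drop a trailing colon before comparing.
        String label = lines.get(0);
        if (label.endsWith(":")) {
            label = label.substring(0, label.length() - 1).trim();
        }
        if (!wantedLabels.contains(label)) {
            return new ArrayList<>();
        }

        // Everything after the label is treated as list items.
        return new ArrayList<>(lines.subList(1, lines.size()));
    }

    public static void main(String[] args) {
        List<String> wanted = Arrays.asList("First Column", "Third Column");

        // Hand-written stand-ins for the plain text of the three sample cells.
        List<String> cells = Arrays.asList(
            "First Column:&nbsp;\n    First\n    Second\n  &nbsp;\n",
            "Second Column:&nbsp;\n    extra1\n  &nbsp;\n",
            "Third Column:&nbsp;\n    Extra2\n  &nbsp;\n");

        // Prints First, Second and Extra2, one per line.
        for (String cell : cells) {
            for (String item : itemsIfWanted(cell, wanted)) {
                System.out.println(item);
            }
        }
    }
}
```

      Inside the column loop you would pass columns[j].toPlainTextString () as the cell text. If the real page puts several list items on one source line, the line-based split will not separate them, and walking the LI tags inside each column would be the more robust route.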

       
