Parse Columns help needed

Brought to you by: derrickoswald

Forum: Help

Creator: Anonymous

Created: 2004-09-12

Updated: 2004-09-13

Anonymous - 2004-09-12

I am new to both Java and htmlparser. I need help parsing an HTML page so that I can extract text from SPECIFIC columns within a table. I need something that takes HTML like the following:

<tr>
    <td><b>First Column: </b>
    <ul>
        <li><a href="...">First</a></li>
        <li><a href="...">Second</a></li>
    </ul> 
    </td>

    <td><b>Second Column: </b>
        <ul>
        <li><a href="...">extra1</a></li>
    </ul> 
    </td>

    <td><b>Third Column: </b>
        <ul>
        <li><a href="...">Extra2</a></li>
    </ul> 
    </td>
</tr>

...and produces only the text from the lists under the columns whose TD tags contain the text "First Column" and "Third Column":

First
Second

Extra2

How can this be achieved?

Thank you.

- Derrick Oswald - 2004-09-13
  
  You can get all tables using a filter:
      NodeList list = parser.extractAllNodesThatMatch (new TagNameFilter ("TABLE"));
      for (int i = 0; i < list.size (); i++)
      {
          // braces are required: a bare declaration cannot be a loop body
          TableTag table = (TableTag)list.elementAt (i);
      }
  
  Once you have the table tag, you can get at the data by rows:
      TableRow[] rows = table.getRows ();
      for (int i = 0; i < rows.length; i++)
      {
           TableColumn[] columns = rows[i].getColumns ();
           for (int j = 0; j < columns.length; j++)
               System.out.println (columns[j].toPlainTextString ());
      }
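
  To keep only the "First Column" and "Third Column" lists, one option is to filter on the text of each cell. The following is only a rough sketch using plain JDK classes: ColumnPicker and pick are hypothetical names, not part of htmlparser, and it assumes the string returned by toPlainTextString () for a cell starts with the bold label and keeps the line breaks of the source HTML, so each list item ends up on its own line.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical helper, not part of htmlparser.
final class ColumnPicker {

    /**
     * Returns the list items of every cell whose text starts with a wanted label.
     * Assumption: each cell string looks like "First Column: \n  First\n  Second".
     */
    static List<String> pick(List<String> cellTexts, Set<String> wantedLabels) {
        List<String> items = new ArrayList<>();
        for (String cell : cellTexts) {
            String text = cell.trim();
            for (String label : wantedLabels) {
                if (!text.startsWith(label)) {
                    continue;
                }
                // Drop the label, then keep every non-blank line as one item.
                for (String line : text.substring(label.length()).split("\\R")) {
                    String item = line.trim();
                    if (!item.isEmpty()) {
                        items.add(item);
                    }
                }
                break; // a cell matches at most one label
            }
        }
        return items;
    }

    public static void main(String[] args) {
        // Stand-ins for the strings returned by columns[j].toPlainTextString ().
        List<String> cells = List.of(
            "First Column: \n    First\n    Second\n",
            "Second Column: \n    extra1\n",
            "Third Column: \n    Extra2\n");
        for (String item : pick(cells, Set.of("First Column:", "Third Column:"))) {
            System.out.println(item);
        }
    }
}
```

  In the column loop you would collect columns[j].toPlainTextString () into a list instead of printing it, then pass that list to pick. If the real cell text does not keep its line breaks, splitting by line will not separate the items and you would need to walk the children of each column instead.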
  