I am a newbie trying to learn HTML Parser. I have a simple HTML page, shown below:
<html>
<body>
<table>
<tr>
<td>Sex<font></font></td>
<td>Name<font></font></td>
</tr>
<tr>
<td>Male<font></font></td>
<td>John Doe<font></font></td>
</tr>
<tr>
<td>Female<font></font></td>
<td>Jane Doe<font></font></td>
</tr>
</table>
</body>
</html>
I would like to extract the data from the table and print it to the console as follows:
Sex,Name
Male,John Doe
Female,Jane Doe
Any sample code to help me get started would be appreciated.
Relevant snippet, below. Hope it helps.
-Lori
-----------------------------
parser = new Parser();
Node[] nodes = parser.extractAllNodesThatAre(TableTag.class);
// Get the first table found (I might be off by one, here)
TableTag table = (TableTag) nodes[1];
TableRow[] tableRows = table.getRows();
int numTableRows = tableRows.length;
for (int x = position + 1; x < numTableRows + 1; x++) {
    myValue1 = table.childAt(x).getChildren().elementAt(1);
    myValue2 = table.childAt(x).getChildren().elementAt(2);
}
Parser parser = new Parser(); // pass the file or URL you are parsing to the constructor
TagNameFilter filter = new TagNameFilter("TD");
NodeList list = parser.extractAllNodesThatMatch(filter);
// The table has two columns, so the cells are consumed in pairs
for (int i = 0; i + 1 < list.size(); i += 2) {
    System.out.print(list.elementAt(i).toPlainTextString() + ",");
    System.out.println(list.elementAt(i + 1).toPlainTextString());
}
Thanks to both of you for the sample code.
Lori, your code did not work even after slight changes. The first table is TableTag table = (TableTag)nodes[0]; But then I couldn't figure out what the position variable is. I made x start from 1 and got a NullPointerException.
Sidhu, your code worked great.
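A follow-up note on the approach that worked: the TD loop assumes exactly two cells per row. If the table might have a different number of columns, one option is to collect the cell texts first and then group them by column count. The sketch below shows only that grouping step in plain Java. The class name CellsToCsv, the helper toCsvLines, and the hard-coded cell list are illustrative assumptions, not part of HTML Parser; in real use the list would be filled from list.elementAt(i).toPlainTextString().

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class CellsToCsv {
    // Groups a flat list of cell texts into comma-separated lines,
    // one line per table row of `columns` cells.
    // A trailing incomplete row (fewer than `columns` cells) is dropped.
    static List<String> toCsvLines(List<String> cells, int columns) {
        List<String> lines = new ArrayList<>();
        for (int start = 0; start + columns <= cells.size(); start += columns) {
            lines.add(String.join(",", cells.subList(start, start + columns)));
        }
        return lines;
    }

    public static void main(String[] args) {
        // Hard-coded stand-in for the TD texts of the sample page (assumption)
        List<String> cells = Arrays.asList(
                "Sex", "Name", "Male", "John Doe", "Female", "Jane Doe");
        for (String line : toCsvLines(cells, 2)) {
            System.out.println(line);
        }
    }
}
```

With the six sample cells and a column count of 2, this gives the three lines asked for in the question. Note that the column count still has to be known up front, and nested tables would need extra care.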