I have a big HTML file that shows 7 columns when previewed.
I need the content of 2 columns, but this HTML file is very big.
I want to use an HTML parser that can get me the content between 2 tags, for example the content between <TD> and </TD>.
Can this library do this, or help me with it?
Here is my experience:
- I tried {htmlparser1_6 dir}\bin\filterbuilder.cmd as a quick way to get started.
- As I understand it, htmlparser parses the HTML file into a tree of tags. For example:
<tr> #1
<td> #2
aaa
</td> #2
<td> #3
bbb
</td> #3
</tr> #1
htmlparser will parse this to:
root
 |
 --(<tr></tr>) #1
    |
    --(<td></td>) #2
    |
    --(<td></td>) #3
You get the content between tags by traversing this tree with Filter objects.
If you don't mind, please send me your HTML and I will try it.
Good luck!
Ha
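To make the tree-and-filter idea above concrete, here is a minimal sketch in plain Java. Everything in it (TagTreeSketch, Node, collect, tagNamed, columnText) is a hypothetical toy model written only for illustration. It is not the htmlparser API, and the tree is built by hand instead of being parsed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Toy model of a parsed tag tree. Hypothetical names, NOT the htmlparser API.
class TagTreeSketch {

    // A node is either a tag (name != null) or a run of text (text != null).
    static final class Node {
        final String name;
        final String text;
        final List<Node> children = new ArrayList<>();

        private Node(String name, String text) {
            this.name = name;
            this.text = text;
        }

        static Node ofTag(String name, Node... kids) {
            Node n = new Node(name, null);
            for (Node k : kids) n.children.add(k);
            return n;
        }

        static Node ofText(String text) {
            return new Node(null, text);
        }

        // Concatenate all text found below this node.
        String plainText() {
            if (text != null) return text;
            StringBuilder sb = new StringBuilder();
            for (Node c : children) sb.append(c.plainText());
            return sb.toString();
        }
    }

    // Filter that accepts tags with the given name, ignoring case.
    static Predicate<Node> tagNamed(String name) {
        return n -> n.name != null && n.name.equalsIgnoreCase(name);
    }

    // Depth-first walk that collects every node the filter accepts.
    static List<Node> collect(Node root, Predicate<Node> filter) {
        List<Node> out = new ArrayList<>();
        walk(root, filter, out);
        return out;
    }

    private static void walk(Node n, Predicate<Node> filter, List<Node> out) {
        if (filter.test(n)) out.add(n);
        for (Node c : n.children) walk(c, filter, out);
    }

    // Text of the cell at position index (0-based) in every <tr>.
    static List<String> columnText(Node root, int index) {
        List<String> out = new ArrayList<>();
        for (Node tr : collect(root, tagNamed("tr"))) {
            List<Node> cells = collect(tr, tagNamed("td"));
            if (index < cells.size()) out.add(cells.get(index).plainText().trim());
        }
        return out;
    }

    // The <tr>/<td> example from the post, built by hand.
    static Node sampleTree() {
        return Node.ofTag("root",
                Node.ofTag("tr",
                        Node.ofTag("td", Node.ofText("aaa")),
                        Node.ofTag("td", Node.ofText("bbb"))));
    }

    public static void main(String[] args) {
        for (Node td : collect(sampleTree(), tagNamed("td"))) {
            System.out.println(td.plainText());
        }
    }
}
```

With a 7-column table you would call something like columnText(root, 1) to pick one column out of each row. In the real library the tree comes from the parser and the filter is one of its Filter objects, so treat this only as a picture of the traversal.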
I am working on the same thing. How do I get the text between the tags <th> … and </th>? Which methods can be used?
I used node.toString and node.getString, but they don't really give the desired value.
Thanks.
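For the <th> question, I believe htmlparser's Node also has a toPlainTextString() method, which may be closer to what you want than toString, but please check the javadoc for your version. As a fallback that does not use htmlparser at all, here is a rough sketch with java.util.regex. The class and method names (TagTextSketch, textBetween) are made up for this example, and it assumes the tag is not nested inside itself.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Regex-based fallback. Hypothetical helper, NOT part of htmlparser.
class TagTextSketch {

    // Returns the trimmed text between each <tag ...> and </tag> pair.
    // Assumes the tag is never nested inside another tag of the same name.
    static List<String> textBetween(String html, String tag) {
        String t = Pattern.quote(tag);
        // \b keeps "th" from matching "thead"; DOTALL lets cells span lines.
        Pattern p = Pattern.compile(
                "<" + t + "\\b[^>]*>(.*?)</" + t + "\\s*>",
                Pattern.CASE_INSENSITIVE | Pattern.DOTALL);
        List<String> out = new ArrayList<>();
        Matcher m = p.matcher(html);
        while (m.find()) {
            // Drop any tags nested inside the cell, then trim whitespace.
            out.add(m.group(1).replaceAll("<[^>]*>", "").trim());
        }
        return out;
    }

    public static void main(String[] args) {
        String html = "<tr><th>Name</th><TH class=\"x\"> <b>Age</b> </TH></tr>";
        for (String s : textBetween(html, "th")) {
            System.out.println(s);
        }
    }
}
```

A regex like this is fine for regular, machine-generated tables but breaks on nested tables or malformed markup, which is where a real parser is the better choice.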