Hi,
I am new to this tool. I needed to extract a table from a web page, and I did it using:
Parser parser = new Parser (path);
NodeList list = parser.parse (new HasAttributeFilter ("table"));
String tableString = list.elementAt(1).toHtml();
I used index 1 because it is the second table on the page. Now I need to extract the links in the table that are in bold, along with their text. A snippet of the table looks like this:
<table cellspacing="4" cellpadding="4"><tr><td valign=top>
<b><a href="/Arts/">Arts</a></b><br>
<small>
<a href="/Arts/Movies/">Movies</a>,
<a href="/Arts/Television/">Television</a>,
<a href="/Arts/Music/">Music</a>...
</small>
How can I extract the text Arts and the link /Arts/?
Thank you all for any ideas.
O.O.
Rather than HasAttributeFilter, which matches tags that carry an attribute named "table", you probably need a TagNameFilter("TABLE").
The resulting NodeList of matching tags can then be filtered again with extractAllNodesThatMatch (
new AndFilter (new TagNameFilter ("A"), new StringFilter ("Arts", true)))
Dear Derrick,
Thank you for the tip on the TagNameFilter. However, I would like to extract the text and the links between the bold tags; “Arts” is just an example.
So my question is: once I have the table, how do I pick out the text and links within the <b></b> tags?
Thanks again though,
O.O.
Then you will want to make your own BoldTag that is composite...
http://htmlparser.sourceforge.net/faq.html#composite
Thanks for your post, Derrick. I think I saw the FAQ, but I could not figure out how to get tags from the list of nodes. Anyway, I think I got my application to work using the Swing parser. Thank you for your help.
O.O.
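For anyone who lands on this thread later: the Swing parser code was never posted, so the following is only a minimal sketch of what that route might look like, not the poster's actual solution. It uses the JDK's javax.swing.text.html.parser.ParserDelegator with an HTMLEditorKit.ParserCallback and records every link that sits inside a <b> tag. The class name BoldLinkExtractor and the method extractBoldLinks are made up for illustration, and the sample input is the snippet from the question.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

// Hypothetical helper class; the name is an assumption, not part of any library.
final class BoldLinkExtractor {

    /** Returns href -> link text for every <a> element nested inside a <b> element. */
    static Map<String, String> extractBoldLinks(String html) throws IOException {
        final Map<String, String> links = new LinkedHashMap<>();

        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            private int boldDepth = 0;              // > 0 while inside <b>...</b>
            private String currentHref = null;      // non-null while inside a bold <a>
            private final StringBuilder text = new StringBuilder();

            @Override
            public void handleStartTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
                if (tag == HTML.Tag.B) {
                    boldDepth++;
                } else if (tag == HTML.Tag.A && boldDepth > 0) {
                    Object href = attrs.getAttribute(HTML.Attribute.HREF);
                    currentHref = (href == null) ? null : href.toString();
                    text.setLength(0);
                }
            }

            @Override
            public void handleText(char[] data, int pos) {
                // Collect text only while inside a bold link.
                if (currentHref != null) {
                    text.append(data);
                }
            }

            @Override
            public void handleEndTag(HTML.Tag tag, int pos) {
                if (tag == HTML.Tag.A && currentHref != null) {
                    links.put(currentHref, text.toString().trim());
                    currentHref = null;
                } else if (tag == HTML.Tag.B && boldDepth > 0) {
                    boldDepth--;
                }
            }
        };

        try (Reader reader = new StringReader(html)) {
            // 'true' tells the parser to ignore any charset declared in the document.
            new ParserDelegator().parse(reader, callback, true);
        }
        return links;
    }

    public static void main(String[] args) throws IOException {
        String snippet =
            "<table cellspacing=\"4\" cellpadding=\"4\"><tr><td valign=top>"
            + "<b><a href=\"/Arts/\">Arts</a></b><br>"
            + "<small>"
            + "<a href=\"/Arts/Movies/\">Movies</a>, "
            + "<a href=\"/Arts/Television/\">Television</a>, "
            + "<a href=\"/Arts/Music/\">Music</a>..."
            + "</small>";
        for (Map.Entry<String, String> e : extractBoldLinks(snippet).entrySet()) {
            System.out.println(e.getValue() + " -> " + e.getKey());
        }
    }
}
```

The callback keeps a depth counter instead of a boolean so that nested <b> tags do not switch the "inside bold" state off too early. To restrict this to the second table on the page, one option would be to pass in only the table's HTML (for example the tableString from the question) rather than the whole page.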