HTML Parser / Discussion / Help: Get Text and Link in Bold Tag

Masca - 2008-05-28

Hi,

I am new to this tool. I needed to extract a table from a web page, and I did it using:

Parser parser = new Parser (path);
NodeList list = parser.parse (new HasAttributeFilter ("table"));
String tableString = list.elementAt(1).toHtml();

I use index 1 because it is the second table on the page. Now I need to extract the links in the table that are in bold, together with their link text. A snippet of the table looks like this:

<table cellspacing="4" cellpadding="4"><tr><td valign=top>
<a href="/Arts/">Arts</a> 

<a href="/Arts/Movies/">Movies</a>,
<a href="/Arts/Television/">Television</a>,
<a href="/Arts/Music/">Music</a>...


How can I extract the text Arts and the link /Arts/?
Thank you all for any ideas.

O.O.

Derrick Oswald - 2008-05-29
 
 Rather than HasAttributeFilter, which matches tags that have an attribute with the given name, you probably need a TagNameFilter ("TABLE").
 
 Then the resulting NodeList of matching tags can be filtered again with extractAllNodesThatMatch (
 new AndFilter (new TagNameFilter ("A"), new StringFilter ("Arts", true)))
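A minimal sketch of these two steps, assuming the org.htmlparser 1.x library is on the classpath. The class `TableLinks` and its method `findLinks` are hypothetical names, and the sketch parses an HTML string with `Parser.createParser` instead of a path. One deliberate change from the suggestion above: `StringFilter` tests text nodes rather than tags, so as far as I can tell an `AndFilter` of `TagNameFilter ("A")` and a bare `StringFilter` would not match a link; here the `StringFilter` is wrapped in a `HasChildFilter` so it is applied to the link's text.

```java
import java.util.ArrayList;
import java.util.List;

import org.htmlparser.Node;
import org.htmlparser.NodeFilter;
import org.htmlparser.Parser;
import org.htmlparser.Tag;
import org.htmlparser.filters.AndFilter;
import org.htmlparser.filters.HasChildFilter;
import org.htmlparser.filters.StringFilter;
import org.htmlparser.filters.TagNameFilter;
import org.htmlparser.util.NodeList;
import org.htmlparser.util.ParserException;

// Hypothetical helper class, not part of HTML Parser itself.
public class TableLinks {

    /**
     * Returns the href of every link inside the table at tableIndex (0-based)
     * whose link text contains linkText (case-sensitive).
     */
    public static List<String> findLinks(String html, int tableIndex, String linkText)
            throws ParserException {
        Parser parser = Parser.createParser(html, "UTF-8");

        // Step 1: TagNameFilter selects <table> elements by tag name.
        NodeList tables = parser.parse(new TagNameFilter("TABLE"));
        Node table = tables.elementAt(tableIndex);

        // Step 2: keep <a> tags that have a text child containing linkText.
        NodeFilter filter = new AndFilter(
                new TagNameFilter("A"),
                new HasChildFilter(new StringFilter(linkText, true)));
        // recursive = true also searches the table's descendants.
        NodeList links = new NodeList(table).extractAllNodesThatMatch(filter, true);

        List<String> hrefs = new ArrayList<>();
        for (int i = 0; i < links.size(); i++) {
            hrefs.add(((Tag) links.elementAt(i)).getAttribute("href"));
        }
        return hrefs;
    }
}
```

With the snippet from the question as the second table, `findLinks(html, 1, "Arts")` should return only `/Arts/`, since the other link texts do not contain "Arts". This still depends on knowing the text in advance, which is why it does not answer the bold-tag question by itself.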
 
Masca - 2008-05-29
 
 Dear Derrick,
 Thank you for the tip on TagNameFilter. However, I would like to extract the text and links between the bold tags; “Arts” is just an example.
 So my question is: once I have the table, how do I filter out the text and links within the bold tags?
 Thanks again though,
 O.O.
 
Derrick Oswald - 2008-05-30
 
 Then you will want to make your own BoldTag that is a composite tag, so that the links between <b> and </b> become its children. See the FAQ:
 http://htmlparser.sourceforge.net/faq.html#composite
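A minimal sketch of the composite-tag approach, assuming org.htmlparser 1.x. By default the parser does not treat <b> as a composite tag, so a link after <b> is a sibling of it rather than a child. Registering a composite BoldTag with a PrototypicalNodeFactory nests the link inside the bold element, and a HasParentFilter can then select it. The `BoldLinks` class, its `extract` method and this particular `BoldTag` are illustrative and not copied from the FAQ; the sketch also assumes the bold entries are marked up as `<b><a href="/Arts/">Arts</a></b>`.

```java
import java.util.ArrayList;
import java.util.List;

import org.htmlparser.NodeFilter;
import org.htmlparser.Parser;
import org.htmlparser.PrototypicalNodeFactory;
import org.htmlparser.Tag;
import org.htmlparser.filters.AndFilter;
import org.htmlparser.filters.HasParentFilter;
import org.htmlparser.filters.TagNameFilter;
import org.htmlparser.tags.CompositeTag;
import org.htmlparser.util.NodeList;
import org.htmlparser.util.ParserException;

// Hypothetical helper class, not part of HTML Parser itself.
public class BoldLinks {

    /** Composite tag for <B>...</B>, so that nodes between them become its children. */
    public static class BoldTag extends CompositeTag {
        private static final String[] IDS = new String[] {"B"};

        @Override
        public String[] getIds() {
            return IDS;
        }
    }

    /** Returns a {text, href} pair for each <a> whose parent is a <b> element. */
    public static List<String[]> extract(String html) throws ParserException {
        // Start from the default tag set and add the composite BoldTag to it.
        PrototypicalNodeFactory factory = new PrototypicalNodeFactory();
        factory.registerTag(new BoldTag());

        Parser parser = Parser.createParser(html, "UTF-8");
        parser.setNodeFactory(factory);

        // An <a> tag whose immediate parent is a <b> tag.
        NodeFilter boldLink = new AndFilter(
                new TagNameFilter("A"),
                new HasParentFilter(new TagNameFilter("B")));
        NodeList links = parser.parse(boldLink);

        List<String[]> result = new ArrayList<>();
        for (int i = 0; i < links.size(); i++) {
            Tag link = (Tag) links.elementAt(i);
            result.add(new String[] {link.toPlainTextString().trim(), link.getAttribute("href")});
        }
        return result;
    }
}
```

Because the filter asks only "is this link inside <b>?", it picks up every bold entry without knowing its text in advance. To limit it to the second table, the same filter could be passed to `extractAllNodesThatMatch` on that table.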
 
Masca - 2008-05-31
 
 Thanks for your post, Derrick. I think I saw the FAQ, but I could not figure out how to get tags from the list of nodes. Anyway, I think I got my application to work using the Swing parser. Thank you for your help.
 O.O.
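For comparison, here is a sketch of the Swing route using only the JDK. This is not Masca's actual code. `javax.swing.text.html.parser.ParserDelegator` reports start tags, end tags and text to an `HTMLEditorKit.ParserCallback`, so the callback can track whether it is inside <b> and record each link's text and href. The class `SwingBoldLinks` and its `extract` method are hypothetical names, and the sketch again assumes bold entries look like `<b><a href="/Arts/">Arts</a></b>`.

```java
import java.io.IOException;
import java.io.Reader;
import java.util.ArrayList;
import java.util.List;

import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

// Hypothetical helper class using only the JDK's Swing HTML parser.
public class SwingBoldLinks {

    /** Returns a {text, href} pair for each <a href> that appears inside <b>...</b>. */
    public static List<String[]> extract(Reader in) throws IOException {
        List<String[]> result = new ArrayList<>();

        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            private int boldDepth = 0;          // > 0 while inside <b>...</b>
            private String href = null;         // href of the bold link being read, if any
            private final StringBuilder text = new StringBuilder();

            @Override
            public void handleStartTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
                if (tag == HTML.Tag.B) {
                    boldDepth++;
                } else if (tag == HTML.Tag.A && boldDepth > 0) {
                    href = (String) attrs.getAttribute(HTML.Attribute.HREF);
                    text.setLength(0);
                }
            }

            @Override
            public void handleText(char[] data, int pos) {
                if (href != null) {
                    text.append(data);   // collect the link text while inside a bold <a>
                }
            }

            @Override
            public void handleEndTag(HTML.Tag tag, int pos) {
                if (tag == HTML.Tag.B && boldDepth > 0) {
                    boldDepth--;
                } else if (tag == HTML.Tag.A && href != null) {
                    result.add(new String[] {text.toString().trim(), href});
                    href = null;
                }
            }
        };

        // true = ignore <meta> charset declarations instead of throwing ChangedCharSetException.
        new ParserDelegator().parse(in, callback, true);
        return result;
    }
}
```

The callback style avoids building a node tree, at the cost of tracking the bold/link state by hand.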
 