Hello,
I am using LinkTag to extract data from link elements. It works fine when I get the link text and link information. But in cases where a <li> tag encloses my link text, I cannot extract it. How would I get the link text when I have <li> tags?
Thanks
It seems you aren't using a filter, because it would then come for free:
NodeList links = parser.extractAllNodesThatMatch (
new NodeNameFilter ("A"));
This will put all the <A>xxx</A> tags in the page into the links list.
Hello Derrick,
Thanks for your reply. I couldn't find any NodeNameFilter class. Anyway, here is the code I am using to extract links. With this code, the links that contain <li> tags don't show any text. How could I modify it?
Parser lt = new Parser();
lt.setInputHTML(answer);
NodeFilter linkFilter = new NodeClassFilter (LinkTag.class);
NodeList links = new NodeList ();
// collect every LinkTag in the page into the links list
for (NodeIterator e = lt.elements (); e.hasMoreNodes (); )
    e.nextNode ().collectInto (links, linkFilter);
// print the text and target of each HTTP-like link
for (int i = 0, n = links.size (); i < n; i++)
{
    LinkTag linkTag = (LinkTag)links.elementAt (i);
    if (linkTag.isHTTPLikeLink ()) {
        System.out.println ("link text " + linkTag.getLinkText ());
        System.out.println (" link tag " + linkTag.getLink ());
    }
}
thank you
Sorry, it's TagNameFilter, in the filters package.
I can't see why you aren't getting the text. The getLinkText() method is just calling asString() on the getChildren() NodeList, which uses toPlainTextString(). If you can't debug it, perhaps just use:
linkTag.getChildren ().toHtml ();
If that shows the "<li>text</li>" you expect, then you should log a bug.
Hi,
This is the HTML code I am reading from. Maybe you could help.
<td> <a href=/adfad/status.asp#75><b>All services are currently running normally.</a></b><br>
<a href=/fsdfs/status.asp#72><li> all day </a></li><br>
<a href=/sfsfsf/status.asp#79><li> link unavailable Sat 20 Oct 07:00-12:00 GMT</a></li><br>
<a href=/fsldfsdfas/status.asp#80><li>servers unavailable Tue Oct 25 05:00 GMT</a></li><br> <div align=right>
<a href="mailto:helpdesk@dfss.com?subject=Page feedback">» Report
a problem</a>
| <a href="/fasdlfasdf/status.asp">» See status
page</a></div></td>
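In the sample above, each anchor is closed before its <li> (<a ...><li>text</a></li>), so the markup is misnested. As a cross-check while debugging, here is a minimal sketch that pulls the href and the text out of such anchors with plain java.util.regex. It does not use the HTML Parser API at all, and the class name LinkTextSketch and the method extract are made up for illustration. It also leaves relative hrefs as they are and would not cope with everything a real parser handles.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the HTML Parser library.
public class LinkTextSketch {
    // Matches <a ... href=VALUE ...>BODY</a>; VALUE may be quoted or bare.
    private static final Pattern ANCHOR = Pattern.compile(
        "<a\\s[^>]*?href\\s*=\\s*(\"[^\"]*\"|'[^']*'|[^\\s>]+)[^>]*>(.*?)</a>",
        Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    /** Returns one {href, text} pair per anchor found in the markup. */
    public static List<String[]> extract(String html) {
        List<String[]> out = new ArrayList<>();
        Matcher m = ANCHOR.matcher(html);
        while (m.find()) {
            // drop the surrounding quotes from a quoted href value
            String href = m.group(1).replaceAll("^[\"']|[\"']$", "");
            // drop nested tags such as <li> or <b>, then tidy whitespace
            String text = m.group(2)
                .replaceAll("<[^>]+>", "")
                .replaceAll("\\s+", " ")
                .trim();
            out.add(new String[] { href, text });
        }
        return out;
    }

    public static void main(String[] args) {
        String html = "<a href=/fsdfs/status.asp#72><li> all day </a></li><br>";
        for (String[] link : extract(html)) {
            System.out.println("link text " + link[1]);
            System.out.println(" link tag " + link[0]);
        }
    }
}
```

If this prints the text for the <li> links while getLinkText() returns nothing for the same input, that supports logging a bug as suggested above.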