http://www.cs.princeton.edu/~blei/
I tried to parse this page and get all the links, but failed.
Method 1:
NodeFilter filter = new NodeClassFilter(LinkTag.class);
Method 2:
NodeFilter filter = new TagNameFilter("a");
What's wrong?
How did you apply the filter then? Something like this:
filter = new TagNameFilter("a");
Parser parser = new Parser("http://www.cs.princeton.edu/~blei/");
NodeList nodes = parser.parse(filter);
// ... process nodes
Thank you, I got it. I use:
NodeList nodes = parser.parse(filter);
NodeIterator iter = nodes.elements();
Tag tag;
String url;
while (iter.hasMoreNodes()) {
    tag = (Tag) iter.nextNode();
    url = tag.getAttribute("href");
    if (url != null && url.endsWith(".pdf")) {
        // handle the PDF link
    }
}
I had overlooked that an <a> tag can be a plain anchor like <a name="">, in which case the href attribute is missing and url is null, so an exception was thrown; I had just wrapped it in a try-catch without printing anything.
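The null-href pitfall described above can be shown without the htmlparser library at all. Below is a minimal, self-contained sketch in plain Java: it scans an HTML snippet with a deliberately naive regex (illustrative only, not the library's API, and not robust enough for real-world HTML) and keeps only links ending in ".pdf". Anchors like `<a name="top">` simply never match, and a null/empty guard runs before `endsWith`, which is the check the original loop was missing. The class name `PdfLinks` and the sample snippet are made up for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PdfLinks {
    // Naive pattern for href="..." inside an <a> tag; fine for a sketch,
    // but a real parser (like org.htmlparser) should be used in practice.
    private static final Pattern HREF =
            Pattern.compile("<a\\s+[^>]*href=\"([^\"]*)\"", Pattern.CASE_INSENSITIVE);

    static List<String> pdfLinks(String html) {
        List<String> out = new ArrayList<>();
        Matcher m = HREF.matcher(html);
        while (m.find()) {
            String url = m.group(1);
            // Guard before calling endsWith: href may be empty, and with a
            // real parser getAttribute("href") may return null for <a name="...">.
            if (url != null && !url.isEmpty() && url.endsWith(".pdf")) {
                out.add(url);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String html = "<a name=\"top\">Top</a>"
                    + "<a href=\"paper.pdf\">Paper</a>"
                    + "<a href=\"index.html\">Home</a>";
        System.out.println(pdfLinks(html)); // prints [paper.pdf]
    }
}
```

With the guard in place there is no need to swallow NullPointerExceptions in a silent try-catch, which was the real problem in the loop above.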