http://www.cs.princeton.edu/~blei/
I tried to parse this page and get all the links, but failed.
Method 1:
NodeFilter filter = new NodeClassFilter(LinkTag.class);
Method 2:
NodeFilter filter = new TagNameFilter("a");
What's wrong?
How did you apply the filter then? Something like this:
filter = new TagNameFilter("a");
Parser parser = new Parser("http://www.cs.princeton.edu/~blei/");
NodeList nodes = parser.parse(filter);
// ... process nodes
Thank you, I got it. I use:
NodeList nodes = parser.parse(filter);
NodeIterator iter = nodes.elements();
Tag tag;
String url;
while (iter.hasMoreNodes()) {
    tag = (Tag) iter.nextNode();
    url = tag.getAttribute("href");
    if (url != null && url.endsWith(".pdf")) {
        // handle the PDF link
    }
}
I had overlooked that an <a> tag can be a plain anchor like <a name="">, in which case the href attribute is missing and url is null, so an exception was thrown; I had just wrapped it in a try-catch without printing anything.
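The null-href pitfall described above can be shown without the htmlparser library at all. Below is a minimal, self-contained sketch in plain Java: it scans an HTML snippet with a deliberately naive regex (illustrative only, not the library's API, and not robust enough for real-world HTML) and keeps only links ending in ".pdf". Anchors like `<a name="top">` simply never match, and a null/empty guard runs before `endsWith`, which is the check the original loop was missing. The class name `PdfLinks` and the sample snippet are made up for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PdfLinks {
    // Naive pattern for href="..." inside an <a> tag; fine for a sketch,
    // but a real parser (like org.htmlparser) should be used in practice.
    private static final Pattern HREF =
            Pattern.compile("<a\\s+[^>]*href=\"([^\"]*)\"", Pattern.CASE_INSENSITIVE);

    static List<String> pdfLinks(String html) {
        List<String> out = new ArrayList<>();
        Matcher m = HREF.matcher(html);
        while (m.find()) {
            String url = m.group(1);
            // Guard before calling endsWith: href may be empty, and with a
            // real parser getAttribute("href") may return null for <a name="...">.
            if (url != null && !url.isEmpty() && url.endsWith(".pdf")) {
                out.add(url);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String html = "<a name=\"top\">Top</a>"
                    + "<a href=\"paper.pdf\">Paper</a>"
                    + "<a href=\"index.html\">Home</a>";
        System.out.println(pdfLinks(html)); // prints [paper.pdf]
    }
}
```

With the guard in place there is no need to swallow NullPointerExceptions in a silent try-catch, which was the real problem in the loop above.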