
Can't get Links

Help
fancy
2009-02-22
2013-04-27
  • fancy

    fancy - 2009-02-22

    http://www.cs.princeton.edu/~blei/
    I tried to parse this page and get all the links, but it failed.
    method 1:
    NodeFilter filter=new NodeClassFilter(LinkTag.class);
    method 2:
    NodeFilter filter=new TagNameFilter("a");

    What's wrong?

     
    • Derrick Oswald

      Derrick Oswald - 2009-02-22

      How did you apply the filter then?

      filter = new TagNameFilter ("a");
      parser = new Parser ("http://www.cs.princeton.edu/~blei/");
      nodes = parser.parse (filter);

      ... process nodes

       
    • fancy

      fancy - 2009-02-22

      Thank you, I got it.
      I use:
                  NodeList nodes = parser.parse(filter);
                  NodeIterator iter = nodes.elements();
                  Tag tag;
                  String url;
                  while(iter.hasMoreNodes()){
                      tag=(Tag) iter.nextNode();
                      url=tag.getAttribute("href");
                      if(url.endsWith(".pdf")){
                         // url points to a PDF
                      }
                  }

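      I had overlooked that an <a> tag may be a named anchor such as <a name=""> with no href, in which case url is null and url.endsWith throws a NullPointerException. For now I just wrap it in try-catch and don't print the exception.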
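
A minimal sketch of a null check that could stand in for the try-catch described above. It assumes the href values have already been read from the tags with getAttribute("href"); plain strings stand in for the tags here so the snippet runs without the HTML Parser jar. The names PdfLinkFilter, isPdfLink and pdfLinks and the sample hrefs are made up for illustration and are not part of the library.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class PdfLinkFilter {
    // True only when the href exists and ends in ".pdf".
    // An <a name=""> anchor has no href, so its value arrives here as null.
    static boolean isPdfLink(String href) {
        return href != null && href.endsWith(".pdf");
    }

    // Keeps the hrefs that point to PDFs and skips null or non-PDF entries.
    static List<String> pdfLinks(List<String> hrefs) {
        List<String> result = new ArrayList<>();
        for (String href : hrefs) {
            if (isPdfLink(href)) {
                result.add(href);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical values; the null mimics an anchor without an href.
        List<String> hrefs = Arrays.asList("papers/lda.pdf", null, "index.html");
        System.out.println(pdfLinks(hrefs)); // prints [papers/lda.pdf]
    }
}
```

In the loop above, the same condition would read if(url!=null && url.endsWith(".pdf")), so no exception needs to be caught and swallowed. The match is case-sensitive, like the original check.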

       
