
Filters sometimes miss tags

Help
oew
2006-10-04
2013-04-27
  • oew

    oew - 2006-10-04

    Hi All,

    I use the parser to retrieve information from sites (like lots of us here), and I sometimes get strange results.

    For example:
    NodeList list = parser.extractAllNodesThatMatch(new TagNameFilter("a"));
    is not able to "find"
    <A onClick="someScript(123456789); return true;" HREF="mailto:xxxx@xxxx.com?subject=bla bla">xxxx@xxxx.com</A>
    I get the same behaviour when I filter on the LinkTag class instead.
    <A HREF="mailto:xxxx@xxxx.com?subject=bla bla is perfectly detected as a link in both cases.
    It also happens when I try to extract <select> tags on some other pages: some are found, some are not.

    Has anyone already seen this? If so, how did you solve it?
    Or does someone have a rational explanation for this?

    thx

     
    • Derrick Oswald

      Derrick Oswald - 2006-10-10

      I'm not sure, but the script or comments may contain angle brackets that obliterate the tags that follow.
      Try setting
         Lexer.STRICT_REMARKS = false;
      and
         ScriptScanner.STRICT = false;
      to loosen up the parse a bit and see if that solves it.

      Otherwise a small (or large) test case that shows the failure would be good.
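
      One rough way to narrow a page down to a small test case is to check whether the missing tag sits inside a comment or script block that is not closed before it. The sketch below is a standalone illustration in plain Java and does not use the parser at all; the class name TagMaskCheck and its methods are hypothetical, and its simple delimiter matching is an assumption that will not reproduce the parser's own rules for remarks and scripts.

```java
/**
 * Hypothetical standalone helper (not part of the HTML Parser library).
 * It reports whether an offset in raw HTML lies inside a comment or a
 * script block, using plain case-insensitive delimiter matching.
 */
class TagMaskCheck {

    /** Case-insensitive indexOf that keeps offsets aligned with the input. */
    static int indexOfIgnoreCase(String s, String needle, int from) {
        for (int i = Math.max(from, 0); i + needle.length() <= s.length(); i++) {
            if (s.regionMatches(true, i, needle, 0, needle.length())) {
                return i;
            }
        }
        return -1;
    }

    /**
     * Returns "comment" or "script" if the offset falls inside such a block
     * (including one that is never closed), or null if it falls outside both.
     */
    static String enclosingConstruct(String html, int offset) {
        int pos = 0;
        while (pos < offset) {
            int comment = indexOfIgnoreCase(html, "<!--", pos);
            int script = indexOfIgnoreCase(html, "<script", pos);

            // Pick whichever opener comes first after the current position.
            int start;
            String close;
            String kind;
            if (comment >= 0 && (script < 0 || comment < script)) {
                start = comment;
                close = "-->";
                kind = "comment";
            } else if (script >= 0) {
                start = script;
                close = "</script";
                kind = "script";
            } else {
                return null; // no more comments or scripts in the page
            }
            if (start >= offset) {
                return null; // the next block opens after the target
            }

            int end = indexOfIgnoreCase(html, close, start + 4);
            if (end < 0 || end > offset) {
                return kind; // block is unterminated or still open at the target
            }
            pos = end + close.length();
        }
        return null;
    }

    public static void main(String[] args) {
        // Made-up sample input: the comment is never closed.
        String html = "<p>x</p><!-- broken <select id=\"what\"></select>";
        int offset = html.indexOf("<select");
        System.out.println(enclosingConstruct(html, offset));
    }
}
```

      If the check reports "comment" or "script" for the tag that goes missing, the block that encloses it is a good candidate to keep in the reduced test case.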

       
    • oew

      oew - 2006-10-11

      Hi Derrick,

      Here is the test case.
      On http://www.monster.de/ I am not able to get this select:
      <select id="what" name="fn" onchange="javascript:getOptionTitle(this)">
      If I try the same code on http://francais.monster.be/ with
      <select id="what" name="fn" > it works perfectly.

      Here is the filter:
          public NodeFilter buildSelectNodeFilter() {
              // First restrict to SelectTag nodes, then require id="what".
              NodeFilter filter = new NodeClassFilter(SelectTag.class);
              filter = new AndFilter(filter, new NodeFilter() {
                  public boolean accept(Node node) {
                      return "what".equals(((SelectTag) node).getAttribute("id"));
                  }
              });
              return filter;
          }

      Strange, isn't it? Anyway, I'll try your proposed solution.

      Thx

       
