Need to remove the OPTION tags

Help
2006-09-11
2013-04-27
  • Suresh Setty

    Suresh Setty - 2006-09-11

    Hi all,
    I'm a new user of HTMLParser.
    In my application I need to remove the SCRIPT tag and its content, as well as the SELECT tag and its child OPTION tags.
    After that I have to parse the page.
    For that I'm using
    ScriptScanner.STRICT=false;
    to remove the script content.
    But I don't know how to remove the SELECT and OPTION tags and their content.

    Can anyone help me?

    Thanks & regards,
    Suresh N.

     
    • Derrick Oswald

      Derrick Oswald - 2006-09-11

      You can probably just suppress the output of a tag by registering a custom (redefined) tag class, like so:

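      import org.htmlparser.tags.SelectTag;

      class MySelectTag extends SelectTag
      {   // override toHtml to return nothing; the override must be
          // public, like the method it replaces
          public String toHtml (boolean verbatim)
          { return (""); }
      }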

      PrototypicalNodeFactory factory = new PrototypicalNodeFactory ();
      factory.registerTag (new MySelectTag ());
      parser.setNodeFactory (factory);

      Then when you convert the page back to HTML, the select tags should be eliminated:

      NodeList all_nodes = parser.parse (null);
      System.out.println (all_nodes.toHtml ());
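
      To see why this works without running the parser, here is a minimal stand-alone sketch of the idea. It is a model only: Tag, SilentSelect and Factory are hypothetical stand-ins for the library's tag classes and PrototypicalNodeFactory, and the sketch assumes a parent tag serializes itself by concatenating whatever its children return from toHtml().

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Stand-in model only: none of these classes are HTMLParser's own.
class SuppressSelectSketch {

    // A tag node that serializes itself followed by its children.
    static class Tag {
        final String name;
        final List<Object> children = new ArrayList<>(); // each child is a Tag or a String

        Tag(String name) { this.name = name; }

        // Prototype hook: a registered prototype creates nodes of its own class.
        Tag create() { return new Tag(name); }

        String toHtml() {
            StringBuilder sb = new StringBuilder("<" + name + ">");
            for (Object child : children) {
                sb.append(child instanceof Tag ? ((Tag) child).toHtml() : child.toString());
            }
            return sb.append("</").append(name).append(">").toString();
        }
    }

    // Plays the role of MySelectTag: serializes to nothing, children included.
    static class SilentSelect extends Tag {
        SilentSelect() { super("select"); }
        @Override Tag create() { return new SilentSelect(); }
        @Override String toHtml() { return ""; }
    }

    // Plays the role of the node factory: maps a tag name to its prototype.
    static class Factory {
        private final Map<String, Tag> prototypes = new HashMap<>();

        void registerTag(Tag prototype) { prototypes.put(prototype.name, prototype); }

        Tag createTag(String name) {
            Tag prototype = prototypes.get(name);
            return prototype != null ? prototype.create() : new Tag(name);
        }
    }

    // Builds <form>Pick:<select><option>A</option></select></form> through the factory.
    static String render(Factory factory) {
        Tag form = factory.createTag("form");
        Tag select = factory.createTag("select");
        Tag option = factory.createTag("option");
        option.children.add("A");
        select.children.add(option);
        form.children.add("Pick:");
        form.children.add(select);
        return form.toHtml();
    }

    public static void main(String[] args) {
        Factory plain = new Factory();
        Factory filtering = new Factory();
        filtering.registerTag(new SilentSelect());
        System.out.println(render(plain));     // select and option are kept
        System.out.println(render(filtering)); // prints <form>Pick:</form>
    }
}
```

      Because the select node returns an empty string for its whole subtree, the OPTION children disappear with it and no separate OPTION class is needed in this model. Whether the same holds in your version of the library depends on how its composite tags implement toHtml(), so it is worth checking the source.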

       
    • Suresh Setty

      Suresh Setty - 2006-09-12

      Thanks, Mr. Derrick.
      This works when I do as you specified. But to extract the data from a page I'm using the StringExtractor class, and that still does not remove the content of the SELECT and OPTION tags.

      If needed, I'll submit the code snippet as well.

      Thanks & Regards,
      Suresh N.

       
    • Suresh Setty

      Suresh Setty - 2006-09-21

      Thank you Derrick.
      Now I've removed the SELECT and OPTION tags by following your guidelines.
      Can we remove the text of a link similarly?

      I'm overriding getText() in LinkTag to remove the text of anchor tags, but the text is not being eliminated.

      What can I do?

      Regards,
      Suresh N.

       
      • Derrick Oswald

        Derrick Oswald - 2006-09-21

        I don't think the getText() method is used for the toHtml() output. If you look at the way toHtml() is implemented, you should be able to override it to generate what you want in a similar way to the OPTION and SELECT tags.
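
        A minimal stand-alone sketch of that point, assuming toHtml() is built from the tag's children and never consults getText(). The classes below (Link, GetTextOverride, TextlessLink) are hypothetical stand-ins, not the library's LinkTag, so treat this as an illustration of the idea rather than the real implementation.

```java
import java.util.ArrayList;
import java.util.List;

// Stand-in model only: these classes imitate the shape of a composite tag,
// they are not HTMLParser's code.
class LinkTextSketch {

    static class Link {
        final String href;
        final List<String> children = new ArrayList<>(); // text between <a> and </a>

        Link(String href, String text) {
            this.href = href;
            children.add(text);
        }

        // Convenience accessor; toHtml() below never calls it.
        String getText() { return String.join("", children); }

        String startTag() { return "<a href=\"" + href + "\">"; }

        // Serialization walks the children directly.
        String toHtml() {
            StringBuilder sb = new StringBuilder(startTag());
            for (String child : children) {
                sb.append(child);
            }
            return sb.append("</a>").toString();
        }
    }

    // Overriding getText() alone leaves the serialized output unchanged.
    static class GetTextOverride extends Link {
        GetTextOverride(String href, String text) { super(href, text); }
        @Override String getText() { return ""; }
    }

    // Overriding toHtml() to skip the children drops the link text but keeps the tag.
    static class TextlessLink extends Link {
        TextlessLink(String href, String text) { super(href, text); }
        @Override String toHtml() { return startTag() + "</a>"; }
    }

    public static void main(String[] args) {
        System.out.println(new GetTextOverride("x.html", "Home").toHtml()); // text still present
        System.out.println(new TextlessLink("x.html", "Home").toHtml());    // text removed
    }
}
```

        In this model the getText() override has no effect on the output, while the toHtml() override removes the text and keeps the anchor itself. Returning an empty string from toHtml() instead would drop the whole link, as with the SELECT tag.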

         
