HTML Parser / Discussion / Help: Needs to remove the Option Tags.

Suresh Setty - 2006-09-11

Hi All,
I'm a new user of HTMLParser.
In my application I need to remove the SCRIPT tag and its content, as well as the SELECT tag and its child OPTION tags.
After that I have to parse the page.
For the script content I'm using
ScriptScanner.STRICT=false;
to remove it.
But I don't know how to remove the SELECT and OPTION tags and their content.

Can anyone help me?

Thanks & Regards,
Suresh N.

- Derrick Oswald - 2006-09-11
  
  You can probably just omit the output of a tag with a custom (redefined) tag, like so:
  
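  import org.htmlparser.PrototypicalNodeFactory;
  import org.htmlparser.tags.SelectTag;

  class MySelectTag extends SelectTag
  {   // override toHtml to return nothing; it must be public to match the method it overrides
      public String toHtml (boolean verbatim)
      { return (""); }
  }
  
  // register the silent tag so the parser creates it instead of the stock SelectTag
  PrototypicalNodeFactory factory = new PrototypicalNodeFactory ();
  factory.registerTag (new MySelectTag ());
  parser.setNodeFactory (factory);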
  
  Then when you convert the page back to HTML, the select tags should be eliminated:
  
  NodeList all_nodes = parser.parse (null);
  System.out.println (all_nodes.toHtml ());
  
- Suresh Setty - 2006-09-12
  
  Thanks, Mr. Derrick.
  This works when I do as you specified. But to extract the data from a page I'm using the StringExtractor class, and it does not remove the content of the SELECT and OPTION tags.
  
  If needed, I'll submit the code snippet as well.
  
  Thanks & Regards,
  Suresh N.
  
- Suresh Setty - 2006-09-21
  
  Thank you Derrick.
  Now I've removed the SELECT and OPTION tags by following your guidelines.
  Can we remove the text of a link similarly?
  
  I'm overriding the getText() method of LinkTag to remove the text corresponding to anchor tags, but the text is still in the output.
  
  What can I do?
  
  Regards,
  Suresh N.
  
  - Derrick Oswald - 2006-09-21
    
    I don't think the getText() method is used for the toHtml() output. If you look at the way toHtml() is implemented, you should be able to override it to generate what you want in a similar way to the OPTION and SELECT tags.
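
    To illustrate the point about getText() versus toHtml(), here is a minimal sketch. It deliberately does not use the HTMLParser library: SketchNode, GetTextOverride, ToHtmlOverride and OverrideSketch are hypothetical stand-in classes, and the assumption (taken from the reply above, not checked against the library source) is that serialisation reads the stored text directly rather than going through getText().

    ```java
    import java.util.ArrayList;
    import java.util.List;

    // Stand-in for a parsed tag. Hypothetical; NOT an HTMLParser class.
    class SketchNode {
        private final String name;
        private final String text;
        private final List<SketchNode> children = new ArrayList<>();

        SketchNode(String name, String text) {
            this.name = name;
            this.text = text;
        }

        SketchNode add(SketchNode child) {
            children.add(child);
            return this;
        }

        // Text accessor. Note that toHtml() below never calls it.
        String getText() {
            return text;
        }

        // Serialises the tag, its stored text, and its children.
        String toHtml() {
            StringBuilder sb = new StringBuilder();
            sb.append('<').append(name).append('>').append(text);
            for (SketchNode child : children) {
                sb.append(child.toHtml());
            }
            sb.append("</").append(name).append('>');
            return sb.toString();
        }
    }

    // Overriding getText() alone: the serialised output is unchanged.
    class GetTextOverride extends SketchNode {
        GetTextOverride(String name, String text) { super(name, text); }
        @Override String getText() { return ""; }
    }

    // Overriding toHtml(): the node drops out of the serialised output.
    class ToHtmlOverride extends SketchNode {
        ToHtmlOverride(String name, String text) { super(name, text); }
        @Override String toHtml() { return ""; }
    }

    class OverrideSketch {
        // Wraps the given link node in a paragraph and serialises the result.
        static String render(SketchNode link) {
            return new SketchNode("p", "see ").add(link).toHtml();
        }

        public static void main(String[] args) {
            // prints <p>see <a>click here</a></p>
            System.out.println(render(new GetTextOverride("a", "click here")));
            // prints <p>see </p>
            System.out.println(render(new ToHtmlOverride("a", "click here")));
        }
    }
    ```

    In this sketch only the toHtml() override changes what gets written out, which matches the suggestion to handle the link the same way as the SELECT and OPTION tags. An override that should keep the anchor but drop its text would build and return that reduced markup from toHtml() instead of the empty string.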
    