Hi All,
I'm a new user of HTMLParser.
In my application I need to remove the SCRIPT tag and its content, as well as the SELECT tag and its child OPTION tags.
After that I have to parse the page.
For that I'm making use of
ScriptScanner.STRICT=false;
to remove the script content.
But I don't know how to remove the SELECT and OPTION tags and their content.
Can anyone help me?
Thanks & Regards,
Suresh N.
You can probably just omit the output of a tag by registering a custom (redefined) tag, like so:
import org.htmlparser.PrototypicalNodeFactory;
import org.htmlparser.tags.SelectTag;
class MySelectTag extends SelectTag
{ // override toHtml to return nothing; it must be public to match the inherited method
public String toHtml (boolean verbatim)
{ return (""); }
}
// register the custom tag so the parser creates it in place of the stock SelectTag
PrototypicalNodeFactory factory = new PrototypicalNodeFactory ();
factory.registerTag (new MySelectTag ());
parser.setNodeFactory (factory);
Then when you convert the page back to HTML, the select tags should be eliminated:
NodeList all_nodes = parser.parse (null);
System.out.println (all_nodes.toHtml ());
Thanks, Mr. Derrick.
This works when I do it as you specified. But to extract the data from a page I'm using the StringExtractor class, and in that case the content of the SELECT and OPTION tags is not removed.
If needed, I'll submit the code snippet as well.
Thanks & Regards,
Suresh N.
Thank you Derrick.
Now I've removed the SELECT and OPTION tags by following your guidelines.
Can we remove the text of a link similarly?
I'm overriding getText() in LinkTag to remove the text corresponding to anchor tags, but the text is not eliminated.
What can I do?
Regards,
Suresh N.
I don't think the getText() method is used for the toHtml() output. If you look at the way toHtml() is implemented, you should be able to override it to generate what you want in a similar way to the OPTION and SELECT tags.
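To illustrate the point, here is a minimal sketch of why overriding getText() would leave the output unchanged while overriding toHtml() would not. It deliberately does not use HTMLParser itself: SketchNode, SketchText, SketchTag and the two link subclasses are hypothetical stand-ins invented for this example, and they only assume (as suggested above) that toHtml() builds its output without consulting getText().

```java
import java.util.ArrayList;
import java.util.List;

// Simplified stand-ins for illustration only; these are NOT HTMLParser's real classes.
abstract class SketchNode {
    abstract String toHtml();
}

class SketchText extends SketchNode {
    private final String text;
    SketchText(String text) { this.text = text; }
    @Override String toHtml() { return text; }
}

class SketchTag extends SketchNode {
    protected final String name;
    protected final List<SketchNode> children = new ArrayList<>();
    SketchTag(String name) { this.name = name; }
    SketchTag add(SketchNode child) { children.add(child); return this; }

    // Describes the tag itself; assumed here NOT to be consulted by toHtml().
    String getText() { return name; }

    // Emits the start tag, every child's HTML, then the end tag.
    @Override String toHtml() {
        StringBuilder sb = new StringBuilder("<" + name + ">");
        for (SketchNode child : children) sb.append(child.toHtml());
        return sb.append("</").append(name).append(">").toString();
    }
}

// Overriding getText() alone: toHtml() never calls it, so the output is unchanged.
class LinkWithGetTextOverride extends SketchTag {
    LinkWithGetTextOverride() { super("a"); }
    @Override String getText() { return ""; }
}

// Overriding toHtml(): the children (the link text) are skipped in the output.
class LinkWithToHtmlOverride extends SketchTag {
    LinkWithToHtmlOverride() { super("a"); }
    @Override String toHtml() { return "<" + name + "></" + name + ">"; }
}

class ToHtmlOverrideSketch {
    // Builds <p>See <a>the docs</a></p> around the given link and renders it.
    static String render(SketchTag link) {
        SketchTag root = new SketchTag("p");
        root.add(new SketchText("See "));
        root.add(link.add(new SketchText("the docs")));
        return root.toHtml();
    }

    public static void main(String[] args) {
        System.out.println(render(new LinkWithGetTextOverride())); // <p>See <a>the docs</a></p>
        System.out.println(render(new LinkWithToHtmlOverride()));  // <p>See <a></a></p>
    }
}
```

With the real library the same idea would presumably be a LinkTag subclass that overrides toHtml (boolean verbatim) and is registered through the PrototypicalNodeFactory, just like MySelectTag; check the toHtml() source in your HTMLParser version to see exactly what it emits before relying on this.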