1) Strip all JavaScript (including many of the tricks used by cross-site scripters); see http://ha.ckers.org/xss.html
2) Strip out all HTML tags and replace them with underscores,
so <a href="..">This is a test</a> becomes
_____________This is a test____
3) Strip out specific tags like iframes.
4) Is fast and easy to use.
5) Are there any "ready made" examples that do some or all of the above?
Thanks
Gavin
... and one more thing:
6) Able to handle badly formatted HTML and JavaScript.
That is, be quite robust when parsing the HTML.
1) Script tags are identified if the ScriptTag is registered, which is the default. Running
removeAllNodesThatMatch(new NodeClassFilter(ScriptTag.class));
should work.
2) This would have to be coded by you. Normally people just want the text from the page, which you can get with the StringBean class.
3) This is like stripping script tags, but the children of the tags must be kept. Again, custom code.
4) Should be. Try it.
5) There are examples in the form of applications and unit tests.
6) Yes, it's robust. Otherwise the 1500 people who download it each week would complain.
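
For the custom code mentioned in 2) and 3), here is a minimal sketch in plain Java of what such code might look like. It does not use the parser library at all: it assumes a naive regular expression is good enough to find tags, which is not true for malformed markup, comments, or a ">" inside an attribute value, so it is not a defence against the tricks listed on the XSS page linked above. The names TagMasker, maskTags, and unwrapTag are made up for this illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper for illustration only; not part of the HTML Parser library. */
final class TagMasker {

    // Naive tag pattern (an assumption): '<', optional '/', a letter,
    // then everything up to the next '>'.
    private static final Pattern ANY_TAG = Pattern.compile("</?[A-Za-z][^>]*>");

    /** Replaces every matched tag with a run of underscores of the same length. */
    static String maskTags(String html) {
        Matcher m = ANY_TAG.matcher(html);
        StringBuilder out = new StringBuilder(html.length());
        int last = 0;
        while (m.find()) {
            // Copy the text between the previous tag and this one unchanged.
            out.append(html, last, m.start());
            // Emit one underscore per character of the tag.
            for (int i = m.start(); i < m.end(); i++) {
                out.append('_');
            }
            last = m.end();
        }
        // Copy whatever follows the last tag.
        out.append(html, last, html.length());
        return out.toString();
    }

    /** Removes the opening and closing tags named tagName but keeps the text between them. */
    static String unwrapTag(String html, String tagName) {
        Pattern p = Pattern.compile(
                "</?" + Pattern.quote(tagName) + "\\b[^>]*>",
                Pattern.CASE_INSENSITIVE);
        return p.matcher(html).replaceAll("");
    }
}
```

Calling TagMasker.maskTags on the anchor example from the question gives 13 underscores, then "This is a test", then 4 underscores, and TagMasker.unwrapTag(html, "iframe") drops the iframe tags while leaving their contents in place. For input you do not control, the same two operations are probably better done on the node tree the parser produces than on the raw string.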