
Can HTMLParser do this...

Help
Gavin Las
2005-06-22
2013-04-27
  • Gavin Las

    Gavin Las - 2005-06-22

    1) Strip all JavaScript (including many of the tricks used by cross-site scripters)... see http://ha.ckers.org/xss.html
    2) Strip out all HTML tags and replace them with underscores,
    so <a href="..">This is a test</a> becomes
    _____________This is a test____
    3) Strip out specific tags like iframes.

    4) Is it fast and easy to use?

    5) Are there any "ready made" examples that do some or all of the above?

    Thanks
    Gavin

     
    • Gavin Las

      Gavin Las - 2005-06-22

      ..... and one more thing...
      6) Is it able to handle badly formatted HTML and JavaScript?
      That is, is it quite robust when parsing the HTML?

       
    • Derrick Oswald

      Derrick Oswald - 2005-06-22

      1) Script tags are identified if the ScriptTag is registered, which is the default.  Running
          removeAllNodesThatMatch(new NodeClassFilter (ScriptTag.class));
      should work.

      2) This would have to be coded by you. Normally people just want the text from the page, which you can get with the StringBean class.

      3) This is like stripping script tags, but the children of the tags must be kept. Again, custom code.

      4) Should be. Try it.

      5) There are examples in terms of applications and unit tests.

      6) Yes, it's robust. Otherwise the 1500 people who download it each week would complain.
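
For point 2, the custom code could look roughly like the sketch below. It does not use HTMLParser at all: it is a plain character scanner over the raw string, and the class name TagMasker and method maskTags are made up for illustration. It assumes every '<' starts a tag, so a bare '<' in text (as in "a < b") would be masked too, and it is not a defence against the XSS tricks mentioned in point 1.

```java
// Hypothetical helper, not part of HTMLParser: masks tags with underscores.
final class TagMasker {
    private TagMasker() {}

    /** Replaces every character of each <...> tag, brackets included, with '_'. */
    static String maskTags(String html) {
        StringBuilder out = new StringBuilder(html.length());
        boolean inTag = false;
        char quote = 0; // quote character currently open inside a tag, or 0 if none
        for (int i = 0; i < html.length(); i++) {
            char c = html.charAt(i);
            if (!inTag) {
                if (c == '<') {          // naive assumption: every '<' opens a tag
                    inTag = true;
                    out.append('_');
                } else {
                    out.append(c);       // ordinary text is copied unchanged
                }
                continue;
            }
            out.append('_');             // inside a tag: mask the character
            if (quote != 0) {
                if (c == quote) {
                    quote = 0;           // closing quote of an attribute value
                }
            } else if (c == '"' || c == '\'') {
                quote = c;               // a '>' inside quotes must not end the tag
            } else if (c == '>') {
                inTag = false;
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Masks the 13 characters of the opening tag and the 4 of the closing tag.
        System.out.println(maskTags("<a href=\"..\">This is a test</a>"));
    }
}
```

Because each tag character is replaced one for one, the output has the same length as the input, so offsets into the original text stay valid.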

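
For point 3, where the tag itself goes but its children stay, a similarly rough sketch using only java.util.regex is shown below. Again this is not HTMLParser code: TagStripper and stripTag are hypothetical names, and a regular expression like this is easy to fool with malformed or deliberately obfuscated markup, so treat it as an illustration of the idea rather than a sanitizer.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of HTMLParser: drops one kind of tag, keeps its content.
final class TagStripper {
    private TagStripper() {}

    /**
     * Removes the opening and closing tags named tagName (case-insensitive)
     * and leaves whatever was between them in place.
     */
    static String stripTag(String html, String tagName) {
        // "</?" matches both <iframe ...> and </iframe>;
        // "\\b" stops "iframe" from also matching a longer name such as "iframes";
        // "[^>]*" swallows attributes up to the closing '>' (naive: ignores quoted '>').
        Pattern tag = Pattern.compile(
                "</?" + Pattern.quote(tagName) + "\\b[^>]*>",
                Pattern.CASE_INSENSITIVE);
        return tag.matcher(html).replaceAll("");
    }

    public static void main(String[] args) {
        String html = "<p>a<iframe src=\"x\">fallback</iframe>b</p>";
        System.out.println(stripTag(html, "iframe")); // prints <p>afallbackb</p>
    }
}
```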
       
