Best metric for HTML pages

omar
2013-06-14
  • omar
    2013-06-14

    Hi,
    I am an intern developing an application to detect website defacements. The application should compute a metric for each page and compare it to a threshold in order to flag a defacement. URLs, images, and other non-text content are stripped out and handled separately; I am only interested in the text.
    I would like to know a good metric that I could use. I will monitor about 150 pages of 3 to 6 KB each. I couldn't find any free dedicated tool that does this.
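
To make the setup concrete, here is a minimal sketch of the kind of check described above, assuming Python and the standard-library `difflib.SequenceMatcher` as a placeholder metric. The function names (`normalize`, `change_score`, `looks_defaced`) and the 0.5 threshold are hypothetical, chosen only for illustration; the metric itself is exactly what is still open.

```python
import difflib
import re


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial formatting changes are ignored."""
    return re.sub(r"\s+", " ", text).strip().lower()


def change_score(baseline: str, current: str) -> float:
    """Return a score in [0, 1]: 0.0 for identical text, 1.0 for no words in common.

    Placeholder metric: 1 minus difflib's similarity ratio over word sequences.
    """
    a = normalize(baseline).split()
    b = normalize(current).split()
    matcher = difflib.SequenceMatcher(None, a, b, autojunk=False)
    return 1.0 - matcher.ratio()


def looks_defaced(baseline: str, current: str, threshold: float = 0.5) -> bool:
    """Flag the page when the change score exceeds the (arbitrary) threshold."""
    return change_score(baseline, current) > threshold
```

Here `baseline` would be the stored text of a known-good copy of the page and `current` the text extracted on the latest check. With about 150 pages of 3 to 6 KB each, a word-level comparison like this should be cheap enough to run on every page.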

    Thank you for your help.