omar - 2013-06-14

I am an intern developing an application that should detect website defacements. The application should compute a metric for each page and compare it to a threshold; a value past the threshold would flag the page as defaced. URLs, images, and other non-text content are stripped out and handled separately, so I am only interested in the text.
I would like to know which metric would work well for this. I will monitor about 150 pages of 3 to 6 KB each. I couldn't find any free dedicated tool that does this.
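
To make the setup concrete, here is a rough sketch of the kind of check described above. It is only an illustration: the metric (one minus the similarity ratio from Python's standard `difflib.SequenceMatcher`, computed over words), the threshold value of 0.3, and the function names `change_score` and `looks_defaced` are all placeholder assumptions, not a settled choice.

```python
import difflib


def change_score(baseline_text: str, current_text: str) -> float:
    """Return a change score in [0, 1]: 0.0 for identical text, 1.0 for no words in common."""
    # Compare word sequences so that whitespace-only differences are ignored.
    baseline_words = baseline_text.split()
    current_words = current_text.split()
    # autojunk=False disables difflib's popularity heuristic for long sequences.
    matcher = difflib.SequenceMatcher(None, baseline_words, current_words, autojunk=False)
    return 1.0 - matcher.ratio()


def looks_defaced(baseline_text: str, current_text: str, threshold: float = 0.3) -> bool:
    """Flag the page when the change score exceeds the (assumed) threshold."""
    return change_score(baseline_text, current_text) > threshold
```

Each monitored page would keep a stored baseline of its stripped text, and every new fetch would be scored against it. With 150 pages of a few KB each, a pairwise comparison like this should be cheap enough to run on every check, though the right threshold would have to be tuned against how much the pages normally change.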

Thank you for your help.