TexLexAn is an open source text analyser for Linux that estimates readability and reading time, and classifies and summarizes texts. It has some learning abilities and accepts html, doc, pdf, ppt, odt and txt documents. It is written in C and Python.
A machine learning system for supervised document classification
An open source system for supervised document classification based on statistical machine learning techniques.
Unlike state-of-the-art classification techniques, MyNook requires only the title of the document, not the content itself.
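To illustrate what supervised classification from titles alone can look like, here is a minimal sketch in Python. It is not MyNook's code, and the description above does not say which statistical model MyNook uses; the sketch assumes a multinomial naive Bayes model with Laplace smoothing. The names `tokenize`, `TitleClassifier` and `TRAINING`, as well as the sample titles and labels, are hypothetical and chosen only for illustration.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(title):
    """Lowercase a title and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", title.lower())


class TitleClassifier:
    """Multinomial naive Bayes over title words (assumed model, not MyNook's)."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # label -> word -> count
        self.label_counts = Counter()            # label -> number of titles
        self.vocabulary = set()                  # all words seen in training

    def train(self, labelled_titles):
        """Learn word statistics from (title, label) pairs."""
        for title, label in labelled_titles:
            self.label_counts[label] += 1
            for word in tokenize(title):
                self.word_counts[label][word] += 1
                self.vocabulary.add(word)

    def predict(self, title):
        """Return the label with the highest log-probability for the title."""
        total_titles = sum(self.label_counts.values())
        vocab_size = len(self.vocabulary)
        best_label, best_score = None, -math.inf
        for label, n_titles in self.label_counts.items():
            # Start from the log prior of the label.
            score = math.log(n_titles / total_titles)
            total_words = sum(self.word_counts[label].values())
            for word in tokenize(title):
                # Laplace (add-one) smoothing keeps unseen words from
                # driving the probability to zero.
                count = self.word_counts[label][word]
                score += math.log((count + 1) / (total_words + vocab_size))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical training data: only titles and labels, no document content.
TRAINING = [
    ("Introduction to linear algebra", "math"),
    ("Calculus and differential equations", "math"),
    ("Cooking pasta at home", "food"),
    ("Easy bread baking recipes", "food"),
]

classifier = TitleClassifier()
classifier.train(TRAINING)
print(classifier.predict("Linear equations made easy"))
print(classifier.predict("Homemade pasta recipes"))
```

Only the title strings are ever read, so no document content needs to be parsed or stored. A real system would need far more training titles than this toy set.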