Gamera is a framework for the creation of structured document analysis applications by domain experts. It combines a programming library with GUI tools for the training and interactive development of recognition systems.
A machine learning system for supervised document classification
An open source system for supervised document classification based on statistical machine learning techniques.
Unlike state-of-the-art classification techniques, MyNook requires only the title of the document, not the content itself.
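To give a feel for what title-only supervised classification can look like, here is a minimal sketch using a multinomial naive Bayes classifier with Laplace smoothing. This is not MyNook's actual implementation, which the description above does not specify; the function names, the whitespace tokenizer, and the sample titles and labels are all hypothetical and chosen only for illustration.

```python
import math
from collections import Counter, defaultdict


def tokenize(title):
    # Assumption: lowercase + whitespace split; a real system would likely do more.
    return title.lower().split()


def train(labeled_titles):
    """Collect per-label word counts, label frequencies, and the vocabulary."""
    word_counts = defaultdict(Counter)
    label_counts = Counter()
    vocabulary = set()
    for title, label in labeled_titles:
        tokens = tokenize(title)
        word_counts[label].update(tokens)
        label_counts[label] += 1
        vocabulary.update(tokens)
    return word_counts, label_counts, vocabulary


def classify(title, model):
    """Return the label with the highest naive Bayes log-probability."""
    word_counts, label_counts, vocabulary = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label in label_counts:
        # Log prior: share of training titles carrying this label.
        score = math.log(label_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for token in tokenize(title):
            # Laplace (add-one) smoothing so unseen words do not zero the score.
            count = word_counts[label][token] + 1
            score += math.log(count / (total_words + len(vocabulary)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical training data: (title, label) pairs.
SAMPLE_TITLES = [
    ("Quarterly financial report", "finance"),
    ("Annual budget summary", "finance"),
    ("Football match results", "sports"),
    ("Tennis tournament schedule", "sports"),
]

model = train(SAMPLE_TITLES)
print(classify("Budget report", model))
print(classify("Tennis match", model))
```

Because titles are short, each one contributes only a handful of tokens, so smoothing and a reasonably large labeled set matter more here than they would for full-text classification.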