- Document indexing and selection using Apache Lucene
- Fast vector space model (VSM) generation with several local and global term weights (term-document matrix)
- Dimensionality reduction using SVD or NMF, for LSA or related methods
- Metadata annotators (Penn Treebank grammar parsing)
- Operations: Document distances, topic clustering, keyword extraction, and many more!
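To make the term-document matrix and document-distance features above concrete, here is a minimal, self-contained Java sketch. It is not the library's API: the class `VsmSketch` and all of its methods are hypothetical names introduced for illustration only. It assumes raw term frequency as the local weight, `ln(N / df)` as the global (IDF) weight, and cosine distance as the document distance; the library may offer other weights and distances.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.TreeSet;

// Hypothetical illustration; none of these names come from the library.
final class VsmSketch {

    // Sorted list of all distinct terms across the tokenized documents.
    static List<String> vocabulary(List<List<String>> docs) {
        TreeSet<String> terms = new TreeSet<>();
        for (List<String> doc : docs) {
            terms.addAll(doc);
        }
        return new ArrayList<>(terms);
    }

    // Term-document matrix: rows are terms, columns are documents.
    // Assumed local weight: raw term frequency.
    // Assumed global weight: idf = ln(N / df).
    static double[][] tfIdf(List<List<String>> docs, List<String> vocab) {
        int n = docs.size();
        double[][] m = new double[vocab.size()][n];
        for (int t = 0; t < vocab.size(); t++) {
            String term = vocab.get(t);
            int df = 0; // number of documents containing the term
            for (int d = 0; d < n; d++) {
                int tf = Collections.frequency(docs.get(d), term);
                m[t][d] = tf;
                if (tf > 0) {
                    df++;
                }
            }
            double idf = (df == 0) ? 0.0 : Math.log((double) n / df);
            for (int d = 0; d < n; d++) {
                m[t][d] *= idf;
            }
        }
        return m;
    }

    // Extracts the vector of document d (one column of the matrix).
    static double[] column(double[][] m, int d) {
        double[] v = new double[m.length];
        for (int t = 0; t < m.length; t++) {
            v[t] = m[t][d];
        }
        return v;
    }

    // Cosine distance = 1 - cosine similarity; 1.0 if either vector is all zeros.
    static double cosineDistance(double[] a, double[] b) {
        double dot = 0.0, na = 0.0, nb = 0.0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            na += a[i] * a[i];
            nb += b[i] * b[i];
        }
        if (na == 0.0 || nb == 0.0) {
            return 1.0;
        }
        return 1.0 - dot / (Math.sqrt(na) * Math.sqrt(nb));
    }

    public static void main(String[] args) {
        List<List<String>> docs = Arrays.asList(
                Arrays.asList("text", "mining", "library"),
                Arrays.asList("text", "mining", "tools"),
                Arrays.asList("matrix", "factorization"));
        double[][] m = tfIdf(docs, vocabulary(docs));
        System.out.println("d(0,1) = " + cosineDistance(column(m, 0), column(m, 1)));
        System.out.println("d(0,2) = " + cosineDistance(column(m, 0), column(m, 2)));
    }
}
```

In this toy corpus the first two documents share two terms, so their distance is smaller than the distance between the first and third documents, which share none. A reduction step such as SVD or NMF would operate on the same matrix before distances are computed.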
Fast and simple.
It seems to be good, but there are some errors that prevent the program from loading the library correctly (the Abstract Annotator constructor takes parameters, but PennTreeAnnotator does not take any).
Very good library for text mining.