- Document indexing and selection using Apache Lucene
- Fast vector space model (VSM) generation: term-document matrices with several local and global weighting schemes
- Dimensionality reduction using SVD or NMF, for LSA and related methods
- Metadata annotators (Penn Treebank grammar parsing)
- Operations: Document distances, topic clustering, keyword extraction, and many more!
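
To make the term-document matrix and document-distance features above more concrete, here is a minimal sketch in plain Python. It is not this project's API: the function names `build_weighted_matrix` and `cosine_distance` are hypothetical, and it assumes raw term frequency as the local weight and log inverse document frequency as the global weight, which is only one of many possible combinations.

```python
import math
from collections import Counter


def build_weighted_matrix(docs):
    """Build a weighted term-document matrix from tokenized documents.

    Hypothetical helper for illustration only. Returns the sorted
    vocabulary and one weight vector per document (the matrix columns).
    """
    vocab = sorted({term for doc in docs for term in doc})
    n_docs = len(docs)

    # Global weight (assumed): inverse document frequency, log(N / df).
    doc_freq = Counter(term for doc in docs for term in set(doc))
    idf = {term: math.log(n_docs / doc_freq[term]) for term in vocab}

    columns = []
    for doc in docs:
        # Local weight (assumed): raw term frequency in this document.
        tf = Counter(doc)
        columns.append([tf[term] * idf[term] for term in vocab])
    return vocab, columns


def cosine_distance(u, v):
    """Return 1 - cosine similarity; 1.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 1.0
    return 1.0 - dot / (norm_u * norm_v)


docs = [
    "lucene indexes documents".split(),
    "lucene searches documents".split(),
    "svd reduces matrix rank".split(),
]
vocab, columns = build_weighted_matrix(docs)

near = cosine_distance(columns[0], columns[1])  # share two terms
far = cosine_distance(columns[0], columns[2])   # share no terms
```

In this toy corpus the first two documents share terms and end up closer to each other than to the third, which shares none. Swapping in a different local or global weight only changes the two commented lines inside `build_weighted_matrix`.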
Nice and simple.
Fast and simple.