TML is a Text Mining Library focused on LSA (Latent Semantic Analysis) and tightly integrated with Apache's Lucene. It is designed for ease of use by researchers and developers who want to integrate text mining capabilities into their applications.
- Document indexing and selection using Apache's Lucene
- Fast VSM (Vector Space Model) generation, i.e. term-document matrices, with several local and global weights.
- Dimensionality reduction using SVD or NMF for LSA and related techniques.
- Meta-data annotators (PennTree grammar parsing).
- Operations: document distances, topic clustering, keyword extraction, and more.
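To illustrate what a weighted term-document matrix and a document distance look like, here is a minimal sketch in plain Java. It does not use TML's or Lucene's API; the class and method names (`VsmSketch`, `buildMatrix`, `cosineDistance`) are hypothetical. It assumes naive lowercase tokenization, a log(1 + tf) local weight and an idf global weight (one common pairing, not necessarily TML's default), and it omits the SVD/NMF reduction step.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class VsmSketch {
    // Builds a weighted term-document matrix: rows = terms, columns = documents.
    // Assumed weighting: local log(1 + tf) times global idf = log(N / df).
    // Terms are appended to 'vocab' in first-seen order.
    static double[][] buildMatrix(List<String> docs, List<String> vocab) {
        Map<String, Integer> index = new HashMap<>();
        for (String doc : docs) {
            for (String tok : doc.toLowerCase().split("\\W+")) {
                if (!tok.isEmpty() && !index.containsKey(tok)) {
                    index.put(tok, vocab.size());
                    vocab.add(tok);
                }
            }
        }
        int n = docs.size();
        // Raw term frequencies per document.
        double[][] tf = new double[vocab.size()][n];
        for (int d = 0; d < n; d++) {
            for (String tok : docs.get(d).toLowerCase().split("\\W+")) {
                if (!tok.isEmpty()) tf[index.get(tok)][d]++;
            }
        }
        // Apply local and global weights.
        double[][] w = new double[vocab.size()][n];
        for (int t = 0; t < vocab.size(); t++) {
            int df = 0;
            for (int d = 0; d < n; d++) if (tf[t][d] > 0) df++;
            double idf = Math.log((double) n / df);
            for (int d = 0; d < n; d++) w[t][d] = Math.log(1 + tf[t][d]) * idf;
        }
        return w;
    }

    // Cosine distance (1 - cosine similarity) between document columns a and b.
    static double cosineDistance(double[][] m, int a, int b) {
        double dot = 0, na = 0, nb = 0;
        for (double[] row : m) {
            dot += row[a] * row[b];
            na += row[a] * row[a];
            nb += row[b] * row[b];
        }
        if (na == 0 || nb == 0) return 1.0; // treat empty vectors as maximally distant
        return 1.0 - dot / (Math.sqrt(na) * Math.sqrt(nb));
    }

    public static void main(String[] args) {
        List<String> docs = List.of(
            "lucene indexes text documents",
            "latent semantic analysis reduces text dimensions",
            "lucene indexes documents quickly");
        List<String> vocab = new ArrayList<>();
        double[][] m = buildMatrix(docs, vocab);
        System.out.printf("d0-d1: %.3f%n", cosineDistance(m, 0, 1));
        System.out.printf("d0-d2: %.3f%n", cosineDistance(m, 0, 2));
    }
}
```

With these three toy documents, the first and third share more weighted terms, so their distance is smaller than the distance between the first and second. In an LSA setting, the same distance would typically be computed on the reduced (SVD) document vectors rather than on the raw weighted columns.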
Fast and simple.
It seems good, but some errors prevent the program from loading the library correctly (the AbstractAnnotator constructor takes parameters, but the PennTreeAnnotator constructor does not).
Very good library for text mining.
Nice and simple.