text-analysis Wiki

Status: Alpha

Brought to you by: kostia76

Roadmap

Features for the current release

Summarization (Luhn, Lexical) + Web service | Completed
Multi-document Summarization + Web service | Completed
Summarization evaluation | Completed
Web service for language detection | Completed
C99 (Segmentation) | Completed
Redesign persistence using Berkeley DB (instead of JPA) | Completed
NER (Named Entity Recognition) + Web service | Completed
Publish demo on Amazon | Completed
Web service for text categorization (train a model)
"Mavenize" the workspace

Once the features are frozen, a QA phase follows, covering code metrics, test coverage, open TODOs, and Javadoc. In parallel with that phase, the documentation (user guide, getting started) will be improved.

Features for the next releases

Support for other languages (Italian, German)
The persistence engine should be pluggable to allow other databases (for example, Hadoop or MongoDB)
For the repository, use the Content Repository API for Java (JCR; JSR 170 and JSR 283). See Apache Jackrabbit.
Plagiarism detection
Sentiment Analysis
Summarization (Konchady)
Segmentation (TextTiling?)
Artificial Neural Network (ANN) classifier
Naive Bayes (NB) classifier
Support vector machines (SVM)
Storage engine pluggable (loosely coupled with interfaces)
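
As an illustration of the "pluggable, loosely coupled with interfaces" storage goal above, here is a minimal Java sketch. All names in it (DocumentStore, InMemoryDocumentStore, DocumentService) are hypothetical and are not taken from the project's code base. It only shows the general idea: client code depends on an interface, so a Berkeley DB, MongoDB, or JCR backend could be swapped in without touching the callers.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

public class PluggableStorageSketch {

    /** Hypothetical storage contract; an assumption, not the project's actual API. */
    interface DocumentStore {
        void put(String id, String text);
        Optional<String> get(String id);
        boolean delete(String id);
    }

    /** Simple in-memory backend; a real backend would wrap a database instead. */
    static final class InMemoryDocumentStore implements DocumentStore {
        private final Map<String, String> documents = new HashMap<>();

        @Override
        public void put(String id, String text) {
            documents.put(id, text);
        }

        @Override
        public Optional<String> get(String id) {
            return Optional.ofNullable(documents.get(id));
        }

        @Override
        public boolean delete(String id) {
            return documents.remove(id) != null;
        }
    }

    /** Client code that only knows the interface, never a concrete backend. */
    static final class DocumentService {
        private final DocumentStore store;

        DocumentService(DocumentStore store) {
            this.store = store;
        }

        /** Counts whitespace-separated tokens; returns 0 for missing or blank documents. */
        int wordCount(String id) {
            return store.get(id)
                    .map(String::trim)
                    .map(t -> t.isEmpty() ? 0 : t.split("\\s+").length)
                    .orElse(0);
        }
    }

    public static void main(String[] args) {
        // The backend is chosen in one place; everything else uses the interface.
        DocumentStore store = new InMemoryDocumentStore();
        store.put("doc-1", "Text analysis with pluggable storage");
        DocumentService service = new DocumentService(store);
        System.out.println(service.wordCount("doc-1"));
    }
}
```

Switching backends would then mean providing another DocumentStore implementation and changing only the line that constructs it.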
