Features for the current release
- Summarization (Luhn, Lexical) + Web service | Completed
- Multi-document Summarization + Web service | Completed
- Summarization evaluation | Completed
- Web service for language detection | Completed
- C99 (Segmentation) | Completed
- Redesign persistence using Berkeley DB (instead of JPA) | Completed
- NER (Named Entity Recognition) + Web service | Completed
- Publish demo on Amazon | Completed
- Web service for text categorization (train a model)
- "Mavenize" the workspace
Once the features are frozen, a QA phase follows, covering code metrics, test coverage, open TODOs, and Javadoc. In parallel with that phase, the documentation (user guide, getting started) will be improved.
Features for the next releases
- Support for other languages (Italian, German)
- The persistence engine should be pluggable to allow other backends (for example, Hadoop or MongoDB)
- For the repository, use the Content Repository API for Java (JCR; JSR 170 and JSR 283). See Apache Jackrabbit.
- Plagiarism detection
- Sentiment Analysis
- Summarization (Konchady)
- Segmentation (TextTiling?)
- Artificial Neural Network (ANN) classifier
- Naive Bayes (NB) classifier
- Support vector machine (SVM) classifier
- Pluggable storage engine (loosely coupled through interfaces)
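
The planned pluggable storage engine could look roughly like the sketch below. This is only an illustration of loose coupling through an interface, not the project's actual design: the names DocumentStore, InMemoryDocumentStore and DocumentService are hypothetical, and the in-memory map stands in for a real backend such as Berkeley DB.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical storage contract; the name and methods are illustrative only.
interface DocumentStore {
    void put(String id, String text);
    Optional<String> get(String id);
    boolean delete(String id);
}

// In-memory backend used as a stand-in. Another backend would implement
// the same interface without any change to client code.
class InMemoryDocumentStore implements DocumentStore {
    private final Map<String, String> documents = new ConcurrentHashMap<>();

    @Override
    public void put(String id, String text) {
        documents.put(id, text);
    }

    @Override
    public Optional<String> get(String id) {
        return Optional.ofNullable(documents.get(id));
    }

    @Override
    public boolean delete(String id) {
        // remove() returns the previous value, or null if the id was unknown
        return documents.remove(id) != null;
    }
}

// Client code depends on the interface only, so the backend can be swapped
// at construction time.
class DocumentService {
    private final DocumentStore store;

    DocumentService(DocumentStore store) {
        this.store = store;
    }

    void save(String id, String text) {
        store.put(id, text);
    }

    String load(String id) {
        return store.get(id)
                .orElseThrow(() -> new IllegalArgumentException("Unknown document: " + id));
    }
}
```

With this shape, switching the backend would mean passing a different DocumentStore implementation to the DocumentService constructor. The summarization and categorization code would not need to change.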