TexLexAn is an open source text analyser for Linux that estimates readability and reading time, and classifies and summarizes texts. It has some learning abilities and accepts html, doc, pdf, ppt, odt and txt documents. It is written in C and Python.
- GUI and CLI to analyze, classify and summarize documents
- Accept: text, html, odt, msdoc, ppt, ps
- Analyze: syllables-per-word distribution, readability, sentiment
- Sentiment: evaluate bipolar (positive/negative) sentiment
- Extract: keywords
- Classify: linear classifier based on unigrams up to n-grams
- Summarize: extract relevant sentences and simplify them.
- Learn: perceptron algorithm
- Retrieve original docs by searching in archived summaries.
- Classify & extract sentences from previous summaries
- Detect: English, French, German, Italian, Spanish languages
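
To illustrate how the "Classify" and "Learn" items can fit together, here is a minimal Python sketch of a linear classifier over unigram-to-n-gram features, trained with the perceptron rule. It is not TexLexAn's actual code: the names `ngrams` and `NgramPerceptron`, the whitespace tokenizer, the update step of 1.0 and the sample texts are all assumptions made for this example.

```python
from collections import defaultdict


def ngrams(text, n_max=2):
    """Return all 1-grams up to n_max-grams of a lowercased, whitespace-split text."""
    words = text.lower().split()
    feats = []
    for n in range(1, n_max + 1):
        for i in range(len(words) - n + 1):
            feats.append(" ".join(words[i:i + n]))
    return feats


class NgramPerceptron:
    """Hypothetical multiclass perceptron: one weight vector per label."""

    def __init__(self, labels, n_max=2):
        self.labels = list(labels)
        self.n_max = n_max
        self.weights = {label: defaultdict(float) for label in self.labels}

    def score(self, text, label):
        # Linear score: sum of the weights of the n-grams found in the text.
        w = self.weights[label]
        return sum(w.get(f, 0.0) for f in ngrams(text, self.n_max))

    def predict(self, text):
        # Ties are resolved in favour of the label listed first.
        return max(self.labels, key=lambda label: self.score(text, label))

    def learn(self, text, true_label):
        # Perceptron rule: weights change only when the prediction is wrong.
        guess = self.predict(text)
        if guess != true_label:
            for f in ngrams(text, self.n_max):
                self.weights[true_label][f] += 1.0
                self.weights[guess][f] -= 1.0
        return guess

    def train(self, examples, epochs=5):
        for _ in range(epochs):
            for text, label in examples:
                self.learn(text, label)


if __name__ == "__main__":
    # Made-up training texts, for illustration only.
    examples = [
        ("the match ended with a late goal", "sport"),
        ("the bank raised interest rates", "finance"),
    ]
    clf = NgramPerceptron(["sport", "finance"])
    clf.train(examples)
    print(clf.predict("a late goal"))
```

Because the weights move only on a wrong prediction, a text that is already classified correctly leaves the model unchanged, which is what lets such a classifier keep learning from user corrections one document at a time.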