Unicode-XML-TEI text/corpus analysis platform
TXM is a free and open-source, cross-platform, Unicode- and XML-based text/corpus analysis environment and graphical client for Windows, Linux and Mac OS X. It can also be used online as a J2EE-compliant web portal (GWT-based) with built-in access control.
It offers a comprehensive range of analysis tools (concordances, collocate search, frequency lists, etc.) based on the powerful CQP full-text search engine (http://cwb.sourceforge.net) and a range of statistical functions (factorial analysis, classification, cooccurrence analysis, etc.) based on R packages (http://www.r-project.org).
Read a full description at the TEI Tools wiki http://wiki.tei-c.org/index.php/TXM.
Read the scientific background at the Textométrie project web site http://textometrie.ens-lyon.fr/?lang=en.
- Provides qualitative analysis tools: a concordancer for lexical patterns based on word- and structure-level queries, navigation of rich HTML-based text editions, and layout display of pattern occurrences
- Provides quantitative analysis tools: factorial correspondence analysis, contrastive word specificities, hierarchical classification, cooccurrents of patterns
- Works on any collection of Unicode-encoded documents in various formats: text collections (TXT, XML, XML-TEI P5), transcriptions of recordings (XML-Transcriber), aligned corpora (XML-TMX), press articles (XML-PPS Factiva, Europress) and more.
- Applies various NLP tools to texts on the fly before analysis (e.g. TreeTagger for lemmatization and POS tagging)
- Lets you build various subcorpora and partitions (for contrastive analysis between text structures or groups of words)
- Exports any result in CSV, XML or SVG format
- Scriptable in Groovy/Java to automate repetitive tasks or extend the platform
- Includes a text editor to edit data sources, results and scripts
- Runs as a standalone Windows, Mac OS X or Linux application
- Also runs as a web portal application for accessing and analyzing corpora online through a web browser (with access control management)
- Open source: based on the best open source components for text analysis: CQP, R and Java & XSLT libraries
- Modular architecture (Eclipse RCP OSGi and J2EE conformant): one toolbox connecting all core components is used by all the applications
- Efficient development framework powered by Eclipse or NetBeans
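
To give a rough idea of what the concordancer listed among the qualitative tools produces, here is a minimal keyword-in-context (KWIC) sketch in plain Python. It is only an illustration of the concept, not how TXM or CQP implement it: the function name `kwic`, the `width` parameter and the sample sentence are all hypothetical, and tokenization is reduced to a simple lowercase word split.

```python
import re

def kwic(text, keyword, width=3):
    """Return (left, match, right) context tuples for each occurrence of keyword.

    Hypothetical helper for illustration only; not part of TXM or CQP.
    """
    # Naive tokenization: lowercase the text and keep runs of word characters.
    tokens = re.findall(r"\w+", text.lower())
    target = keyword.lower()
    rows = []
    for i, tok in enumerate(tokens):
        if tok == target:
            # Up to `width` tokens on each side of the match.
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            rows.append((left, tok, right))
    return rows

# Made-up sample text, used only to show the output layout.
sample = "The cat sat on the mat. The dog saw the cat run."
for left, match, right in kwic(sample, "cat", width=2):
    print(f"{left:>12} | {match} | {right}")
```

A real concordancer such as the one described above works on an indexed corpus and supports queries on word properties and text structure; this sketch only scans a single string for one exact word form.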
txm works perfectly, thanks
good work, thanx!
very good project, thanks!