TXM


Unicode-XML-TEI text/corpus analysis platform

TXM is a free, open-source, cross-platform environment and graphical client for Unicode- and XML-based text and corpus analysis, supporting Windows, Linux and Mac OS X. It can also be used online as a J2EE-compliant, GWT-based web portal with built-in access control.

It offers a comprehensive range of analysis tools (concordances, collocate search, frequency lists, etc.) based on the powerful CQP full-text search engine (http://cwb.sourceforge.net), and a range of statistical functions (factorial analysis, classification, co-occurrence analysis, etc.) based on R packages (http://www.r-project.org).
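
To make two of these terms concrete, here is a toy Java sketch of a frequency list and a keyword-in-context (KWIC) concordance. It is an illustration written for this page under simplifying assumptions, not TXM's implementation or API: TXM delegates searching to CQP, which matches word- and structure-level query patterns over annotated corpora, whereas the hypothetical `Kwic` class below only lower-cases the text, splits it on non-alphanumeric characters and matches exact word forms.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/**
 * Hypothetical toy example (NOT TXM's API): a word-frequency list and a
 * keyword-in-context concordance over naively tokenized plain text.
 */
class Kwic {

    /** Lower-cases the text and splits it on runs of characters that are neither letters nor digits. */
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) {
                tokens.add(t); // split() yields a leading empty string if the text starts with a delimiter
            }
        }
        return tokens;
    }

    /** Counts the occurrences of each token, keeping first-seen order. */
    static Map<String, Integer> frequencies(List<String> tokens) {
        Map<String, Integer> freq = new LinkedHashMap<>();
        for (String t : tokens) {
            freq.merge(t, 1, Integer::sum);
        }
        return freq;
    }

    /** Returns one line per occurrence of keyword: up to `window` tokens of left context, [keyword], right context. */
    static List<String> concordance(List<String> tokens, String keyword, int window) {
        List<String> lines = new ArrayList<>();
        for (int i = 0; i < tokens.size(); i++) {
            if (!tokens.get(i).equals(keyword)) {
                continue;
            }
            int from = Math.max(0, i - window);
            int to = Math.min(tokens.size(), i + window + 1);
            String left = String.join(" ", tokens.subList(from, i));
            String right = String.join(" ", tokens.subList(i + 1, to));
            lines.add(left + " [" + keyword + "] " + right);
        }
        return lines;
    }

    public static void main(String[] args) {
        String text = "The corpus is small. A corpus tool counts words; the tool also shows each word in context.";
        List<String> tokens = tokenize(text);
        System.out.println(frequencies(tokens));
        concordance(tokens, "corpus", 2).forEach(System.out::println);
    }
}
```

In a real corpus tool the unit being matched would typically be an annotated token (word form, lemma, part of speech) rather than a raw string, which is why TXM applies NLP tools such as TreeTagger before analysis.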

Read a full description at the TEI Tools wiki http://wiki.tei-c.org/index.php/TXM.

Read the scientific background at the Textométrie project web site http://textometrie.ens-lyon.fr/?lang=en.

  • Provides qualitative analysis tools: a concordancer for lexical patterns based on word- and structure-level queries, navigation of rich HTML-based text editions, and a layout display of pattern occurrences
  • Provides quantitative analysis tools: factorial correspondence analysis, contrastive word specificities, hierarchical classification, and co-occurrents of patterns
  • Works on any collection of Unicode-encoded documents in various formats: text collections (TXT, XML, XML-TEI P5), recording transcriptions (XML-Transcriber), aligned corpora (XML-TMX), press articles (XML-PPS Factiva, Europress) and more
  • Applies various NLP tools to texts on the fly before analysis (e.g. TreeTagger for lemmatization and part-of-speech tagging)
  • Builds various subcorpora and partitions (for contrastive analysis between text structures or groups of words)
  • Exports any result in CSV, XML or SVG format
  • Scriptable in Groovy/Java, for automating repetitive tasks or extending the platform
  • Includes a text editor for editing data sources, results and scripts
  • Runs as a standalone Windows, Mac OS X or Linux application
  • Also runs as a web portal application for accessing and analyzing corpora online through a web browser (with access control management)
  • Open source, built on the best open-source components for text analysis: CQP, R, and Java and XSLT libraries
  • Modular architecture (Eclipse RCP/OSGi and J2EE compliant): a single toolbox connecting all core components is shared by all the applications
  • Efficient development framework powered by Eclipse or NetBeans


Additional Project Details

Languages

French, English, Russian

Intended Audience

Science/Research, Advanced End Users, Developers, End Users/Desktop

User Interface

Java SWT, Web-based, Console/Terminal, Eclipse

Programming Language

C, Groovy, Java, S/R