Name Modified Size Downloads / Week
fleurs-du-mal 2021-03-25
p1s8-course-transcription 2020-03-04
voeux 2019-05-29
brown 2018-12-10
txm-odt-manual 2018-01-10
quete-du-graal-tei 2017-11-03
discours 2017-11-03
leviathan 2017-05-18
tdm80j 2017-05-07
mpt 2016-10-27
CORPUS110CYL067 2016-07-21
uno-tmx-sample 2016-07-14
voeux-rfa 2014-06-26
voeux-fr 2014-02-11
README.markdown 2018-12-11 3.2 kB
Totals: 15 Items   3.2 kB 13

Sample Corpora

Go to a corpus directory, download the .txm binary file, then load it in TXM with the 'File > Load' command.

For some corpora, the sources are also provided, so you can import the corpus from its sources and tune it.

Written texts

French

  • discours: corpus of various French presidents’ speeches, published by Damon Mayaffre.
  • fleurs-du-mal: Les Fleurs du mal (The Flowers of Evil) by Charles Baudelaire, edition of Jean-Marie Viprey.
  • mpt: corpus of French National Assembly debates on the "Mariage pour tous" law of 2013 from the mariagepourtousInXML project.
  • quete-du-graal-tei: Queste del Saint Graal (Quest for the Holy Grail), edition of Christiane Marchello-Nizia and Alexei Lavrentiev, based on the Old French manuscripts 'Lyon, Palais des Arts 77 (ms. K) (fol. 160a-224d)' and 'Paris, BNF n. acq. fr. 1119 (ms. Z)', ca. 1225 or 1230.
  • tdm80j: Le tour du monde en quatre-vingts jours (Around the World in Eighty Days), Jules Verne, 1873, edition of J. Hetzel et Cie. Synoptic edition with Wikisource facsimile images.
  • txm-odt-manual: TXM User's manual as a TXM corpus.
  • voeux: See voeux-fr.
  • voeux-fr: corpus of 51 New Year’s Day speeches by French presidents, 1959-2009, published by Jean-Marc Leblanc.

English

  • brown: corpus of 500 texts written in American English in 1961, published by W. N. Francis and H. Kucera (this version is based on the XML-TEI version from the NLTK project).
  • leviathan: Leviathan by Thomas Hobbes (1588-1679). XML-TEI P5 text sample from the EEBO-TCP Phase 1 project.

German

  • voeux-rfa: corpus of the Christmas and New Year's addresses delivered by the Presidents and Chancellors of the Federal Republic of Germany since 1987, contributed by Sascha Diwersy, Universität zu Köln.

Record transcriptions (synchronized)

  • p1s8-course-transcription: French-language transcription and recording of a high school physics course, Tiberghien Andrée et al., Education & didactique 3/2012 (vol. 6). For practicing video replay from concordances (requires the Media Player extension).

Parallel corpora (multilingual)

  • uno-tmx-sample: sample of United Nations General Assembly Resolutions: A Six-Language Parallel Corpus (Arabic, Chinese, English, French, Russian and Spanish), http://www.uncorpora.org [Alexandre Rafalovitch, Robert Dale. 2009. United Nations General Assembly Resolutions: A Six-Language Parallel Corpus. In Proceedings of the MT Summit XII, pages 292-299, Ottawa, Canada, August]. Import it with the XML-TMX import module.

Annotated corpora

  • CORPUS110CYL067: a single syntactically parsed text from the MASC corpus. For practicing TIGER Search queries (see TIGER-XML import validation; requires the TIGER Search extension).

Some corpora are also available from the TXM demo portal: http://portal.textometrie.org/demo/?locale=en.

Source: README.markdown, updated 2018-12-11