TXM XSLT IMPORT PROCESSING LIBRARY
This is a collection of XSLT (1.0 or 2.0) stylesheets that can be used to prepare various types of XML documents for import into TXM. Place them in the appropriate xsl/step subfolder when using the XTZ+CSV import module, or use the "Front XSLT" option in the import parameters interface to select the appropriate filter in the XML/w+CSV import module.
Filters are usually named according to the following pattern: txm-filter-[input format]-[import module](-[option])?
Stylesheets for use with the XML TEI Zero+CSV (XTZ) import module
1-split-merge step
Due to a bug in TXM 0.7.8 and 0.7.9, this processing step does not work properly. The stylesheets listed below must be applied prior to the import, using the ExecXSL macro or any other XSLT 2.0 processor.
txm-rename-files-no-dots.xsl
This stylesheet is designed for the TXM XTZ+CSV import module and replaces dots with underscores in source file names. A bug in TXM 0.7.8 prevented files with dots in their names from being imported; this bug was resolved in TXM 0.7.9.
txm-split-teicorpus.xsl
This stylesheet may be used to split a single file containing a teiCorpus into individual files for each TEI child.
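The splitting idea can be sketched as follows. This is a minimal illustration, not the library's actual stylesheet: it assumes the corpus uses the TEI P5 namespace, requires an XSLT 2.0 processor (for xsl:result-document), and uses a hypothetical naming rule for the output files. The corpus-level teiHeader is not carried over in this sketch.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:tei="http://www.tei-c.org/ns/1.0">

  <xsl:output method="xml" encoding="UTF-8" indent="no"/>

  <!-- Write each TEI child of the teiCorpus root to its own file. -->
  <xsl:template match="/tei:teiCorpus">
    <xsl:for-each select="tei:TEI">
      <!-- Hypothetical naming rule: @xml:id if present, otherwise text-N. -->
      <xsl:variable name="name"
          select="if (@xml:id) then string(@xml:id)
                  else concat('text-', position())"/>
      <xsl:result-document href="{$name}.xml">
        <xsl:copy-of select="."/>
      </xsl:result-document>
    </xsl:for-each>
  </xsl:template>

</xsl:stylesheet>
```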
2-front step
txm-front-teiHeader2textAtt.xsl
This stylesheet may be customized to extract metadata from teiHeader and create corresponding attributes of the text element.
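As an illustration of this kind of customization, the sketch below copies the document unchanged and adds one attribute to the text element. It assumes a TEI P5 source in the TEI namespace; the attribute name title and the XPath to the title are choices made for this example, not the defaults of the library stylesheet.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:tei="http://www.tei-c.org/ns/1.0">

  <!-- Identity template: copy everything that is not handled below. -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Add a title attribute (example name) taken from the teiHeader. -->
  <xsl:template match="tei:TEI/tei:text">
    <xsl:copy>
      <xsl:apply-templates select="@*"/>
      <xsl:attribute name="title">
        <xsl:value-of select="normalize-space(
            /tei:TEI/tei:teiHeader/tei:fileDesc/tei:titleStmt/tei:title[1])"/>
      </xsl:attribute>
      <xsl:apply-templates select="node()"/>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>
```

Further metadata can be added by repeating the xsl:attribute instruction with other names and XPath expressions.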
txm-front-teitxm2xmlw.xsl
This stylesheet may be used to import TEI-TXM XML files with the XML-TEI Zero+CSV (or XML/w+CSV) module. This module is more flexible than the XML-TEI TXM module: it allows re-tokenizing the texts, selecting and renaming annotations, and building synoptic editions.
3-posttok step
txm-posttok-addRef.xsl
This stylesheet may be customized to add a ref attribute to w elements, which is then used as the default reference in TXM concordances.
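A customization of this kind might look like the sketch below. The composition of the reference (the id attribute of the enclosing text element followed by the n attribute of the preceding pb) is an assumption made for this example. Elements are matched by local name so that the sketch does not depend on the namespace of the tokenized files.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Identity template: copy everything that is not handled below. -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Add a ref attribute to every w element. -->
  <xsl:template match="*[local-name() = 'w']">
    <xsl:copy>
      <xsl:apply-templates select="@*"/>
      <xsl:attribute name="ref">
        <!-- Example reference format: text id, then page number. -->
        <xsl:value-of select="concat(
            ancestor::*[local-name() = 'text'][1]/@id,
            ', p. ',
            preceding::*[local-name() = 'pb'][1]/@n)"/>
      </xsl:attribute>
      <xsl:apply-templates select="node()"/>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>
```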
txm-posttok-unbreakWords.xsl
This stylesheet may be customized to reunite words that were split during the primary tokenization process (by line or page breaks, for instance).
txm-posttok-structure2wordAtt.xsl
This stylesheet projects the number of selected ancestor elements in which a word is nested onto attributes of the w element. Enter the element names, separated by |, as the value of the elementsToProject parameter.
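One possible reading of this projection is sketched below: for each element name listed in elementsToProject, every w element receives an attribute holding the count of its ancestors with that name. Only the parameter name comes from the description above; the default value 'div|q' and the attribute naming scheme (name followed by -depth) are assumptions of this example, and the library stylesheet may compute and name its attributes differently. The sketch needs an XSLT 2.0 processor (tokenize, xsl:attribute with select).

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Element names separated by | (example default value). -->
  <xsl:param name="elementsToProject" select="'div|q'"/>

  <!-- Identity template: copy everything that is not handled below. -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <xsl:template match="*[local-name() = 'w']">
    <xsl:variable name="word" select="."/>
    <xsl:copy>
      <xsl:apply-templates select="@*"/>
      <xsl:for-each select="tokenize($elementsToProject, '\|')">
        <xsl:variable name="name" select="."/>
        <!-- Hypothetical attribute name, e.g. div-depth="2". -->
        <xsl:attribute name="{$name}-depth"
            select="count($word/ancestor::*[local-name() = $name])"/>
      </xsl:for-each>
      <xsl:apply-templates select="node()"/>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>
```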
4-edition step
1-default-html.xsl
This is an alternative stylesheet for creating default editions with the XTZ module. It transforms every TEI element into an HTML span with @class. This stylesheet must be used in conjunction with 2-default-pager.xsl.
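The element-to-span principle can be sketched in a few lines. This is a simplified illustration and not the content of 1-default-html.xsl: the html and body wrapper and the use of the element's local name as the class value are assumptions of this example.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="html" encoding="UTF-8"/>

  <!-- Wrap the whole result in a minimal HTML page. -->
  <xsl:template match="/">
    <html>
      <body>
        <xsl:apply-templates/>
      </body>
    </html>
  </xsl:template>

  <!-- Every source element becomes a span whose class is the element name. -->
  <xsl:template match="*">
    <span class="{local-name()}">
      <xsl:apply-templates/>
    </span>
  </xsl:template>

</xsl:stylesheet>
```

With this approach, the visual rendering of each TEI element is left to CSS rules that target the generated class names.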
2-default-pager.xsl
This stylesheet must be used in conjunction with 1-default-html.xsl to create edition pages.
Basic stylesheets for filtering XML sources
filter-keep-only-select.xsl
This stylesheet may be customized to filter out all the text and tags except the content of the specified element (select by default) and its ancestors.
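The filtering logic described above can be sketched as three templates. The sketch assumes that the source elements are in no namespace and that attributes of the ancestor elements should be preserved; both points are assumptions of this example.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Ancestors of a select element: keep the tag, visit child elements only
       (text located directly inside the ancestor is dropped). -->
  <xsl:template match="*[descendant::select]">
    <xsl:copy>
      <xsl:copy-of select="@*"/>
      <xsl:apply-templates select="*"/>
    </xsl:copy>
  </xsl:template>

  <!-- The selected element: keep it with all of its content.
       The explicit priority makes this rule win for nested select elements. -->
  <xsl:template match="select" priority="1">
    <xsl:copy-of select="."/>
  </xsl:template>

  <!-- Any other element: drop it together with its content. -->
  <xsl:template match="*"/>

</xsl:stylesheet>
```

To keep a different element, replace select in both match patterns with the name of that element.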
filter-out-p.xsl
This stylesheet may be customized to filter out any particular XML element (p by default) and its content from the source document.
filter-out-sp.xsl
This stylesheet may be customized to filter out any particular XML element with a specific attribute value (by default, sp elements whose who attribute has the value 'enqueteur') and its content from the source document.
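Filters of this kind are typically an identity transformation plus one empty template. The sketch below illustrates the pattern for the default case and assumes that the source elements are in no namespace; for a namespaced source such as TEI P5, the match pattern would need a declared prefix.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Identity template: copy everything that is not handled below. -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Empty template: matched elements and their content produce no output. -->
  <xsl:template match="sp[@who = 'enqueteur']"/>

</xsl:stylesheet>
```

Changing the element name, the attribute name or the value in the match pattern adapts the filter to another source; removing the predicate filters out every element with that name.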
Basic stylesheets for adapting XML TEI P5 sources
txm-filter-teip5-teibfm.xsl
This stylesheet may be customized for use with any TEI P5 source in the TEI BFM import module. Note that this module is experimental and may fail on documents that do not follow the BFM encoding guidelines.
txm-filter-teip5-xmlw-preserve.xsl
This stylesheet may be customized for use with any TEI P5 source in the XML/w+CSV import module. By default, it removes the teiHeader and facsimile elements with their contents and preserves all other elements.
txm-filter-teip5-xmlw-simplify.xsl
This stylesheet may be customized for use with any TEI P5 source in the XML/w+CSV import module. By default, it removes the teiHeader, facsimile and all note elements with their contents, and strips all tags in the text body except ab, body, div, front, lb, p, pb, s, TEI, text and w.
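The default behaviour described above combines three rules: drop some elements entirely, keep a list of tags, and unwrap everything else. The following sketch shows one way to express them, assuming a source in the TEI P5 namespace; it is an illustration of the principle and may differ from the library stylesheet in its details (for instance, comments and processing instructions are not copied here).

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:tei="http://www.tei-c.org/ns/1.0">

  <!-- Rule 1: remove these elements together with their content. -->
  <xsl:template match="tei:teiHeader | tei:facsimile | tei:note"/>

  <!-- Rule 2: keep these tags (with their attributes) and process their content. -->
  <xsl:template match="tei:ab | tei:body | tei:div | tei:front | tei:lb
                     | tei:p | tei:pb | tei:s | tei:TEI | tei:text | tei:w">
    <xsl:copy>
      <xsl:copy-of select="@*"/>
      <xsl:apply-templates/>
    </xsl:copy>
  </xsl:template>

  <!-- Rule 3: for any other element, drop the tag but keep its content. -->
  <xsl:template match="*">
    <xsl:apply-templates/>
  </xsl:template>

</xsl:stylesheet>
```

Adding a name to the pattern of rule 2 preserves one more tag; adding it to rule 1 removes the element with everything it contains.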
Additional stylesheets for particular corpora
p4top5_perseus.xsl
This stylesheet is needed to convert Perseus TEI P4 files to TEI P5 prior to any import process.
txm-edition-page-split.xsl
This stylesheet should be used to create separate HTML pages for TXM editions.
txm-edition-xmltxm-textgrid.xsl
This stylesheet should be used to customize TXM editions of DARIAH-DE Textgrid texts.
txm-edition-xtz-corpusakkadien-translit.xsl
This stylesheet should be used to customize transliterated TXM editions of cuneiform Akkadian tablets. See the project wiki for more details.
txm-edition-xtz-cuneiform.xsl
This stylesheet should be used to create cuneiform TXM editions of Akkadian tablets. See the project wiki for more details.
txm-filter-corpusakkadien-xmlw_syllabes-cuneiform.xsl
This stylesheet should be used on a corpus of Akkadian tablets with the XML/w+CSV import module. See the project wiki for more details.
txm-filter-perseustreebank-xmlw.xsl
This filter should be used on the Perseus Treebank corpus texts with the XML/w+CSV import module.
txm-filter-qgraal_cm-xmlw.xsl
This stylesheet should be used on the diffracted format of the Queste del Saint Graal source files with the XML/w+CSV import module.
txm-filter-rnc-xmlw.xsl
This filter should be used on the Russian National Corpus texts with the XML/w+CSV import module.
txm-filter-teibrown-xmlw.xsl
This filter should be used on the TEI Brown corpus texts with the XML/w+CSV import module.
txm-filter-teibvh-xmlw.xsl
This filter should be used on the TEI BVH texts with the XML/w+CSV import module.
txm-filter-teibvh-xmlw-posttok.xsl
This stylesheet should be used to fix tokenization errors and to adjust word properties in the tokenized version of TEI BVH texts.
txm-filter-teicorpustextgrid-xmlw.xsl
This stylesheet should be used to prepare DARIAH-DE TEIcorpus XML files for the TXM XML/w+CSV import process.
txm-filter-teifrantext-teibfm.xsl
This filter should be used on TEI Frantext texts with the TEI BFM import module. It is automatically applied in the TEI Frantext import module. Note that this module is experimental and may fail on documents that do not follow BFM encoding guidelines.
txm-filter-teifrantext-xmlw.xsl
This stylesheet should be used on TEI Frantext texts with the XML/w+CSV import module.
txm-filter-teiperseus-xmlw.xsl
This filter should be used on the TEI Perseus corpus texts with the XML/w+CSV import module (after conversion to TEI P5).
txm-filter-teitextgrid-xmlw-posttok.xsl
This stylesheet should be used to adjust word properties in the tokenized version of DARIAH-DE Textgrid texts.
txm-front-idsHeader2textAtt.xsl
This stylesheet may be used to project metadata from idsHeader (Mannheim German Language Institute corpus, IDS-XCES schema) onto attributes of the text element.
txm-split-xces-ids-corpus2text.xsl
This stylesheet splits a single file of an XCES-IDS corpus (Mannheim German Language Institute corpus) into one file per text for the TXM XTZ import module. It was designed for the 1-split-merge step, which is currently buggy, so it should be applied prior to the import process.
Please address any enquiries about the TXM XSLT library to textometrie@groupes.renater.fr