We describe a simple XML format to share text documents and annotations.
BioC is a minimalist approach to sharing text documents and data annotations. It allows a large number of different annotations to be represented.
Project files contain:
- simple code to hold/read/write data and perform sample processing.
- BioC-formatted corpora
- BioC tools that work with BioC corpora
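As a rough illustration of what "hold/read/write" code for such a format might look like, here is a minimal Python sketch using only the standard library. The element names (collection, document, passage, annotation, infon, location) follow a common reading of the BioC DTD and should be checked against the project's own files; the helper names `build_collection` and `read_annotations`, the sample sentence, and the source/date/key values are hypothetical placeholders, not part of the BioC distribution.

```python
import xml.etree.ElementTree as ET


def build_collection(doc_id, text, annotations):
    """Build a BioC-style collection with one document and one passage.

    `annotations` is a list of (id, type, offset, length) tuples.
    Hypothetical helper for illustration only.
    """
    collection = ET.Element("collection")
    # Placeholder metadata; real corpora carry their own values.
    ET.SubElement(collection, "source").text = "example-source"
    ET.SubElement(collection, "date").text = "placeholder-date"
    ET.SubElement(collection, "key").text = "example.key"

    document = ET.SubElement(collection, "document")
    ET.SubElement(document, "id").text = doc_id

    passage = ET.SubElement(document, "passage")
    ET.SubElement(passage, "offset").text = "0"
    ET.SubElement(passage, "text").text = text

    for ann_id, ann_type, offset, length in annotations:
        ann = ET.SubElement(passage, "annotation", id=ann_id)
        # Annotation metadata is stored as key/value "infon" elements.
        ET.SubElement(ann, "infon", key="type").text = ann_type
        ET.SubElement(ann, "location", offset=str(offset), length=str(length))
        ET.SubElement(ann, "text").text = text[offset:offset + length]
    return collection


def read_annotations(collection):
    """Return (id, type, offset, length, text) for every annotation."""
    result = []
    for ann in collection.iter("annotation"):
        loc = ann.find("location")
        result.append((
            ann.get("id"),
            ann.find("infon").text,
            int(loc.get("offset")),
            int(loc.get("length")),
            ann.find("text").text,
        ))
    return result


# Illustrative sentence and annotations (not taken from any BioC corpus).
sample_text = "Aspirin inhibits COX-1."
collection = build_collection(
    "doc1", sample_text, [("T1", "chemical", 0, 7), ("T2", "gene", 17, 5)]
)
xml_string = ET.tostring(collection, encoding="unicode")
roundtrip = read_annotations(ET.fromstring(xml_string))
```

Writing the collection to a string and parsing it back yields the same annotations, which is the kind of round trip that a shared interchange format is meant to make cheap.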
BioC goals:
- simplicity
- interoperability
- broad use
- reuse
There should be little investment required to learn to use a format or a software module to process that format. ...
Python, NLTK-based package for shallow parsing of Brazilian Portuguese
Aelius is an ongoing open source project that aims to develop a suite of Python, NLTK-based modules and interfaces to freely available external tools for shallow parsing of Brazilian Portuguese. It also includes language resources such as language models, sample texts, and gold standards. At present, Aelius offers facilities for POS-tagging and chunking corpora and for outputting annotations in different formats, such as XML in the TEI P5 encoding scheme.
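To give a feel for what POS-tagged output in a TEI-style encoding can look like, here is a minimal standard-library sketch. It does not use Aelius or NLTK: the function name `tagged_to_tei`, the tag labels (DET, N, V), and the choice of the TEI `<s>` and `<w>` elements with a `type` attribute are assumptions made for illustration, and the markup Aelius actually emits may differ.

```python
import xml.etree.ElementTree as ET


def tagged_to_tei(tagged_sentences):
    """Wrap (token, tag) pairs in TEI-style <s> and <w> elements.

    Hypothetical helper; not part of Aelius.
    """
    body = ET.Element("body")
    paragraph = ET.SubElement(body, "p")
    for number, sentence in enumerate(tagged_sentences, start=1):
        s = ET.SubElement(paragraph, "s", n=str(number))
        for token, tag in sentence:
            # The POS tag is stored in the `type` attribute (an assumption).
            ET.SubElement(s, "w", type=tag).text = token
    return body


# Hand-tagged sample sentence; the tag labels are illustrative placeholders.
tagged = [[("O", "DET"), ("menino", "N"), ("comeu", "V"),
           ("o", "DET"), ("bolo", "N")]]
tei = ET.tostring(tagged_to_tei(tagged), encoding="unicode")
```

In practice the (token, tag) pairs would come from a tagger rather than being written by hand; the sketch only covers the serialization step.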