encoding free download

TEI LingSIG

Production space for the TEI Linguistics SIG

This used to be the experimentation and production space for the Special Interest Group (SIG) of the Text Encoding Initiative (TEI) called "TEI for Linguists", LingSIG for short. Currently, this is a storage place for documents produced by the SIG. Use https://github.com/LingSIG to access the current production space.

Downloads: 13 This Week

Last Update: 2026-06-17


UnsupervisedMT

Phrase-Based & Neural Unsupervised Machine Translation

...The neural component supports multiple architectures—seq2seq, biLSTM with attention, and Transformer—and allows extensive parameter sharing across languages to improve data efficiency. Training relies on denoising auto-encoding and back-translation, with on-the-fly, multithreaded generation of synthetic parallel data to continually refresh supervision signals. The project also provides scripts to fetch and preprocess monolingual data, learn BPE codes, and train cross-lingual embeddings that bootstrap unsupervised alignment between languages. Beyond the core EMNLP 2018 setup, the codebase exposes additional, optional capabilities such as multi-language training, language model pretraining with shared parameters, and adversarial training.
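The denoising auto-encoding step mentioned above trains a model to reconstruct a sentence from a corrupted copy of it. A minimal sketch of one common corruption scheme (random word dropout plus a bounded local shuffle) follows. The function name `add_noise` and the parameters `drop_prob` and `max_shift` are hypothetical and are not taken from the UnsupervisedMT codebase.

```python
import random


def add_noise(tokens, drop_prob=0.1, max_shift=3, rng=None):
    """Corrupt a token list for denoising auto-encoding (illustrative only).

    Assumed scheme: drop each token with probability `drop_prob`, then
    shuffle locally by sorting on (index + uniform noise in [0, max_shift + 1)).
    """
    rng = rng or random.Random()

    # Word dropout: keep a token when the random draw is at least drop_prob.
    kept = [t for t in tokens if rng.random() >= drop_prob]
    if not kept and tokens:
        kept = [tokens[0]]  # never return an empty sentence for non-empty input

    # Local shuffle: tokens far apart keep their relative order because the
    # added noise is smaller than the distance between their indices.
    keys = [i + rng.uniform(0, max_shift + 1) for i in range(len(kept))]
    order = sorted(range(len(kept)), key=lambda i: keys[i])
    return [kept[i] for i in order]


if __name__ == "__main__":
    sentence = "the cat sat on the mat".split()
    print(add_noise(sentence, rng=random.Random(0)))
```

During training, the model would receive the noisy sequence as input and the original sentence as the reconstruction target.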

Downloads: 1 This Week

Last Update: 2 days ago


Encode Arabic

Encode Arabic provides tools for encoding and decoding Arabic in Haskell, Python, Perl, or LaTeX. Interprets the ArabTeX notation to generate original orthography or phonetic transcription. Supports Buckwalter and other romanizations. Converts legacy byte encodings into Unicode. http://github.com/otakar-smrz/encode-arabic
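To illustrate the two kinds of conversion described above, here is a minimal sketch using only the Python standard library. It does not use the Encode Arabic API; the function names are hypothetical, the Buckwalter table is deliberately partial (five letters), and Windows-1256 (`cp1256`) is assumed as the example legacy encoding.

```python
# Partial Buckwalter-to-Arabic table, for illustration only.
BUCKWALTER_SAMPLE = {
    "A": "\u0627",  # alif
    "b": "\u0628",  # ba
    "s": "\u0633",  # sin
    "l": "\u0644",  # lam
    "m": "\u0645",  # mim
}


def buckwalter_to_arabic(text):
    """Map romanized characters to Arabic letters; unknown characters pass through."""
    return "".join(BUCKWALTER_SAMPLE.get(ch, ch) for ch in text)


def legacy_to_unicode(raw, encoding="cp1256"):
    """Decode bytes from a legacy Arabic code page into a Unicode string."""
    return raw.decode(encoding)


if __name__ == "__main__":
    word = buckwalter_to_arabic("slAm")
    print(word)
    # Round trip through the legacy byte encoding and back to Unicode.
    print(legacy_to_unicode(word.encode("cp1256")) == word)
```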

1 Review

Downloads: 0 This Week

Last Update: 2016-06-28


Aelius Brazilian Portuguese POS-Tagger

Python, NLTK-based package for shallow parsing of Brazilian Portuguese

...It also includes language resources such as language models, sample texts, and gold standards. Aelius currently offers facilities for POS-tagging and chunking corpora and for outputting annotations in different formats, such as XML in the TEI P5 encoding scheme.

1 Review

Downloads: 0 This Week

Last Update: 2014-11-03


CRFSharp

CRFSharp is a .NET(C#) implementation of Conditional Random Field

...CRF#'s main algorithm is the same as that of CRF++, written by Taku Kudo. It encodes model parameters using L-BFGS. It also offers several significant improvements over CRF++, such as fully parallel encoding and optimized memory usage. When training on a corpus, CRF# can make full use of multi-core CPUs while using very little memory compared with CRF++, and its memory usage grows smoothly and slowly as the size of the training corpus and the number of tags increase. With multi-threaded processing, CRF# is now better suited than CRF++ to training on large data sets and large tag sets. ...

Downloads: 0 This Week

Last Update: 2015-08-03


Search Results for "encoding"

5 projects for "encoding" with 2 filters applied:

TEI LingSIG

UnsupervisedMT

Encode Arabic

Aelius Brazilian Portuguese POS-Tagger

CRFSharp

