encoding free download

TEI LingSIG

Production space for the TEI Linguistics SIG

This used to be the experimentation and production space for the Special Interest Group (SIG) of the Text Encoding Initiative (TEI) called "TEI for Linguists", LingSIG for short. Currently, this is a storage place for documents produced by the SIG. Use https://github.com/LingSIG to access the current production space.

Downloads: 12 This Week

Last Update: 7 days ago

multinotes

Text architecture for music theory.

...The multiNotes text architecture and processing pipeline is based on d2d and standard technologies (XSLT, ECMAScript, LilyPond, PostScript, etc.) and addresses these issues. An overview of the software architecture and its operation is given in: Journal of the Text Encoding Initiative, Open Issue 18/2024: "Using d2d for Writing XML --- The multiNotes Text Architecture for Musical Analysis" https://doi.org/10.4000/132ex

Downloads: 0 This Week

Last Update: 2025-02-04

wordTabulator

...It can generate an index of word elements extracted from a defined set of texts. Word elements may be words, n-grams, or phrases (syntagms). The program can process texts in either an ordinary single-byte encoding (ANSI) or the multibyte UTF-8 encoding.
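As a rough illustration of what such a word-element index involves, the sketch below counts word n-grams over texts supplied in different encodings. It is not wordTabulator's code: the function name `ngram_index`, the sample sentences, and the choice of `cp1252` as the "ANSI" code page are all assumptions made for this example.

```python
from collections import Counter
import re

# Unicode-aware word pattern; in Python 3, \w also matches accented letters.
WORD_RE = re.compile(r"\w+")

def ngram_index(documents, n=1):
    """Count word n-grams over an iterable of (raw_bytes, encoding) pairs.

    Hypothetical helper for illustration only.
    """
    counts = Counter()
    for raw, encoding in documents:
        # Decode each text with its own declared encoding before tokenizing.
        text = raw.decode(encoding)
        words = WORD_RE.findall(text.lower())
        for i in range(len(words) - n + 1):
            counts[" ".join(words[i:i + n])] += 1
    return counts

# Two sample texts: one single-byte (cp1252, assumed here for "ANSI"), one UTF-8.
docs = [
    ("le café est ouvert".encode("cp1252"), "cp1252"),
    ("le café est fermé".encode("utf-8"), "utf-8"),
]
unigrams = ngram_index(docs, n=1)
bigrams = ngram_index(docs, n=2)
```

Decoding before tokenizing means the same word is counted once in the index regardless of which byte encoding its source file used.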

1 Review

Downloads: 0 This Week

Last Update: 2020-09-19

UnsupervisedMT

Phrase-Based & Neural Unsupervised Machine Translation

...The neural component supports multiple architectures—seq2seq, biLSTM with attention, and Transformer—and allows extensive parameter sharing across languages to improve data efficiency. Training relies on denoising auto-encoding and back-translation, with on-the-fly, multithreaded generation of synthetic parallel data to continually refresh supervision signals. The project also provides scripts to fetch and preprocess monolingual data, learn BPE codes, and train cross-lingual embeddings that bootstrap unsupervised alignment between languages. Beyond the core EMNLP 2018 setup, the codebase exposes additional, optional capabilities such as multi-language training, language model pretraining with shared parameters, and adversarial training.

Downloads: 1 This Week

Last Update: 5 hours ago

Ghawwas_V4

An open source system for Arabic corpora processing

...Lexical patterns search
e. Two-corpora frequency profile comparison based on MI, CHI, LL, T-Score, Z-Score, Dice, Log Dice, and Weirdness Coefficient
f. Accepts Windows and UTF-8 character encodings
g. Accepts TXT, DOC, DOCX, RTF, and HTML formats
h. Exports the processing results in CSV file format

1 Review

Downloads: 0 This Week

Last Update: 2018-12-09

Encode Arabic

Encode Arabic provides tools for encoding and decoding Arabic in Haskell, Python, Perl, or LaTeX. Interprets the ArabTeX notation to generate original orthography or phonetic transcription. Supports Buckwalter and other romanizations. Converts legacy byte encodings into Unicode. http://github.com/otakar-smrz/encode-arabic

1 Review

Downloads: 0 This Week

Last Update: 2016-06-28

Aelius Brazilian Portuguese POS-Tagger

Python, NLTK-based package for shallow parsing of Brazilian Portuguese

...It also includes language resources such as language models, sample texts, and gold standards. Aelius currently offers facilities for POS-tagging and chunking corpora and for outputting annotations in different formats, such as XML in the TEI P5 encoding scheme.

1 Review

Downloads: 0 This Week

Last Update: 2014-11-03

Linux Guist - Multi Lingual OS for Asia

A Single Click Language Changer and Publishing System for Web and DTP

Linux Guist is a multilingual live-CD OS for most Asian languages that can run off a CD on old hardware with just 128 MB of memory, for DTP, web publishing, and data entry. This will help IT employers take up government projects that require data collection, entry, and publishing at very low cost, while providing training and job opportunities to numerous students of these languages in towns across the country. Talk to your respective IT/HRD ministry to identify...

Downloads: 0 This Week

Last Update: 2019-03-20

CRFSharp

CRFSharp is a .NET (C#) implementation of Conditional Random Fields

...CRF#'s main algorithm is the same as that of CRF++, written by Taku Kudo. It encodes model parameters using L-BFGS. It also offers significant improvements over CRF++, such as fully parallel encoding and optimized memory usage. Currently, when training on a corpus, CRF# can make full use of multi-core CPUs while using very little memory compared with CRF++, and its memory usage grows smoothly and slowly as the size of the training corpus and the number of tags increase. With multi-threaded processing, CRF# is now more suitable than CRF++ for training on large data sets and tag sets. ...

Downloads: 0 This Week

Last Update: 2015-08-03

Nasira

Nasira is a Java library for reading text files with non-ASCII characters (e.g. documents in German, Swedish, ...). To do so, it uses user-provided hints to automatically determine the character encoding (ISO-8859-1, UTF-8) of the file.
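The description does not say how the hints are used, so the following is only one plausible reading, sketched in Python rather than Java: decode the bytes with each candidate encoding and keep the result that contains the most characters the user expects to see. The function `decode_with_hints` and its parameters are hypothetical and are not part of Nasira's API.

```python
def decode_with_hints(data, hints, candidates=("utf-8", "iso-8859-1")):
    """Return (text, encoding) for the candidate decoding richest in hint characters.

    Hypothetical helper: `hints` is a string of characters the user expects
    in the document, e.g. "äöüß" for German.
    """
    best = None
    for encoding in candidates:
        try:
            text = data.decode(encoding)
        except UnicodeDecodeError:
            # The bytes are not valid in this encoding; try the next candidate.
            continue
        score = sum(text.count(ch) for ch in hints)
        # Strict comparison: on a tie, the earlier candidate is kept.
        if best is None or score > best[0]:
            best = (score, text, encoding)
    if best is None:
        raise ValueError("no candidate encoding could decode the data")
    return best[1], best[2]

latin1_bytes = "Grüße aus Köln".encode("iso-8859-1")
utf8_bytes = "Grüße aus Köln".encode("utf-8")
text_a, enc_a = decode_with_hints(latin1_bytes, "äöüß")
text_b, enc_b = decode_with_hints(utf8_bytes, "äöüß")
```

ISO-8859-1 can decode any byte sequence, so a successful decode alone proves nothing; the hint score is what separates a correct decoding from mojibake in this sketch.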

Downloads: 0 This Week

Last Update: 2013-04-22

Search Results for "encoding"

Showing 10 open source projects for "encoding"
