Artha is a handy thesaurus based on WordNet, with distinctive features such as global hotkey look-up, passive desktop notifications and regular-expression-based search. Artha may be used as a free, open-source replacement for the proprietary WordWeb Pro.
This project includes basic NLP and DSP techniques for text-to-speech (TTS).
A TTS demo is available at http://rslp.racai.ro/index.php?page=tts. This project, written entirely in Java, includes a set of tools and methods designed to enable multilingual text-to-speech (TTS) synthesis. We currently support English and Romanian, and we will soon train more models and make them available for download. If you want to read more about our other NLP and TTS tools, check out http://nlptools.racai.ro.
Cunei is a data-driven machine translation system that builds dynamic, statistical models based on instances of known translations found in a corpus.
This program translates words and text via Google Translate.
This project builds a Java text-mining toolkit based on existing methods in the field of knowledge discovery from data collections.
A study environment for ancient languages (Coptic, Greek, Latin)
Marcion is software that provides a study environment for ancient languages (especially Coptic, Greek and Latin), along with many tools and resources (dictionaries, grammars, texts). Although Marcion focuses on the study of Gnosticism and early Christianity, it is a universal library that works with various file formats and lets you collect, organize and back up texts of any kind. Gnostic sources in Coptic delivered with Marcion: Nag Hammadi Library; Berlin Codex; Codex Tchacos (Gospel of Judas); Askew Codex (Pistis Sophia); Bruce Codex (Books of Jeu). Sources of early Christianity in Coptic, Greek and Latin: Septuagint (LXX); Greek New Testament; Coptic New Testament (Sahidic, Bohairic); Latin Vulgate.
Pylero is an open-source Python-based text generator.
Logic in language that scientists haven't described yet
• Despite centuries of exhaustive research, the theory of evolution still hasn't provided a satisfying explanation for the origin of intelligence and language;
• According to the biblical world view, God created the laws of nature. Being based on the Laws of Intelligence embedded in Grammar, only Thinknowlogy implements the natural meaning (intelligent function) of words like the definite article “the”, the conjunction “or”, the possessive verb “has/have” and the past tense verbs “was/were” and “had”.
It is demonstrated by:
• Programming in natural language;
• Reasoning in natural language:
  - drawing conclusions (more advanced than scientific solutions),
  - making assumptions (with a self-adjusting level of uncertainty),
  - asking questions (about gaps in the knowledge),
  - detecting conflicts and some cases of semantic ambiguity,
  - displaying justification reports for the self-generated knowledge;
• Multilingualism, proving: Languages have one common origin.
Virastyar is a spell checker for the Persian language
Virastyar is a free and open-source (FOSS) spell checker for Persian. It stands upon the shoulders of many free/libre/open-source (FLOSS) libraries developed for Persian text processing. Contributors: Omid Kashefi, Azadeh Zamanifar, Masoumeh Mashaiekhi, Meisam Pourafzal, Reza Refaei (former member), Mohammad Hedayati (former member), Kamiar Kanani (former member), Mehrdad Senobari (former member), Sina Iravanin (former member), Mohammad Sadegh Rasooli (former member), Mohsen Hoseinalizadeh (former member), Mitra Nasri (former member), Alireza Dehlaghi (former member), Fatemeh Ahmadi (former member), Banafshe Bakhtiari (former member), Neda Pourmorteza (former member).
A parallel corpus (bitext) alignment tool that creates TMX databases.
(Full support is available at superalign.sourceforge.net.) Features:
• Aligning parallel corpora
• Creating TMX, CSV and tab-delimited TMs
• Automatic aligning of text
• Super-fast handling of multiple files
• Very easy GUI handling of files under Windows
• CAT tool assistant
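To show what the TMX output step involves, here is a minimal sketch that turns already-aligned sentence pairs into a TMX 1.4 document using only the Python standard library. It assumes alignment has already been done and is not this tool's own code; the function name `pairs_to_tmx`, the `creationtool` value and the default language codes are hypothetical.

```python
import xml.etree.ElementTree as ET

# ElementTree serializes this namespaced key as the xml:lang attribute.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def pairs_to_tmx(pairs, src_lang="en", tgt_lang="ro"):
    """Build a TMX 1.4 string from already-aligned (source, target) segment pairs."""
    tmx = ET.Element("tmx", version="1.4")
    # The attributes TMX 1.4 requires on <header>; the tool name is hypothetical.
    ET.SubElement(tmx, "header", {
        "creationtool": "example-aligner",
        "creationtoolversion": "0.1",
        "segtype": "sentence",
        "o-tmf": "none",
        "adminlang": "en",
        "srclang": src_lang,
        "datatype": "plaintext",
    })
    body = ET.SubElement(tmx, "body")
    for src, tgt in pairs:
        # One translation unit (<tu>) per aligned pair, one <tuv> per language.
        tu = ET.SubElement(body, "tu")
        for lang, text in ((src_lang, src), (tgt_lang, tgt)):
            tuv = ET.SubElement(tu, "tuv", {XML_LANG: lang})
            ET.SubElement(tuv, "seg").text = text
    return ET.tostring(tmx, encoding="unicode")


print(pairs_to_tmx([("Hello.", "Salut."), ("Thank you.", "Mulțumesc.")]))
```

CSV or tab-delimited TMs are simpler still: one row per aligned pair, which the standard `csv` module can write directly.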
IRAMUTEQ: Interface de R pour les Analyses Multidimensionnelles de Textes et de Questionnaires (an R interface for multidimensional analysis of texts and questionnaires). Data-processing software for text corpora or individuals/characteristics tables. In particular, it can perform "ALCESTE"-type analyses.
Weka wrapper for the SGM toolkit for text classification and modeling.
Weka wrapper for the SGM toolkit for text classification and modeling. Provides Sparse Generative Models for scalable and accurate text classification and modeling, for use in high-speed and large-scale text mining. Classification has lower time complexity than in comparable software, because inference uses a sparse model representation and an inverted index. The provided .zip file is in the Weka package format and gives access to text classification. Other functions can be used either through Java command-line commands or by including the classes in Java projects.
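The speed-up from a sparse representation plus an inverted index can be illustrated with a small multinomial naive Bayes classifier. This is a minimal sketch under stated assumptions: it uses Jelinek-Mercer smoothing toward a background distribution (a swapped-in choice for illustration), and the class `SparseNB` and its methods are hypothetical, not SGM's Java API or its actual implementation. Because every class that never saw a word gets the same smoothed probability for it, that part of the score is computed once as a shared baseline, and only the classes in the word's posting list receive a correction.

```python
import math
from collections import Counter, defaultdict


class SparseNB:
    """Multinomial naive Bayes with Jelinek-Mercer smoothing, scored via an inverted index.

    Smoothed P(w|c) = a * P_ml(w|c) + (1 - a) * P_bg(w), with 0 < a < 1.
    """

    def __init__(self, a=0.5):
        self.a = a

    def fit(self, docs, labels):
        class_counts = defaultdict(Counter)
        for tokens, label in zip(docs, labels):
            class_counts[label].update(tokens)
        # Background distribution: word frequencies pooled over all classes.
        bg = Counter()
        for counts in class_counts.values():
            bg.update(counts)
        total_bg = sum(bg.values())
        self.p_bg = {w: n / total_bg for w, n in bg.items()}
        label_freq = Counter(labels)
        self.log_prior = {c: math.log(n / len(labels)) for c, n in label_freq.items()}
        # Inverted index: word -> [(class, log-prob correction over the shared baseline)].
        self.index = defaultdict(list)
        for c, counts in class_counts.items():
            total_c = sum(counts.values())
            for w, n in counts.items():
                base = (1 - self.a) * self.p_bg[w]
                smoothed = self.a * n / total_c + base
                self.index[w].append((c, math.log(smoothed) - math.log(base)))
        return self

    def scores(self, tokens):
        # Out-of-vocabulary words are ignored.
        tf = Counter(w for w in tokens if w in self.p_bg)
        # Baseline shared by all classes: every word scored as if the class never saw it.
        baseline = sum(n * math.log((1 - self.a) * self.p_bg[w]) for w, n in tf.items())
        scores = {c: lp + baseline for c, lp in self.log_prior.items()}
        for w, n in tf.items():
            for c, delta in self.index[w]:  # touch only classes that actually saw w
                scores[c] += n * delta
        return scores

    def predict(self, tokens):
        s = self.scores(tokens)
        return max(s, key=s.get)


docs = [["good", "great", "fun"], ["bad", "awful", "boring"], ["good", "fun", "fun"], ["bad", "boring"]]
labels = ["pos", "neg", "pos", "neg"]
model = SparseNB(a=0.5).fit(docs, labels)
print(model.predict(["fun", "good"]))
```

The work per document grows with the number of postings for its words rather than with the number of classes times the vocabulary, which is what makes this style of inference attractive for large-scale text classification.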