Virastyar is an spell checker for Persian Language
Virastyar is a free and open-source (FOSS) spell checker for Persian. It stands upon the shoulders of many free/libre/open-source (FLOSS) libraries developed for Persian text processing. Contributors: Omid Kashefi Azadeh Zamanifar Masoumeh Mashaiekhi Meisam Pourafzal Reza Refaei (former member) Mohammad Hedayati (former member) Kamiar Kanani (former member) Mehrdad Senobari (former member) Sina Iravanin (former member) Mohammad Sadegh Rasooli (former member) Mohsen Hoseinalizadeh (former member) Mitra Nasri (former member) Alireza Dehlaghi (former member) Fatemeh Ahmadi (former member) Banafshe Bakhtiari (former member) Neda Pourmorteza (former member)
Quantitative Content Analysis or Text Mining
KH Coder is a free software for quantitative content analysis or text data mining. It is also utilized for computational linguistics. You can analyze Japanese, English, French, German, Italian, Portuguese and Spanish text with KH Coder. Also, Catalan, Chinese (simplified), Korean, Russian and Slovenian language data can be analyzed with the latest alpha release (Version 3). KH Coder provides various kinds of search and statistical analysis functions using back-end tools such as Stanford POS Tagger, FreeLing, Snowball stemmer, MySQL and R.
The free and open-source rule-based machine translation platform
Apertium is a toolbox to build open-source shallow-transfer machine translation systems, especially suitable for related language pairs: it includes the engine, maintenance tools, and open linguistic data for several language pairs.
Arabic Text Vocalization system
Automatic system of vocalization of arabic text.
Artha is a handy thesaurus based on WordNet with distinct features like global hotkey look-up, passive desktop notifications, regular expression based search, etc.. Artha may be used as a free open-source replacement to the proprietary WordWeb Pro.
Indexing and query tools for very large text corpora
The IMS Open Corpus Workbench is a collection of tools for managing and querying large text corpora (100 M words and more) with linguistic annotations. Its central component is the flexible and efficient query processor CQP, which can be used interactively in a terminal session, as a backend e.g. from a Perl script, or through the Web-based GUI CQPweb.
Open data for a Khmer language corpus and lexicographic data that can be used for the development of free language tools for Khmer language, such as automatic translators, dictionaries, linguistic analysis tools, etc.
IRAMUTEQ : Interface de R pour les Analyses Multidimensionnelles de Textes et de Questionnaires. Logiciel de traitement de données pour des corpus texte ou de type individus/caractères. Permet notamment de réaliser des analyses de type "ALCESTE"
Free translating dictionaries. Source format: TEI-P5 XML. Delivery formats: DICT, Stardict, etc. The dictionaries may include information on the pronunciation, etymology and such, in a platform-independent format. Access: web/plugins/standalone.
Subtitle translator from one natural language to other.
Translating subtitles in format SubRip from one natural language to other. It is based on Google Translate without API and therefore without payment. Translator have automatic and manual spell checkers.
Unicode-XML-TEI text/corpus analysis platform
TXM is a free and open-source cross-platform Unicode & XML based text/corpus analysis environment and graphical client, supporting Windows, Linux and Mac OS X. It can also be used online as a J2EE standard compliant web portal (GWT based) with access control built in. DOWNLOAD LATEST VERSION OF TXM : http://perso.ens-lyon.fr/serge.heiden/txm/files/software/TXM/0.7.7 TXM offers a comprehensive range of analysis tools (concordances, collocate search, frequency lists, etc.) based on the powerfull CQP full text search engine (http://cwb.sourceforge.net) and a range of statistical functions (factorial analysis, classification, cooccurrency analysis, etc.) based on R packages (http://www.r-project.org). Read the scientific background at the Textométrie project web site http://textometrie.ens-lyon.fr/?lang=en. Read a full description at the TEI Tools wiki http://wiki.tei-c.org/index.php/TXM.
Quran Search Engine API
Alfanous (The Lantern - الفانوس ) is an Arabic search engine API provide the simple and advanced search in the Holy Quran , more features and many interfaces...
Arabic voice files for eSpeak system
Arabic files and voices for eSpeak Text to speech system, المنطيق : ملفات اللغة العربية لبرنامج توليد الكلام من النص إسبيك
WordNet Database in various SQL format
Weka wrapper for the SGM toolkit for text classification and modeling.
Weka wrapper for the SGM toolkit for text classification and modeling. Provides Sparse Generative Models for scalable and accurate text classification and modeling for use in high-speed and large-scale text mining. Has lower time complexity of classification than comparable software due to inference based on sparse model representation and use of an inverted index. The provided .zip file is in the Weka package format, giving access to text classification. Other functions are usable through either Java command-line commands or class inclusion into Java projects.
the intelligent predictive text entry platform
Presage (formerly Soothsayer) is an intelligent predictive text entry system. Presage generates predictions by modelling natural language as a combination of redundant information sources. Presage computes probabilities for words which are most likely to be entered next by merging predictions generated by the different predictive algorithms. Presage's modular and extensible architecture allows its language model to be extended and customized to utilize statistical, syntactic, and semantic predictive algorithms. Presage's predictive capabilities are implemented by predictive plugins. Predictive plugins use services provided by the platform to implement multiple prediction techniques.
The study environment of ancient languages (Coptic, Greek, Latin)
Marcion is a software forming a study environment of ancient languages (esp. Coptic, Greek, Latin) and providing many tools and resources (dictionaties, grammars, texts). Although Marcion is focused on to study the gnosticism and early christianity, it is an universal library working with various file formats and allowing to collect, organize and backup texts of any kind. Overview of gnostic sources in Coptic language delivered with Marcion: Nag Hammadi Library; Berlin Codex; Codex Tchacos (Gospel of Judas); Askew Codex (Pistis Sophia); Bruce Codex (Books of Jeu) Overview of sources of early christianity in Coptic, Greek and Latin languages: Septuagint (LXX); Greek New Testament; Coptic New Testament (Sahidic, Bohairic); Latin Vulgate
Better PO Editor is an editor for .po files, used to generate compiled gettext .mo files which are used by many programs and websites to localize the user interface. It offers great features... It's worth to give it a try! PLEASE NOTE: the project moved to GitHub: see https://github.com/mlocati/betterpoeditor/releases
The application Dictionary System (DS) is a web application designed for creation of one-way bilingual dictionaries or encyclopaedias offering a working environment for creation of a dictionary and a web page which enables the general public to search in the dictionary. It is so-called DWS application (Dictionary Writing System) or DPS (Dictionary Production / Publishing System). Aplikace Dictionary System (dále DS) je webová aplikace. Je to tzv. DWS aplikace (Dictionary Writing System) nebo také tvz. DPS (Dictionary Production/Publishing System). Aplikace Dictionary System nabízí pracovní prostředí pro tvorbu jednosměrných dvojjazyčných slovníků nebo encyklopedií a webové stránky, které umožňují vyhledávat ve slovníku široké veřejnosti.
XML-Print: typesetting arbitrary XML documents in high quality
"XML-Print" is a joint project of the FH Worms (Prof. Marc W. Küster) and the University of Trier (Prof. Claudine Moulin) with support from TU Darmstadt (Prof. Andrea Rapp). Its goal is the creation of a XML formatter designated especially for the needs of the “Digital Humanties”. The project is funded by the DFG. Please visit https://sites.google.com/a/budabe.eu/xmlprint_de/kontakt and let us know, what you think about XML-Print – Does it meet your expectations? – What is missing? – Do you use it regularly? Thank you.
TML is a Java Library for LSA and extracting Concept Maps from text
TML has moved to http://www.villalon.cl/tml.html and the code to https://github.com/villalon/tml
Parsing Korean words by morpheme and part-of-speech
RHINO parses Korean words by morpheme and part-of-speech. Its dictionaries are based on Korean Modern Tagged Corpus(12 million phrases scale) which was made by Korean government. So it analyses many cases of stems and endings. And the newly developed Dynamic Dictionary Technology can make words to react with their context. That is, a programmed database. For more information see the files in the help folder.
Linking Language to Knowledge with Distributional Semantics
JobimText is a software solution for automatic text expansion using contextualized distributional similarity. It provides text analysis tools for large corpora and has capabilities to create distributional semantic models (JoBimText models) and multi-word expressions.
Grammar-multi is most useful for languages which words have many forms («more» inflected languages), and for which grammatical agreement (and other syntactic connections) in a sentence is «more» important and «obvious». Need a help of linguists. Program is not for every-day use, but to show Grammar is working. If you want your language Grammar version - tell me.
Cross-platform application aimed at helping users to learn vocabulary from any foreign language(s). Add/Edit/Delete vocab words (w/ translation, category, sentence, notes, picture). Review (Quiz) vocabulary words.