Virastyar is a spell checker for the Persian language
Virastyar is a free and open-source (FOSS) spell checker for Persian. It stands upon the shoulders of many free/libre/open-source (FLOSS) libraries developed for Persian text processing. Contributors: Omid Kashefi, Azadeh Zamanifar, Masoumeh Mashaiekhi, Meisam Pourafzal, Reza Refaei (former member), Mohammad Hedayati (former member), Kamiar Kanani (former member), Mehrdad Senobari (former member), Sina Iravanin (former member), Mohammad Sadegh Rasooli (former member), Mohsen Hoseinalizadeh (former member), Mitra Nasri (former member), Alireza Dehlaghi (former member), Fatemeh Ahmadi (former member), Banafshe Bakhtiari (former member), Neda Pourmorteza (former member).
Quantitative Content Analysis or Text Mining
KH Coder is free software for quantitative content analysis or text data mining. It is also used in computational linguistics. You can analyze Japanese, English, French, German, Italian, Portuguese and Spanish text with KH Coder. Catalan, Chinese (simplified), Korean, Russian and Slovenian data can also be analyzed with the latest alpha release (Version 3). KH Coder provides various kinds of search and statistical analysis functions using back-end tools such as Stanford POS Tagger, FreeLing, Snowball stemmer, MySQL and R.
Arabic Text Vocalization system
An automatic vocalization system for Arabic text.
The free and open-source rule-based machine translation platform
Apertium is a toolbox to build open-source shallow-transfer machine translation systems, especially suitable for related language pairs: it includes the engine, maintenance tools, and open linguistic data for several language pairs.
Artha is a handy thesaurus based on WordNet with distinct features like global hotkey look-up, passive desktop notifications, regular expression based search, etc. Artha may be used as a free open-source replacement for the proprietary WordWeb Pro.
IRAMUTEQ: R Interface for Multidimensional Analyses of Texts and Questionnaires (Interface de R pour les Analyses Multidimensionnelles de Textes et de Questionnaires). Data-processing software for text corpora or individuals/characters tables. Among other things, it can perform "ALCESTE"-type analyses.
Free translating dictionaries. Source format: TEI-P5 XML. Delivery formats: DICT, Stardict, etc. The dictionaries may include information on the pronunciation, etymology and such, in a platform-independent format. Access: web/plugins/standalone.
Subtitle translator from one natural language to another.
Translates subtitles in SubRip format from one natural language to another. It is based on Google Translate without the API and therefore without payment. The translator has automatic and manual spell checkers.
Natural Language Compiler
Unicode-XML-TEI text/corpus analysis platform
TXM is a free and open-source cross-platform Unicode & XML based text/corpus analysis environment and graphical client, supporting Windows, Linux and Mac OS X. It can also be used online as a J2EE standard compliant web portal (GWT based) with access control built in. Download the latest version of TXM: http://textometrie.ens-lyon.fr/spip.php?rubrique61&lang=en. TXM offers a comprehensive range of analysis tools (concordances, collocate search, frequency lists, etc.) based on the powerful CQP full text search engine (http://cwb.sourceforge.net) and a range of statistical functions (factorial analysis, classification, co-occurrence analysis, etc.) based on R packages (http://www.r-project.org). Read the scientific background at the Textométrie project web site http://textometrie.ens-lyon.fr/?lang=en. Read a full description at the TEI Tools wiki http://wiki.tei-c.org/index.php/TXM.
English-Vietnamese translation software for Word, Excel, PDF, and the web
Free English-Vietnamese translation software. You can translate text directly on any website, or enter the text to be translated. For accurate results, translate phrase by phrase or sentence by sentence. While browsing the web, just double-click a word or select a passage of text with the mouse to see the translation. The software can translate English to Vietnamese or Vietnamese to English. Requirement: the .NET Framework must be installed.
Quran Search Engine API
Alfanous (الفانوس, "The Lantern") is an Arabic search engine API that provides simple and advanced search in the Holy Quran, along with more features and many interfaces.
The intelligent predictive text entry platform
Presage (formerly Soothsayer) is an intelligent predictive text entry system. Presage generates predictions by modelling natural language as a combination of redundant information sources. Presage computes probabilities for words which are most likely to be entered next by merging predictions generated by the different predictive algorithms. Presage's modular and extensible architecture allows its language model to be extended and customized to utilize statistical, syntactic, and semantic predictive algorithms. Presage's predictive capabilities are implemented by predictive plugins. Predictive plugins use services provided by the platform to implement multiple prediction techniques.
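The idea of merging predictions from several predictors can be illustrated with a minimal sketch. It is not Presage's code or API: the function names (unigram_predictor, bigram_predictor, merge_predictions), the toy corpus, the weights, and the use of plain linear interpolation are all assumptions made for illustration.

```python
from collections import Counter, defaultdict


def unigram_predictor(corpus_tokens):
    """Return a predictor that ignores context and ranks words by frequency."""
    counts = Counter(corpus_tokens)
    total = sum(counts.values())

    def predict(context):
        return {word: n / total for word, n in counts.items()}

    return predict


def bigram_predictor(corpus_tokens):
    """Return a predictor that conditions on the previous word only."""
    following = defaultdict(Counter)
    for prev, nxt in zip(corpus_tokens, corpus_tokens[1:]):
        following[prev][nxt] += 1

    def predict(context):
        if not context or context[-1] not in following:
            return {}  # this predictor has nothing to say for this context
        counts = following[context[-1]]
        total = sum(counts.values())
        return {word: n / total for word, n in counts.items()}

    return predict


def merge_predictions(predictors, weights, context, top_n=3):
    """Linearly interpolate the predictors' distributions and rank the result.

    Linear interpolation is an assumed combination rule for this sketch.
    """
    merged = defaultdict(float)
    for predict, weight in zip(predictors, weights):
        for word, prob in predict(context).items():
            merged[word] += weight * prob
    # Sort by descending probability, then alphabetically for stable ties.
    ranked = sorted(merged.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:top_n]


# Toy corpus and weights, chosen only for the example.
tokens = "the cat sat on the mat and the cat ran".split()
predictors = [unigram_predictor(tokens), bigram_predictor(tokens)]
suggestions = merge_predictions(predictors, [0.3, 0.7], ["the"])
```

Each predictor here is an independent function with the same signature, which loosely mirrors the plugin idea described above: a further predictor could be added to the list without touching the merging step.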
Turku Event Extraction System
Turku Event Extraction System (TEES) is a free and open source natural language processing system developed for the extraction of events and relations from biomedical text. It is written mostly in Python, and should work in generic Unix/Linux environments. The TEES source code repository is currently hosted on GitHub at http://jbjorne.github.com/TEES/ where there is also a wiki with more information.
A study environment for ancient languages (Coptic, Greek, Latin)
Marcion is software that forms a study environment for ancient languages (esp. Coptic, Greek, Latin) and provides many tools and resources (dictionaries, grammars, texts). Although Marcion focuses on the study of Gnosticism and early Christianity, it is a universal library that works with various file formats and lets you collect, organize, and back up texts of any kind. Gnostic sources in Coptic delivered with Marcion: Nag Hammadi Library; Berlin Codex; Codex Tchacos (Gospel of Judas); Askew Codex (Pistis Sophia); Bruce Codex (Books of Jeu). Sources of early Christianity in Coptic, Greek, and Latin: Septuagint (LXX); Greek New Testament; Coptic New Testament (Sahidic, Bohairic); Latin Vulgate.
Indexing and query tools for very large text corpora
The IMS Open Corpus Workbench is a collection of tools for managing and querying large text corpora (100 M words and more) with linguistic annotations. Its central component is the flexible and efficient query processor CQP, which can be used interactively in a terminal session, as a backend e.g. from a Perl script, or through the Web-based GUI CQPweb.
Open data for a Khmer language corpus and lexicographic data that can be used to develop free language tools for the Khmer language, such as automatic translators, dictionaries, linguistic analysis tools, etc.
Arabic voice files for eSpeak system
Arabic files and voices for the eSpeak text-to-speech system (المنطيق, Al-Mantiq: Arabic language files for the eSpeak text-to-speech program).
Weka wrapper for the SGM toolkit for text classification and modeling.
Weka wrapper for the SGM toolkit for text classification and modeling. It provides Sparse Generative Models for scalable and accurate text classification and modeling in high-speed, large-scale text mining. Classification has lower time complexity than in comparable software because inference is based on a sparse model representation and an inverted index. The provided .zip file is in the Weka package format and gives access to text classification. Other functions are usable either through Java command-line commands or by including the classes in Java projects.
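Why a sparse model plus an inverted index can make classification cheaper is easier to see in a small sketch. The code below is not the SGM toolkit or its API. It assumes a multinomial model smoothed by interpolation with a corpus-wide background model, and the names train, classify, and alpha are hypothetical. Each class stores only the terms it has actually seen, as an offset from the shared background score, so scoring a document touches only the postings of its own terms instead of every class for every term.

```python
import math
from collections import Counter, defaultdict


def train(docs, alpha=0.5):
    """Build class priors and an inverted index of sparse log-probability offsets.

    docs: list of (label, tokens) pairs.
    alpha: weight of the background model (an assumed smoothing choice).
    """
    class_counts = defaultdict(Counter)
    background = Counter()
    labels = Counter()
    for label, tokens in docs:
        class_counts[label].update(tokens)
        background.update(tokens)
        labels[label] += 1

    bg_total = sum(background.values())
    n_docs = sum(labels.values())
    priors = {label: math.log(n / n_docs) for label, n in labels.items()}

    # index[term] lists only the classes that contain the term.
    index = defaultdict(list)
    for label, counts in class_counts.items():
        total = sum(counts.values())
        for term, n in counts.items():
            bg = alpha * background[term] / bg_total
            smoothed = (1 - alpha) * n / total + bg
            # Store the gain over the background-only score; classes without
            # the term share that baseline, so it cancels in the comparison.
            index[term].append((label, math.log(smoothed) - math.log(bg)))
    return priors, index


def classify(priors, index, tokens):
    """Score classes by walking only the postings of the document's terms."""
    scores = dict(priors)
    for term, tf in Counter(tokens).items():
        for label, offset in index.get(term, ()):
            scores[label] += tf * offset
    return max(scores, key=scores.get)


# Toy training data, chosen only for the example.
docs = [
    ("sports", "ball goal team ball".split()),
    ("tech", "code bug code team".split()),
]
priors, index = train(docs)
label_a = classify(priors, index, "ball team goal".split())
label_b = classify(priors, index, "code bug".split())
```

With many classes and a large vocabulary, most postings lists are short, so the inner loop does far less work than a dense class-by-term computation would.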
WordNet database in various SQL formats
XML-Print: typesetting arbitrary XML documents in high quality
"XML-Print" is a joint project of the FH Worms (Prof. Marc W. Küster) and the University of Trier (Prof. Claudine Moulin) with support from TU Darmstadt (Prof. Andrea Rapp). Its goal is the creation of a XML formatter designated especially for the needs of the “Digital Humanties”. The project is funded by the DFG. Please visit https://sites.google.com/a/budabe.eu/xmlprint_de/kontakt and let us know, what you think about XML-Print – Does it meet your expectations? – What is missing? – Do you use it regularly? Thank you.
MyVocabtionary allows you to create an online dictionary for free!
MyVocabtionary (formerly phpVocabtionary) is a free PHP/MySQL-based web application that allows you to create a free dictionary. With our vast number of modifications, you can make your dictionary even better! Download is completely free. Usage is a piece of cake, and you can customise almost everything through a user-friendly GUI. Creating an online dictionary has never been this easy!
Better PO Editor is an editor for .po files, which are used to generate the compiled gettext .mo files that many programs and websites use to localize their user interface. It offers great features and is worth a try! PLEASE NOTE: the project has moved to GitHub: see https://github.com/mlocati/betterpoeditor/releases
Parsing Korean words by morpheme and part-of-speech
RHINO parses Korean words by morpheme and part of speech. Its dictionaries are based on the Korean Modern Tagged Corpus (about 12 million phrases), which was created by the Korean government, so it can analyze many combinations of stems and endings. Its newly developed Dynamic Dictionary Technology lets words react to their context; in other words, it is a programmed database. For more information, see the files in the help folder.
A Java class with a small piece of functionality: stemming Arabic words
A Java Arabic stemmer based on Shereen Khoja's algorithm. This Java class offers a function called stemWrod, which takes an Arabic word and returns its stem.