A linguistic annotation store
LABB-CAT is a browser-based linguistics research tool that stores recordings and regular-expression searchable text transcripts of interviews. The search results, entire transcripts, and media, can be viewed or exported in a variety of format
A smart search engine for medical documents
TIES (Text Information Extraction System) is a clinical text search engine that uses Natural Language Processing techniques to extract medical concepts from free text clinical reports. It provides secure de-identified access to this information and has in built collaboration tools and honest broker functionality. It is licensed for academic use under the BSD license. For commercial use please contact Nexi at http://nexihub.com
Quantitative Content Analysis or Text Mining
KH Coder is a free software for quantitative content analysis or text data mining. It is also utilized for computational linguistics. You can analyze Japanese, English, French, German, Italian, Portuguese and Spanish text with KH Coder. Also, Catalan, Chinese (simplified), Korean, Russian and Slovenian language data can be analyzed with the latest alpha release (Version 3). KH Coder provides various kinds of search and statistical analysis functions using back-end tools such as Stanford POS Tagger, FreeLing, Snowball stemmer, MySQL and R.
A tool for Visual Transcriptions of biblical texts at INTF and ITSEE
The Online Transcription Editor was developed as part of the joined project "Workspace for Collaborative Editing". It is used for transcriptions at the INTF in Munster and the ITSEE in Birmingham.
Phần mềm dịch tiếng Anh - Việt trên Word, Excel, PDF, web
Phần mềm dịch tiếng Anh - tiếng Việt miễn phí. Bạn có thể dịch trực tiếp văn bản trên website bất kỳ, hoặc nhập văn bản cần dịch. Để kết quả dịch được chính xác, bạn nên dịch theo cụm từ hoặc từng câu. Bạn chỉ cần nhấn đúp chuột vào một từ hoặc dùng chuột để đánh dấu một đoạn văn bản khi đang lướt web để thấy kết quả dịch. Phần mềm có thể dịch tiếng Anh sáng tiếng Việt hoặc tiếng Việt sang tiếng Anh. Yêu cầu: cài đặt .Net Framework
R interface to the Corpus Query Protocol
Implements the Corpus Query Protocol as a package for the R statistical environment. It allows to query linguistic corpora and manipulate the data as native R objects. It is based on the CWB software.
Indexing and query tools for very large text corpora
The IMS Open Corpus Workbench is a collection of tools for managing and querying large text corpora (100 M words and more) with linguistic annotations. Its central component is the flexible and efficient query processor CQP, which can be used interactively in a terminal session, as a backend e.g. from a Perl script, or through the Web-based GUI CQPweb.
The free and open-source rule-based machine translation platform
Apertium is a toolbox to build open-source shallow-transfer machine translation systems, especially suitable for related language pairs: it includes the engine, maintenance tools, and open linguistic data for several language pairs.
FlexiTerm is an open-source software tool for automatic term recognition. FlexiTerm uses a range of methods to neutralise the main sources of term variation. FlexiTerm is robust enough for less formally structured texts, such as those found in patient blogs or medical notes.
the intelligent predictive text entry platform
Presage (formerly Soothsayer) is an intelligent predictive text entry system. Presage generates predictions by modelling natural language as a combination of redundant information sources. Presage computes probabilities for words which are most likely to be entered next by merging predictions generated by the different predictive algorithms. Presage's modular and extensible architecture allows its language model to be extended and customized to utilize statistical, syntactic, and semantic predictive algorithms. Presage's predictive capabilities are implemented by predictive plugins. Predictive plugins use services provided by the platform to implement multiple prediction techniques.
Unicode-XML-TEI text/corpus analysis platform
TXM is a free and open-source cross-platform Unicode & XML based text/corpus analysis environment and graphical client, supporting Windows, Linux and Mac OS X. It can also be used online as a J2EE standard compliant web portal (GWT based) with access control built in. DOWNLOAD LATEST VERSION OF TXM : http://textometrie.ens-lyon.fr/spip.php?rubrique61&lang=en TXM offers a comprehensive range of analysis tools (concordances, collocate search, frequency lists, etc.) based on the powerfull CQP full text search engine (http://cwb.sourceforge.net) and a range of statistical functions (factorial analysis, classification, cooccurrency analysis, etc.) based on R packages (http://www.r-project.org). Read the scientific background at the Textométrie project web site http://textometrie.ens-lyon.fr/?lang=en. Read a full description at the TEI Tools wiki http://wiki.tei-c.org/index.php/TXM.
Lexico-grammatical/syntactical analyzer of sentences in English
"SayWhat?" is a natural language processing system for spell-checking, lexical, grammatical and syntactical analysis of simple sentences in English. Note that it is just an educational project, not even 50%-complete (there are still many problems to solve), so don't expect too much.
External plugins for modnlp/teccli
This is a general project for modnlp/teccli plugins, with focus on text visualizaton.
Corpus Linguistics Software
Some software for Corpus Linguistics, which includes Corpus Text Editor, Web-based search, etc. This project created for Belarusian Corpus, but can be used for other languages with some adaption.
Natural Language Processing (NLP) for the Masses
Semantic Assistants support users in content retrieval, analysis, and development, by offering context-sensitive NLP services directly integrated in standard desktop clients, like a word processor, and web information systems, like a wiki.
A compiler to improve relation management between mobile users. This compiler will handle data islands for data transportation between a client mobile phone and a server node accesing a cellular network.
Linking Language to Knowledge with Distributional Semantics
JobimText is a software solution for automatic text expansion using contextualized distributional similarity. It provides text analysis tools for large corpora and has capabilities to create distributional semantic models (JoBimText models) and multi-word expressions.
General Intelligence Algorithm
Free translating dictionaries. Source format: TEI-P5 XML. Delivery formats: DICT, Stardict, etc. The dictionaries may include information on the pronunciation, etymology and such, in a platform-independent format. Access: web/plugins/standalone.
A simple tool for typing characters in different writing systems.
PangInput is a simple application to help you in typing characters from different languages in unicode. Three methods are available: 1) a virtual keyboard, mapping specific characters to each key on your keyboard; 2) custom character sets, which you can select by clicking on them; 3) macro sets, allowing input of complex scripts - basically mapping a latin transcription to the actual writing of characters or words.
Arabic voice files for eSpeak system
Arabic files and voices for eSpeak Text to speech system, المنطيق : ملفات اللغة العربية لبرنامج توليد الكلام من النص إسبيك
A text management tool for linguistic purposes...
Parsing Korean words by morpheme and part-of-speech
RHINO parses Korean words by morpheme and part-of-speech. Its dictionaries are based on Korean Modern Tagged Corpus(12 million phrases scale) which was made by Korean government. So it analyses many cases of stems and endings. And the newly developed Dynamic Dictionary Technology can make words to react with their context. That is, a programmed database. For more information see the files in the help folder.
An open source system for Arabic corpora processing
Ghawwas (previously known as Khawas) is an open source system for Arabic corpora processing. Ghawwas V4.0 provides the following main functions: a. Frequency list for single word and N-Grams b. Concordance c. Collocation (MI, CHI Squared, LL, T-Score, Z Score, Dice, Log Dice, Weirdness Coefficient) d. Lexical patterns search e. Two corpora frequency profile comparison based on MI, CHI, LL, T-Score, Z Score, Dice, Log Dice, Weirdness Coefficient f. Accept Windows and UTF-8 character encoding g. Accept TXT, DOC, DOCX, RTF and HTML formats h. Export the processing results in CSV file format
Arabic Text Vocalization system
Automatic system of vocalization of arabic text.