Deploy in 115+ regions with the modern database for every enterprise.
MongoDB Atlas gives you the freedom to build and run modern applications anywhere—across AWS, Azure, and Google Cloud. With global availability in over 115 regions, Atlas lets you deploy close to your users, meet compliance needs, and scale with confidence across any geography.
Start Free
Train ML Models With SQL You Already Know
BigQuery automates data prep, analysis, and predictions with built-in AI assistance.
Build and deploy ML models using familiar SQL. Automate data prep with built-in Gemini. Query 1 TB and store 10 GB free monthly.
HanNanum is a Korean Morphological Analyzer and POS Tagger. A plug-in component-based architecture is adapted to the new Java version for flexible use. You can find the work flow for morphological analysis, POS tagging, noun extraction, etc.
Contact:
kschoi@kaist.ac.kr
hjjeong@world.kaist.ac.kr
A system to perform analysis of large documents for the purpose of cataloging similar documents. Similarity is based upon contextual analysis of these documents done by identifying common words and proper nouns.
jWords is a port of WORDS (by William Whitaker, a free latin-to-english dictionary program written in Ada), to Java. Besides the dictionary will be translated to the German language.
WQuery is a domain-specific query language designed to process WordNet-like lexical databases. It may be used as a standalone application or as an API to a lexical database in Java based systems.
The program creates OWL ontology files that describe relationships between entities. Basis are definitions found by searching Wikipedia articles for specific lexico-syntactic patterns.
ELIA(Eyegaze Language Integration Analysis) supports the analysis of eye-tracking data for studies in language processing. ELIA eases early analysis of data to enable iterative development of experiments in response to spoken language.
CORPSE (CORPus SEarch) is a powerful search engine written in Java. The aim is to provide an efficient implementation of a word level inverted index search with various cool functions that can be used on very large corpora.
A linguistic tool to aid in the study of Linguistics/Phonology, specifically distinctive features of possible language sounds. Comprised of both a Visual C++ .NET version as well as a Java based web applet version. The C++ version has all but been ab
Java program to create a (potentially multilingual) glossary of the unique words in any given Lojban text.
Note that the Sourceforge page for this was superceded by the Bitbucket repository: https://bitbucket.org/pretoriusjf/vlastezba/overview
Any further updates will be made there.
Editor for formal grammars. Attempts to be universal – customizable for any grammatical formalism and any syntax. Provides features such as syntax checking and highlighting, transformations (refactoring) and advanced rule editor.
OO Pinyin Guide is a Java extension for OpenOffice 3 or higher. It enables the user to add pinyin transliteration over Chinese characters inside a text document. This tool can be useful for people learning or teaching Chinese.
A lyrical analysis and classification tool focused specifically on rhyming style in rap lyrics. Functions include phonetic transcription, rhyme visualization, and rapper classification.
stocleka is a project divided into a UI and a library for cleaning user stories and converting them to arff files (used for Weka). it may be mainly used for research and scientific purposes.
Connecting Historical Authorities with Links, Contexts and Entities. CHALICE is a historic placename gazetteer for the UK, published as Linked Data and linked to other widely-used sources of placename reference information on the semantic web.
Wikipedia Concept Association Map (WCAM) is new approach for textual knowledge representation and understanding. All concepts and associations are stored in a graph database for better performance and easy distribution.
It's a utility application for updating and integrating translation memories, created by the Autshumato ITE, over a network. Licensed under the TMate Open Source License and free to download and be used by anyone.
Sanchay is a collection of tools and APIs for language researchers. It has some implementations of NLP algorithms, some flexible APIs, several user friendly annotation interfaces and Sanchay Query Language for language resources.
This project is a compilation of tools/libraries to help with tasks related to Text Analytics mainly in Java. These tools range from simple wrappers to sophisticated mining tasks that can improve the productivity of researchers and engineers.