Safe Harbor Deidentification for medical documents
Phalanx - Deidentify
The Safe Harbor deidentification mode of Phalanx is an abridged pipeline of NLP annotators, culminating in NER annotators that write text offsets as output. It uses the Safe Harbor deidentification method.
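To illustrate how NER output expressed as text offsets can drive deidentification, here is a minimal sketch. The `redact` function, the `(start, end, label)` span format, and the label names are assumptions made for this example and are not Phalanx's actual interface.

```python
from typing import List, Tuple

# Hypothetical span format: (start offset, end offset, label); end is exclusive.
Span = Tuple[int, int, str]


def redact(text: str, spans: List[Span]) -> str:
    """Replace each annotated span with a bracketed label placeholder."""
    pieces = []
    cursor = 0
    for start, end, label in sorted(spans):
        if start < cursor:
            # Skip a span that overlaps one already redacted.
            continue
        pieces.append(text[cursor:start])  # untouched text before the span
        pieces.append(f"[{label}]")        # placeholder for the identifier
        cursor = end
    pieces.append(text[cursor:])           # trailing text after the last span
    return "".join(pieces)


note = "John Smith was seen on 2020-01-05."
print(redact(note, [(0, 10, "NAME"), (23, 33, "DATE")]))
```

Working from offsets rather than re-matching strings means the original text is never searched twice, so repeated or ambiguous tokens are handled by position alone.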
************************************************************
THIS PROJECT HAS MOVED.
See http://khcoder.net/en for the latest & greatest.
You can download this tool from the new home.
See you there!
************************************************************
Encode Arabic provides tools for encoding and decoding Arabic in Haskell, Python, Perl, or LaTeX. Interprets the ArabTeX notation to generate original orthography or phonetic transcription. Supports Buckwalter and other romanizations. Converts legacy byte encodings into Unicode.
http://github.com/otakar-smrz/encode-arabic
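As a rough illustration of what a Buckwalter romanization does, the sketch below maps a small subset of Buckwalter symbols to Arabic Unicode characters and back. The table is deliberately partial, and the function names `from_buckwalter` and `to_buckwalter` are hypothetical; this is not the Encode Arabic API.

```python
# Partial Buckwalter-to-Unicode table (a subset chosen for illustration).
BUCKWALTER = {
    "A": "\u0627",  # alif
    "b": "\u0628",  # ba
    "t": "\u062A",  # ta
    "r": "\u0631",  # ra
    "s": "\u0633",  # sin
    "k": "\u0643",  # kaf
    "l": "\u0644",  # lam
    "m": "\u0645",  # mim
    "n": "\u0646",  # nun
    "a": "\u064E",  # fatha
    "u": "\u064F",  # damma
    "i": "\u0650",  # kasra
    "~": "\u0651",  # shadda
    "o": "\u0652",  # sukun
}

# Inverse table for converting Arabic script back to the romanization.
ARABIC = {arabic: latin for latin, arabic in BUCKWALTER.items()}


def from_buckwalter(text: str) -> str:
    """Convert Buckwalter symbols to Arabic; unknown characters pass through."""
    return "".join(BUCKWALTER.get(ch, ch) for ch in text)


def to_buckwalter(text: str) -> str:
    """Convert Arabic characters to Buckwalter; unknown characters pass through."""
    return "".join(ARABIC.get(ch, ch) for ch in text)


word = from_buckwalter("kitAb")
print(word, to_buckwalter(word))
```

Because the scheme is a one-to-one character mapping, the conversion is lossless in both directions for every symbol the table covers.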
ElixirFM is a high-level implementation of Functional Arabic Morphology. The core of ElixirFM is written in Haskell, while interfaces in Python and Perl support lexicon editing and other interactions.
http://github.com/otakar-smrz/elixir-fm
Part-of-speech tagging is the task of assigning symbols from a particular set to words in a natural language text. ACOPOST implements and extends well-known machine learning techniques and provides a uniform environment for testing.
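To make the task concrete, here is a minimal sketch of a most-frequent-tag baseline: each known word receives the tag it carried most often in training, and unknown words receive the overall most common tag. This is a deliberately simple baseline and not a description of ACOPOST's own taggers; the function names, the tag set, and the toy corpus are assumptions for the example.

```python
from collections import Counter, defaultdict


def train(tagged_sentences):
    """Build a word -> most frequent tag lexicon plus a fallback tag."""
    counts = defaultdict(Counter)
    all_tags = Counter()
    for sentence in tagged_sentences:
        for word, tag in sentence:
            counts[word.lower()][tag] += 1
            all_tags[tag] += 1
    lexicon = {word: c.most_common(1)[0][0] for word, c in counts.items()}
    default_tag = all_tags.most_common(1)[0][0]
    return lexicon, default_tag


def tag(words, lexicon, default_tag):
    """Assign each word its lexicon tag, or the fallback tag if unseen."""
    return [(w, lexicon.get(w.lower(), default_tag)) for w in words]


# Toy training corpus (hypothetical data).
corpus = [
    [("the", "DET"), ("dog", "NOUN"), ("barks", "VERB")],
    [("a", "DET"), ("cat", "NOUN"), ("sees", "VERB"), ("birds", "NOUN")],
]
lexicon, default_tag = train(corpus)
print(tag(["The", "cat", "meows"], lexicon, default_tag))
```

A baseline like this is mainly useful as a reference point when comparing more capable taggers in a common test environment.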
This project is devoted to the development of natural language processing tools and resources for the Lingala language, which is spoken by tens of millions of people in central Africa.
Perstem is a Persian (Farsi) stemmer, morphological analyzer, transliterator, and partial part-of-speech tagger. Inflexional morphemes are separated or removed from their stems. Perstem can also tokenize and transliterate between various character set encodings and romanizations.
Various tools for creating annotated parallel corpora including pre-trained tagging and parsing models for various languages, sentence alignment tools and word alignment tools.
Uplug also includes a web-based interface for interactive sentence and word alignment and scripts for indexing and querying parallel corpora using the Corpus Work Bench CWB.
Download 'uplug-main' first and then add other packages.
ValiTerms is a tool that helps validate terms in a corpus. It finds their occurrences and lets terminologists decide whether each term is relevant. ValiTerms is developed at LIPN (http://www-lipn.univ-paris13.fr), RCLN team.
Please consult the wiki for instructions about installation and usage.
This is a Java-based project for complex event extraction from text and co-reference resolution. Currently the code can read the BioNLP shared task format (http://2011.bionlp-st.org/) and the i2b2 Natural Language Processing for Clinical Data shared task format (https://www.i2b2.org/NLP/DataSets/Main.php). Event extraction involves finding the events in a text and the parameters of each event.
The method is based on SVM but other ML algorithms can be adopted. The method details are explained in the...
The Parenthesis Classifier takes the contents of a set of parentheses and classifies it into one of several categories. It includes a parenthesized-data extractor and the classifier.
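The two stages described above, extraction and classification, can be sketched as follows. The regular expression only handles non-nested parentheses, and the category names and rules are invented for this example; the actual Parenthesis Classifier's categories and method are not given in this description.

```python
import re

# Matches the contents of an innermost pair of parentheses.
PAREN = re.compile(r"\(([^()]*)\)")


def extract_parenthesized(text: str) -> list:
    """Return the stripped contents of each set of parentheses."""
    return [m.group(1).strip() for m in PAREN.finditer(text)]


def classify(content: str) -> str:
    """Assign a hypothetical category using simple surface rules."""
    if re.fullmatch(r"\d{4}", content):
        return "year"
    if re.fullmatch(r"[A-Z]{2,}s?", content):
        return "abbreviation"
    if content.lower().startswith(("see ", "e.g.", "i.e.")):
        return "aside"
    return "other"


sentence = ("The World Health Organization (WHO) was founded (1948) "
            "in Geneva (see Figure 2).")
for item in extract_parenthesized(sentence):
    print(item, "->", classify(item))
```

Keeping the extractor separate from the classifier means the rule set can later be swapped for a trained model without touching the extraction step.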
The Simple Semantic Classifier classifies short chunks of natural language text into broad semantic classes that correspond to the OBO ontologies provided as input.
This program reads each of the 270,000 entries of the BÍN database of Icelandic lemmata and all their forms. It assigns one of hundreds of morphological paradigms to each entry. It won a special award in the Þú átt orðið competition (www.ordid.is).
Varamozhi is a free English-Malayalam transliteration library. It can transliterate Malayalam text between Malayalam and English scripts. Varamozhi takes as input the mapping between a Malayalam font and a transliteration scheme; outputs functions i
Based on the Buckwalter Morphological Analyzer (Version 1.0) for doing Arabic stemming and POS tagging. Includes a rewrite of the original Perl script, with better documentation and more flexible options, and a C++ interface (usable as a library or app).
Stance is a Perl script for generating random sentences in Dutch, which can be used as translation exercises for students of Dutch. In its finished version, it should generate only grammatically correct sentences.