VecText is an application that converts raw text to a structured format suitable for various data mining software. The application is written in interpreted programming language Perl. A part of the functionality is realized by external modules (e.g., Lingua::Stem::Snowball for stemming). The graphical user interface enables user-friendly software employment without requiring specialized technical skills and knowledge of a particular programming language, names of libraries and their...
Safe Harbor Deidentification for medical documents
Phalanx - Deidentify
Safe Harbor Deidentification Mode of Phalanx is an abridged pipeline of NLP annotators culminating in NER annotators which write output of text offsets. It uses the Safe Harbor deidentification method.
We describe a simple XML format to share text documents and annotation
A minimalist approach to share text documents and data annotations. Allows a large number of different annotations to be represented.
Project files contain:
- simple code to hold/read/write data and perform sample processing.
- BioC-formatted corpora
- BioC tools that work with BioC corpora
BioC goals
- simplicity
- interoperability
- broad use
- reuse
There should be little investment required to learn to use a format or a software module to process that format. We are interested in reuse, and we focus on common NLP tasks that are broadly useful for textmining.
MARF is a general cross-platform framework with a collection of algorithms for audio (voice, speech, and sound) and natural language text analysis and recognition along with sample applications (identification, NLP, etc.) of its use, implemented in Java.
Full-stack observability with actually useful AI | Grafana Cloud
Our generous forever free tier includes the full platform, including the AI Assistant, for 3 users with 10k metrics, 50GB logs, and 50GB traces.
Built on open standards like Prometheus and OpenTelemetry, Grafana Cloud includes Kubernetes Monitoring, Application Observability, Incident Response, plus the AI-powered Grafana Assistant. Get started with our generous free tier today.
This project is devoted to the development of natural language processing tools and resources for the Lingala language, which is spoken by tens of millions of people in central Africa.
InproTK is an Incremental Spoken Dialogue Processing Toolkit, that is, a toolkit to help you build dialogue systems that listen and talk incrementally, allowing for advanced interactional behaviour.
Please see our Wiki for more information: http://sourceforge.net/p/inprotk/wiki/
This project aims to build a suite of Natural Language Processing tools. Modules will include corpus indexing and access tools, a part-of-speech tagger, tokenisers, text classification software, etc.
This is a Java-based project for complex event extraction from text and co-reference resolution. Currently the code can read BioNLP shared task format (http://2011.bionlp-st.org/) and i2b2 Natural Language Processing for Clinical Data shared task format (https://www.i2b2.org/NLP/DataSets/Main.php). Event extraction includes finding events and the parameters for an event in a text.
The method is based on SVM but other ML algorithms can be adopted. The method details are explained in the following paper:
Ehsan Emadzadeh, Azadeh Nikfarjam, and Graciela Gonzalez. 2011. ...
D.U.C.K (Determine segmentation of Unknown words by using Context Knowledge)is an NLP tool, which aims to find the correct segmentation for unknown words in written Hebrew. Statistics from different scopes will be used to determine the segmentation.
Unlimited organizations, 3 enterprise SSO connections, role-based access control, and pro MFA included. Dev and prod tenants out of the box.
Auth0's B2B Essentials plan gives you everything you need to ship secure multi-tenant apps. Unlimited orgs, enterprise SSO, RBAC, audit log streaming, and higher auth and API limits included. Add on M2M tokens, enterprise MFA, or additional SSO connections as you scale.
MutationFinder is a biomedical natural language processing (NLP) system for extracting mentions of point mutations from free text. MutationFinder achieves high performance (99% precision, 81% recall on blind test data) as an information extraction system
NOTE: I couldn't keep up this project to align with latest Unicode spec. Not sure I may be continuing. You can try Myanmar3 from Myanmar NLP or WinUniInnwa or https://sourceforge.net/projects/prahita/ or something better compliant font. ~Victor
---
[This is UniBurma - UniMM project workshop area. This project currently have two productions, UniBurma and UniMM. For more descriptive info about this project, please visit http://unimm.org/. You can browse lastest source from SVN trunk.]