NOTE THAT THE SOURCE CODE AND ISSUE TRACKER HAVE NOW MOVED TO GITHUB. FIND US AT https://github.com/GateNLP/
GATE (General Architecture for Text Engineering) is an architecture, framework and development environment for developing, evaluating and embedding Human Language Technology. See http://gate.ac.uk for full details.
The Java Data Mining Package (JDMP) is a library that provides methods for analyzing data with the help of machine learning algorithms (e.g. clustering, classification, graphical models, neural networks, Bayesian networks, text processing, optimization).
An application developed in C using the list and the AVL tree data structures, which analyzes a text (.txt file) giving the following information as an output:
1. the total occurrences of every word in the text
2. the exact line of every occurrence of every word
3. the exact position in the line of every occurrence of every word
4. the exact paragraph of every occurrence of every word
5. the exact sentence of every occurrence of every word
The output is also written in a...
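The kind of index described above can be illustrated with a short sketch. This is not the project's code: the original is written in C with list and AVL tree structures, while this example swaps in a plain Python dictionary for brevity. The function name `index_words`, the word pattern, and the rule that a blank line starts a new paragraph are all assumptions made for illustration; sentence tracking is left out.

```python
import re
from collections import defaultdict


def index_words(text):
    """Map each lowercased word to a list of (paragraph, line, position) tuples.

    Hypothetical helper: a dict stands in for the AVL tree of the original.
    """
    index = defaultdict(list)
    paragraph = 1
    started = False         # True once the first non-blank line has been seen
    previous_blank = False  # True if the line just before was blank
    for line_no, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            previous_blank = True
            continue
        # Assumption: one or more blank lines separate paragraphs.
        if started and previous_blank:
            paragraph += 1
        started = True
        previous_blank = False
        words = re.findall(r"[A-Za-z']+", line)
        for position, word in enumerate(words, start=1):
            index[word.lower()].append((paragraph, line_no, position))
    return index


if __name__ == "__main__":
    sample = "the cat sat\non the mat\n\nThe end"
    for word, hits in sorted(index_words(sample).items()):
        print(word, len(hits), hits)
```

The total number of occurrences of a word is the length of its list, and each tuple gives the paragraph, the line, and the word's position within that line.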
The BioNLP UIMA Component Repository provides UIMA wrappers for novel and well-known 3rd-party NLP tools used in biomedical text processing, such as tokenizers, parsers, named entity taggers, and tools for evaluation.
-----------------
- What is it? -
-----------------
"Falcon Search" is a Java API and tool for searching inside
documents. It was originally started to search the content of PDF files
under the project "HAWK Search".
Searching with this tool is query-based, not word-based as in most
document search tools or document readers. It also handles
jumbled word order within a query and spelling mistakes.
Commonly used techniques in this project are Natural Language...
ASTL Automata Standard Template Library (Vincent Le Maout - Dominique Revuz) is a set of generic and efficient C++ components for automata manipulation.
A set of Unix command line tools for quick and convenient batch processing of tabular text files (a.k.a., tab-delimited, csv, or flat file format) with a header line. Provides delimiter and compression detection, column reference by name.
* tblmap: per-line ("map") computation: derive columns through an expression, delete, reorder, filter rows.
* tblred: compute ("reduce") aggregations (e.g., sum, average) over groups defined by key columns
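The "reduce" idea behind tblred can be sketched as a group-by sum over a tab-delimited table with a header line. This is only an illustration of the concept, not the tools' implementation or command-line interface: the function name `reduce_sum` and its parameters are hypothetical, and only Python's standard `csv` module is used.

```python
import csv
import io
from collections import defaultdict


def reduce_sum(table_text, key_column, value_column, delimiter="\t"):
    """Sum value_column within each group defined by key_column.

    Hypothetical helper; columns are referenced by header name.
    """
    reader = csv.DictReader(io.StringIO(table_text), delimiter=delimiter)
    totals = defaultdict(float)
    for row in reader:
        totals[row[key_column]] += float(row[value_column])
    return dict(totals)


if __name__ == "__main__":
    data = "city\tsales\nOslo\t10\nRome\t5\nOslo\t2.5\n"
    print(reduce_sum(data, "city", "sales"))
```

Other aggregations, such as an average, would follow the same pattern by also keeping a per-group row count.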
Apolda is a plugin for the Gate framework (see http://sourceforge.net/projects/gate/) that annotates texts with labels of concepts from an arbitrary OWL-ontology.
TextMarker is now developed and hosted at Apache UIMA (http://uima.apache.org/textmarker.html). TextMarker is a UIMA-based tool for information extraction and more. The full featured editor of the rule language and the build process of UIMA descriptors are complemented with components for visualization, explanation, testing and rule learning.
OCR C++ library. Includes: contour recognition; vectorisation; matrix letter feature recognition; automatic page segmentation and rotation detection; SS3 ASM core; XML base; web-based GUI; 99.6% printed Unicode text recognition; letter base of up to 1200 letters.
We use a large archive of newspaper stories (GigaWordCorpus) as input to a parallel MPI program and produce from it a list of the top R terms of varying lengths M through N that are especially interesting.
The program is written in C using MPI.
n-squared is a lightweight, super-powered notepad application that stores notes in an embedded database for easy searching. It has a tabbed interface, syntax highlighting, encryption, and more!
The Java Text Categorizing Library (JTCL) is a pure Java implementation of libTextCat, which in turn is "a library that was primarily developed for language guessing, a task on which it is known to perform with near-perfect accuracy."
hypKNOWsys aims at developing a Java-based workbench for knowledge discovery and knowledge management. Currently, hypKNOWsys has released two intermediate tools: DIAsDEM Workbench (text mining for semantic tagging) and WUMprep (Web mining pre-processing).
Java Expert Rule Based Inference Language. Jerbil is an open source rule processing engine written in Java. Currently Jerbil supports a full set of processing functions with text-based and XML interfaces; a Java interface is planned.
Flesh is a Java application designed to analyze a document (plain text, rich text, Word documents, and PDFs) and display how difficult it is to comprehend, using the Flesch-Kincaid Grade Level and the Flesch Reading Ease Score.
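The two scores named above come from the standard published formulas, which combine words per sentence and syllables per word. The sketch below is not Flesh's code: the function names are hypothetical, and the sentence splitter and the vowel-group syllable counter are rough assumptions that a real tool would likely replace with something more careful.

```python
import re


def count_syllables(word):
    """Rough heuristic (assumption): one syllable per run of vowels, minimum 1."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))


def readability(text):
    """Return (Flesch Reading Ease, Flesch-Kincaid Grade Level) for text."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one word")
    syllables = sum(count_syllables(w) for w in words)
    words_per_sentence = len(words) / len(sentences)
    syllables_per_word = syllables / len(words)
    # Standard Flesch Reading Ease formula: higher means easier to read.
    ease = 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word
    # Standard Flesch-Kincaid Grade Level formula: approximate US school grade.
    grade = 0.39 * words_per_sentence + 11.8 * syllables_per_word - 15.59
    return ease, grade


if __name__ == "__main__":
    ease, grade = readability("The cat sat. The dog ran.")
    print(round(ease, 2), round(grade, 2))
```

Very short, simple text can score above 100 on Reading Ease and below zero on Grade Level; that follows from the formulas themselves.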
JTextPro: A Java-based TextProcessing tool that includes sentence boundary detection (using maximum entropy classifier), word tokenization (following Penn conventions), part-of-speech tagging (using CRFTagger), and phrase chunking (using CRFChunker).
AutoSummary uses Natural Language Processing to generate a contextually-relevant synopsis of plain text. It uses statistical and rule-based methods for part-of-speech tagging, word sense disambiguation, sentence deconstruction and semantic analysis.
The "Universal Content Evaluation and Categorisation Software" is a program for analysing a website's, or more generally, a text's content. The text is arranged in dozens of categories, permitting more efficient web searches and information processing.
A cross-platform application to decode, search, browse, view, print, and export TLG/PHI BetaCode texts. The project is currently being ported from wxWindows to Java. (For more info, see the project homepage at http://wxtlg.sourceforge.net)