The library performs fast searches for dictionary words in arbitrary input strings.
Known problems are listed in the corresponding section of the documentation. Only ASCII words and strings are supported for now.
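To make the idea of finding dictionary words inside an arbitrary string concrete, here is a minimal sketch using a plain trie. It is not this library's algorithm or API (the description names neither); the names `build_trie`, `find_words`, and `END` are hypothetical and chosen only for illustration.

```python
END = ""  # end-of-word marker; cannot collide with a single-character key


def build_trie(words):
    """Build a nested-dict trie from an iterable of dictionary words."""
    root = {}
    for word in words:
        node = root
        for ch in word:
            node = node.setdefault(ch, {})
        node[END] = True
    return root


def find_words(text, trie):
    """Return (start_index, word) for every dictionary word occurring in text."""
    matches = []
    for start in range(len(text)):
        node = trie
        for end in range(start, len(text)):
            node = node.get(text[end])
            if node is None:
                break  # no dictionary word continues with this character
            if END in node:
                matches.append((start, text[start:end + 1]))
    return matches


trie = build_trie(["he", "she", "hers"])
print(find_words("ushers", trie))
```

This scan restarts at every position, so its worst case grows with the length of the longest dictionary word. An automaton-based approach such as Aho-Corasick avoids the restarts, at the cost of a more involved construction.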
This is a Java-based project for complex event extraction from text and co-reference resolution. Currently, the code can read the BioNLP shared task format (http://2011.bionlp-st.org/) and the i2b2 Natural Language Processing for Clinical Data shared task format (https://www.i2b2.org/NLP/DataSets/Main.php). Event extraction includes finding events and the parameters of each event in a text.
The method is based on SVM but other ML algorithms can be adopted. The method details are explained in the...
A self-contained, fully configurable Java "game" to simulate multi-species evolution. Design species by optionally specifying every attribute; modify any or all environmental settings; let them loose to eat, fight, procreate, die, and Evolve!
The FM-index is a compressed text index introduced in 2000 (http://pizzachili.dcc.uchile.cl/indexes/FM-indexV2/). A recent algorithm allows an FM-index to be updated (http://dx.doi.org/10.1016/j.jda.2009.02.007). This project provides the implementation.
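For readers unfamiliar with the FM-index, the following is a minimal sketch of the static structure and its backward-search counting query. It is not this project's code and does not cover the update algorithm mentioned above. It builds the Burrows-Wheeler transform by naive suffix sorting and stores the tables uncompressed, so it only illustrates the query logic. The function names and the `"\0"` sentinel are assumptions; the text must not contain the sentinel.

```python
from collections import Counter


def build_fm_index(text, sentinel="\0"):
    """Build (C table, occurrence table, length) for text plus a sentinel."""
    s = text + sentinel
    # Naive suffix array: sort suffix start positions lexicographically.
    suffix_array = sorted(range(len(s)), key=lambda i: s[i:])
    # BWT: the character preceding each sorted suffix (wraps to the sentinel).
    bwt = "".join(s[i - 1] for i in suffix_array)
    # c_table[c]: number of characters in s strictly smaller than c.
    counts = Counter(s)
    c_table, total = {}, 0
    for ch in sorted(counts):
        c_table[ch] = total
        total += counts[ch]
    # occ[c][i]: number of occurrences of c in bwt[:i].
    occ = {ch: [0] * (len(bwt) + 1) for ch in counts}
    for i, ch in enumerate(bwt):
        for c in occ:
            occ[c][i + 1] = occ[c][i] + (1 if c == ch else 0)
    return c_table, occ, len(s)


def count_occurrences(pattern, index):
    """Count occurrences of a non-empty pattern using backward search."""
    c_table, occ, n = index
    lo, hi = 0, n  # current half-open range of matching suffixes
    for ch in reversed(pattern):
        if ch not in c_table:
            return 0
        lo = c_table[ch] + occ[ch][lo]
        hi = c_table[ch] + occ[ch][hi]
        if lo >= hi:
            return 0
    return hi - lo


index = build_fm_index("abracadabra")
print(count_occurrences("abra", index))
```

A real FM-index replaces the full occurrence table with a compressed, sampled structure; the query loop stays the same.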
LDIFF is an enhanced, language-independent line differencing tool built upon Unix diff. It overcomes diff's limitations in determining whether an artifact line has been changed or is the result of additions and removals.
TimeFinder automatically optimizes schedules (timetables) for universities and high schools. It makes manual timetabling easier for the timetabler via a Java GUI. Export and import are supported via XML and text formats. http://timefinder.sourceforge.net/
Java classes that enable definition of new Charsets based on other existing Charsets, without additional programming. Includes a character set with Kamenik encoding.
A collaboration platform that enables non-locking, synchronous, real-time collaborative editing (NOT text only) with editor independence. It also provides edit-by-edit session playback. To collaborate, just enter a name, group, and password.
LAITOR is text mining software developed to find co-occurrences of biological entities (gene/protein terms) together with biointeraction and concept terms from customized dictionaries.
AVL Array is a sequence container (like std::vector or std::list) that allows fast insert/remove AND fast random access. Shiftable Files offers the usual file primitives plus fast insert/remove. Get the latest version via BZR in the Develop section.
GEECoP (Graphical Editor and Engine for Constrained Processes) is a tool to graphically design control-flow graphs and their associated constraints. It supports creation of Concurrent Transaction Logic formulas and validation of constrained processes.
A Java package for pretty-printing a text by deciding where to introduce line-breaks and indentation. A Java implementation of Derek Oppen's pretty printing algorithm. It is _not_ a pretty printer for Java code, though it could be used to write one.
The Lingual Quanta is an organization created by software engineers who are interested in Natural Language Processing technologies, with a focus on libraries useful for projects such as grammar checkers, text markup tools, etc.
Provides: A tool collection for array-oriented programming under Linux and Unix operating systems.
Main subjects: Handling text and numeric arrays, mainly stored in shell variables.
In addition, some useful arithmetic solutions will be provided.
The Vodoo/Stream project lets users define transducers dedicated to document analysis. Such transducers describe how fragments are matched and transformed. A document can be an XML fragment, free text, or something else, depending on extensions.
LiMa stands for Lightweight Markup Language. It is a parser for an easy-to-use ASCII/text-based markup, comparable to Markdown or the Wikipedia markup language, with special configurable extensions for defining links and image resources.
Command line encryption tool for one time, daemon, or stream data processing. Data stats, check sums, conversion to/from text. Data/keys from files, pipes, standard input. In-place/diverted processing or data-analysis-only. Random, file, password keys.
Sounder is a spell checker that lets users enter a word as they think it sounds, not as they think it is spelled. A list of similar-sounding words, correctly spelled, is returned to the user.
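The description does not say which phonetic algorithm Sounder uses. As one illustration of sound-based lookup, the sketch below uses American Soundex, a well-known phonetic code, as a stand-in; the helper `suggest` and the sample word list are hypothetical.

```python
# Soundex digit for each consonant group; vowels, H, W, and Y have no digit.
CODES = {}
for letters, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                       ("L", "4"), ("MN", "5"), ("R", "6")):
    for letter in letters:
        CODES[letter] = digit


def soundex(word):
    """Return the four-character American Soundex code of a word."""
    letters = [c for c in word.upper() if c.isalpha()]
    if not letters:
        return ""
    result = letters[0]
    prev = CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "HW":
            continue  # H and W do not separate letters with the same code
        code = CODES.get(ch, "")
        if code and code != prev:
            result += code
        prev = code  # a vowel resets prev, so a repeated code counts again
    return (result + "000")[:4]


def suggest(word, dictionary):
    """Return dictionary words whose Soundex code matches that of word."""
    target = soundex(word)
    return [w for w in dictionary if soundex(w) == target]


print(suggest("Robart", ["Robert", "Rupert", "Rubin", "Smith"]))
```

Soundex keeps the first letter and is tuned for English surnames, so it misses pairs such as "nite" and "night"; a production spell checker would likely need a richer phonetic model.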
The project is a collection of Object Pascal libraries for parsing text strings and markup languages: HTML, XHTML, XML, CSS, and others. The libraries are written with a minimalist design and strive for broad unification.
A fast way to rate the reading difficulty of a book or text. Unlike well-known readability metrics such as Fog, Kincaid, SMOG, ARI, Flesch, and Coleman-Liau, this metric takes into account far more factors and is standardized against a corpus.
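For comparison, one of the classic metrics named above, the Automated Readability Index (ARI), uses only two ratios: characters per word and words per sentence. The sketch below computes it with deliberately simple tokenization; the regular expressions and the function name are assumptions, and this is not the project's own metric.

```python
import re


def automated_readability_index(text):
    """Compute ARI = 4.71 * chars/words + 0.5 * words/sentences - 21.43."""
    words = re.findall(r"[A-Za-z0-9']+", text)
    # Treat runs of '.', '!' or '?' as sentence boundaries (a rough heuristic).
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    if not words or not sentences:
        raise ValueError("text must contain at least one word and one sentence")
    # Count only letters and digits as characters.
    chars = sum(ch.isalnum() for word in words for ch in word)
    return 4.71 * chars / len(words) + 0.5 * len(words) / len(sentences) - 21.43


easy = "The cat sat. The dog ran."
hard = "Readability estimation frequently necessitates multidimensional linguistic analysis."
print(automated_readability_index(easy) < automated_readability_index(hard))
```

Because ARI looks only at surface lengths, it cannot tell a rare short word from a common one, which is the kind of gap a metric with more factors aims to close.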
This project intends to create an indexing search engine for knowledge management. The primary objective is to apply an information retrieval core and to implement knowledge discovery techniques such as data mining and text mining algorithms.
Blubber system is an Eclipse RCP application for modeling distributed systems. It is designed for university research and includes: a task graph editor; graph transformation tools; graph modelling on distributed systems; real-time modelling on GerdaFramework.