This project is a collection of tools and libraries, mainly in Java, for text analytics tasks. They range from simple wrappers to sophisticated mining components that can improve the productivity of researchers and engineers.
Maui is a multi-purpose automatic topic indexing algorithm. Given a document, Maui automatically identifies its topics. Depending on the task, topics are tags, keywords, keyphrases, vocabulary terms, descriptors, or Wikipedia titles.
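To make "identifies its topics" more concrete, here is a minimal sketch of the candidate-generation and ranking idea behind keyphrase extraction. It is not Maui's algorithm or API: Maui combines many features with a learned model, while this hypothetical `KeyphraseSketch` only counts stopword-trimmed n-grams within sentences and ranks them by a smoothed TF-IDF score, assuming the caller supplies document frequencies from some reference corpus.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch: n-gram candidates ranked by TF-IDF (not Maui's actual model). */
class KeyphraseSketch {
    static final Set<String> STOP = Set.of("a", "an", "and", "for", "in", "is", "of", "on", "the", "to");

    /** Count 1..maxN word n-grams per sentence that neither start nor end with a stopword. */
    static Map<String, Integer> candidates(String text, int maxN) {
        Map<String, Integer> tf = new HashMap<>();
        for (String sentence : text.toLowerCase().split("[.!?]")) {
            List<String> words = new ArrayList<>();
            for (String w : sentence.split("[^a-z]+")) if (!w.isEmpty()) words.add(w);
            for (int i = 0; i < words.size(); i++) {
                for (int n = 1; n <= maxN && i + n <= words.size(); n++) {
                    if (STOP.contains(words.get(i)) || STOP.contains(words.get(i + n - 1))) continue;
                    tf.merge(String.join(" ", words.subList(i, i + n)), 1, Integer::sum);
                }
            }
        }
        return tf;
    }

    /** Rank candidates by tf * (ln((N + 1) / (df + 1)) + 1), highest score first. */
    static List<String> rank(String text, Map<String, Integer> docFreq, int numDocs) {
        Map<String, Integer> tf = candidates(text, 3);
        Map<String, Double> score = new HashMap<>();
        tf.forEach((phrase, f) -> score.put(phrase,
            f * (Math.log((numDocs + 1.0) / (docFreq.getOrDefault(phrase, 0) + 1.0)) + 1.0)));
        List<String> ranked = new ArrayList<>(score.keySet());
        ranked.sort((a, b) -> Double.compare(score.get(b), score.get(a)));
        return ranked;
    }
}
```

Splitting on sentence punctuation first keeps n-grams from spanning two sentences, and the IDF term pushes down phrases that are common across the whole collection.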
The Neurpheus Morphological Analyser performs morphological analysis, stemming, and word-form generation, using sophisticated classification methods to analyze words not seen in the training dictionary.
DawNLITE is a Natural-Language-based Image Transmoding Engine. The software transforms an image to a video as recorded by a virtual camera panning and zooming over the image, following a natural language text description of the image.
ConTextKit is a Java-based implementation of Wendy Chapman's ConText algorithm for annotating the context of medical documents, specifically negation, temporality, and experiencer.
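ConText looks for trigger terms (such as "denies" or "no") and marks clinical concepts that fall within a trigger's scope, which extends until a termination term (such as "but") or the end of the sentence. The sketch below handles only forward negation; the trigger and terminator lists, the fixed five-token window, and the class name `NegationSketch` are illustrative assumptions, not ConTextKit's API, and the real algorithm also covers temporality and experiencer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/** Hypothetical sketch of ConText-style forward negation; not ConTextKit's API. */
class NegationSketch {
    static final List<String> NEG_TRIGGERS = List.of("no", "denies", "without", "negative for");
    static final Set<String> TERMINATORS = Set.of("but", "however");
    static final int WINDOW = 5; // assumed scope length in tokens

    static List<String> tokenize(String s) {
        List<String> out = new ArrayList<>();
        for (String t : s.toLowerCase().split("[^a-z]+")) if (!t.isEmpty()) out.add(t);
        return out;
    }

    /** True if the token sequence `phrase` occurs in `tokens` starting at `pos`. */
    static boolean matchesAt(List<String> tokens, List<String> phrase, int pos) {
        return pos + phrase.size() <= tokens.size()
            && tokens.subList(pos, pos + phrase.size()).equals(phrase);
    }

    /** True if the concept starts inside the scope of a negation trigger in this sentence. */
    static boolean isNegated(String sentence, String concept) {
        List<String> tokens = tokenize(sentence);
        List<String> target = tokenize(concept);
        for (String trigger : NEG_TRIGGERS) {
            List<String> trig = tokenize(trigger);
            for (int i = 0; i < tokens.size(); i++) {
                if (!matchesAt(tokens, trig, i)) continue;
                int start = i + trig.size();
                int end = Math.min(tokens.size(), start + WINDOW);
                for (int k = start; k < end; k++) {
                    if (TERMINATORS.contains(tokens.get(k))) break; // scope ends at a termination term
                    if (matchesAt(tokens, target, k)) return true;
                }
            }
        }
        return false;
    }
}
```

For example, "chest pain" counts as negated in "Patient denies chest pain." but not in "No fever but reports chest pain.", because "but" closes the scope opened by "no".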
Tipa2Unicode is a small stand-alone application that lets you type TIPA shortcuts and displays the corresponding Unicode symbols. You can then paste either side of the conversion into LaTeX or a word processor.
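The core of such a conversion can be sketched as a character-by-character lookup. The table covers only a small subset of TIPA's single-character shortcuts (as used inside `\textipa{...}`, to the best of our understanding of the TIPA encoding), and `TipaSketch` is a hypothetical name; the actual application may handle multi-character macros and diacritics differently.

```java
import java.util.Map;

/** Hypothetical sketch: map a few TIPA shortcut characters to IPA Unicode symbols. */
class TipaSketch {
    // Small illustrative subset of TIPA shortcuts; unlisted characters pass through unchanged.
    static final Map<Character, String> SHORTCUTS = Map.of(
        '@', "\u0259",  // schwa
        'D', "\u00F0",  // eth
        'E', "\u025B",  // open-mid front unrounded vowel
        'I', "\u026A",  // near-close near-front unrounded vowel
        'N', "\u014B",  // eng
        'O', "\u0254",  // open-mid back rounded vowel
        'S', "\u0283",  // esh
        'T', "\u03B8",  // theta
        'U', "\u028A",  // near-close near-back rounded vowel
        'Z', "\u0292"); // ezh

    static String toUnicode(String tipa) {
        StringBuilder out = new StringBuilder();
        for (char c : tipa.toCharArray()) {
            out.append(SHORTCUTS.getOrDefault(c, String.valueOf(c)));
        }
        return out.toString();
    }
}
```

With this table, `toUnicode("DIs")` yields "ðɪs" and `toUnicode("TIN")` yields "θɪŋ".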
A compiler to improve relation management between mobile users. The compiler handles data islands for transporting data between a client mobile phone and a server node accessing a cellular network.
Want to count the number of syllables in a word? Want to create a random haiku? This Java application can do both. Just add the JAR file to your project for access to these basic classes. We use CMU's pronunciation dictionary to count each syllable.
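Counting syllables with CMU's pronouncing dictionary works because every vowel phoneme in CMUdict carries a stress digit (0, 1, or 2), so the number of stress-marked phones equals the number of syllables. The sketch below assumes dictionary lines in the `WORD  PH1 PH2 ...` format with `;;;` comment lines; `SyllableSketch` and its methods are hypothetical names, not this project's classes.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.OptionalInt;

/** Hypothetical sketch: syllable counts from CMUdict-style pronunciation lines. */
class SyllableSketch {
    /** Parse "WORD  PH1 PH2 ..." lines; skips blanks and ";;;" comments, keeps the first pronunciation. */
    static Map<String, String[]> parse(List<String> lines) {
        Map<String, String[]> dict = new HashMap<>();
        for (String line : lines) {
            if (line.isBlank() || line.startsWith(";;;")) continue;
            String[] parts = line.trim().split("\\s+");
            dict.putIfAbsent(parts[0].toLowerCase(), Arrays.copyOfRange(parts, 1, parts.length));
        }
        return dict;
    }

    /** Vowel phonemes end in a stress digit; one vowel corresponds to one syllable. */
    static int countSyllables(String[] phones) {
        int n = 0;
        for (String p : phones) {
            if (Character.isDigit(p.charAt(p.length() - 1))) n++;
        }
        return n;
    }

    /** Syllable count for a word, or empty if the word is not in the dictionary. */
    static OptionalInt syllables(Map<String, String[]> dict, String word) {
        String[] phones = dict.get(word.toLowerCase());
        return phones == null ? OptionalInt.empty() : OptionalInt.of(countSyllables(phones));
    }
}
```

Alternate pronunciations such as `WORD(1)` end up as separate keys here; a fuller version might strip the suffix and pick one variant deliberately.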
This project aimed to create a framework and binary data format for an etymological Arabic system. It will no longer be hosted at SourceForge because the terms of use classify me as an enemy, so I am prohibited from using SourceForge services.
Wordcorr automates the tedious and risky process of tabulating and managing the sound correspondences used in working out the historical development of natural languages. Initial support was from NSF.
Nasira is a Java library for reading text files that contain non-ASCII characters (e.g. documents in German or Swedish). It automatically determines the character encoding of the file (such as ISO-8859-1 or UTF-8) with the help of user-provided hints.
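One way to choose an encoding from user-provided hints is to try each candidate charset with a strict decoder and keep the first one that decodes the bytes without errors. This is a hedged sketch using only `java.nio.charset`; Nasira's actual detection logic may differ, and `EncodingGuess` is a hypothetical name. Because ISO-8859-1 maps every byte value, it should come last in the hint list as a fallback.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.util.List;

/** Hypothetical sketch: pick the first hinted charset that strictly decodes the bytes. */
class EncodingGuess {
    static Charset guess(byte[] data, List<Charset> hints) {
        for (Charset cs : hints) {
            CharsetDecoder decoder = cs.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)      // fail on invalid byte sequences
                .onUnmappableCharacter(CodingErrorAction.REPORT);
            try {
                decoder.decode(ByteBuffer.wrap(data));
                return cs;                                       // decoded cleanly
            } catch (CharacterCodingException e) {
                // not valid in this charset; try the next hint
            }
        }
        throw new IllegalArgumentException("no hinted charset decodes the input");
    }

    static String read(byte[] data, List<Charset> hints) {
        return new String(data, guess(data, hints));
    }
}
```

With hints `[UTF-8, ISO-8859-1]`, the Latin-1 bytes of "Grüße" fail strict UTF-8 decoding (0xFC is not a valid UTF-8 byte), so the sketch falls back to ISO-8859-1.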
The Stemmer class transforms a word into its root form. The input word is supplied through the add() methods. The stem() method returns the stem, as does toString() once stem() has been called. The clear() method wipes the Stemmer buffer so that a new word can be input.
This version extends Martin Porter's original stemming algorithm by allowing capital letters in words. It should also plug in wherever the old algorithm is used with few...
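For a sense of what a Porter stemmer does internally, here is step 1a of Porter's algorithm (plural suffixes) on its own. It is a sketch, not this Stemmer class: the full algorithm has several further steps governed by a measure condition on the stem. In the spirit of the capital-letter extension, the hypothetical `step1a` matches suffixes case-insensitively but keeps the input's original case, assuming ASCII input.

```java
import java.util.Locale;

/** Sketch of Porter's step 1a only (plural suffixes); not the full algorithm. */
class PorterStep1a {
    static String step1a(String word) {
        // Match suffixes case-insensitively; assumes ASCII so lowercasing keeps the length.
        String lower = word.toLowerCase(Locale.ROOT);
        if (lower.endsWith("sses")) return word.substring(0, word.length() - 2); // caresses -> caress
        if (lower.endsWith("ies"))  return word.substring(0, word.length() - 2); // ponies -> poni
        if (lower.endsWith("ss"))   return word;                                 // caress -> caress
        if (lower.endsWith("s"))    return word.substring(0, word.length() - 1); // cats -> cat
        return word;
    }
}
```

The rule order matters: "sses" and "ss" must be checked before the bare "s" rule, otherwise "caress" would lose its final letter.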
Making Hebrew properly searchable by IR software. Right now, most work is being done on our mailing list (planning) and in our GitHub repository (concept code, see below).
Supertagging is a form of statistical lexical disambiguation and a preprocessing step for parsing: it assigns LTAG tree categories (supertags) to the lexical items in the input sentence. If the input sentence is given as a dependency tree, the supertagger's task is to assign the most probable TAG family to each node and edge of the tree.
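A common baseline for supertagging assigns each word the supertag it received most often in training data, ignoring context. The sketch below implements that unigram baseline over a flat token list rather than a dependency tree; the class name `UnigramSupertagger` and the tag labels in the usage example are hypothetical, and a real supertagger would use contextual features to choose among competing trees.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical unigram baseline: most frequent training supertag per word. */
class UnigramSupertagger {
    private final Map<String, Map<String, Integer>> counts = new HashMap<>();
    private final String fallback; // tag used for words never seen in training

    UnigramSupertagger(String fallback) {
        this.fallback = fallback;
    }

    /** Record one training occurrence of (word, supertag). */
    void observe(String word, String tag) {
        counts.computeIfAbsent(word.toLowerCase(), k -> new HashMap<>()).merge(tag, 1, Integer::sum);
    }

    /** Most frequent tag for the word (ties broken arbitrarily), or the fallback if unseen. */
    String tag(String word) {
        Map<String, Integer> c = counts.get(word.toLowerCase());
        if (c == null) return fallback;
        return Collections.max(c.entrySet(), Map.Entry.comparingByValue()).getKey();
    }

    List<String> tagAll(List<String> sentence) {
        List<String> out = new ArrayList<>();
        for (String w : sentence) out.add(tag(w));
        return out;
    }
}
```

After observing "saw" twice as `TRANS_VERB` and once as `NOUN`, the baseline always tags "saw" as `TRANS_VERB`, which is exactly the kind of context-blind error that motivates statistical supertaggers.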
Privacy Rule Definition Language to write Enterprise Privacy Policies
PRDL is one of the core components of the ENDORSE project. The language aims to encompass clauses from data protection legislation and enterprise privacy policies in order to, e.g., derive data access decisions automatically from the enterprise privacy policies (EPPs). There have been many initiatives to express privacy rules and legal restrictions in a computable form. PRDL aims to be a collaborative step towards a multi-stakeholder language. The...