FORCE (FORce-based Cluster Editing) is Java software that heuristically solves the weighted graph cluster editing problem, with edge weights derived from BLAST E-values. It also provides a training mode for estimating the heuristic's parameters.
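To make the optimization target concrete, here is a minimal Java sketch of the weighted cluster editing cost of a given clustering. It assumes the BLAST E-values have already been converted into a symmetric similarity matrix in which positive entries are edges (deleting one costs its weight) and non-positive entries are non-edges (inserting one costs the absolute value). That conversion and FORCE's force-based heuristic itself are not shown, and the class name is hypothetical.

```java
/** Hypothetical helper: cost of editing a weighted graph into a disjoint union of cliques. */
final class ClusterEditingCost {
    private ClusterEditingCost() {}

    /**
     * @param s       symmetric similarity matrix (assumed input format): s[i][j] > 0 is an edge
     *                that costs s[i][j] to delete, s[i][j] <= 0 is a non-edge that costs -s[i][j] to insert
     * @param cluster cluster[i] is the cluster id assigned to vertex i
     */
    static double cost(double[][] s, int[] cluster) {
        double total = 0.0;
        for (int i = 0; i < s.length; i++) {
            for (int j = i + 1; j < s.length; j++) {
                boolean together = cluster[i] == cluster[j];
                if (together && s[i][j] < 0) {
                    total += -s[i][j]; // missing edge inside a cluster must be inserted
                } else if (!together && s[i][j] > 0) {
                    total += s[i][j]; // edge between two clusters must be deleted
                }
            }
        }
        return total;
    }

    public static void main(String[] args) {
        double[][] s = {
            { 0, 2, -1 },
            { 2, 0, 3 },
            { -1, 3, 0 }
        };
        System.out.println(cost(s, new int[] { 0, 0, 1 })); // delete edge 1-2
        System.out.println(cost(s, new int[] { 0, 0, 0 })); // insert edge 0-2
    }
}
```

A cluster editing heuristic searches for the assignment `cluster` that minimizes this cost; in the example, merging all three vertices (cost 1) beats splitting off vertex 2 (cost 3).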
JVnSegmenter is a Java-based, open-source Vietnamese word segmentation tool. Its segmentation model was trained on about 8,000 sentences using Conditional Random Fields (with the FlexCRFs toolkit). The tool should be useful to the Vietnamese NLP community.
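CRF word segmenters for Vietnamese typically treat segmentation as sequence labeling over syllables, because Vietnamese puts a space between syllables rather than between words. Assuming a simple tag set where B begins a word and I continues it (JVnSegmenter's actual labels may differ), the final step of turning predicted tags into words can be sketched as follows. The class name and the underscore-joining convention are illustrative choices.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: groups syllables into words from per-syllable B/I labels. */
final class SyllableTagsToWords {
    private SyllableTagsToWords() {}

    static List<String> toWords(String[] syllables, String[] tags) {
        if (syllables.length != tags.length) {
            throw new IllegalArgumentException("one tag per syllable is required");
        }
        List<String> words = new ArrayList<>();
        StringBuilder current = null;
        for (int i = 0; i < syllables.length; i++) {
            if ("I".equals(tags[i]) && current != null) {
                // "I" continues the current word; an underscore joins its syllables
                current.append('_').append(syllables[i]);
            } else {
                // any other tag (e.g. "B"), or a stray leading "I", starts a new word
                if (current != null) {
                    words.add(current.toString());
                }
                current = new StringBuilder(syllables[i]);
            }
        }
        if (current != null) {
            words.add(current.toString());
        }
        return words;
    }
}
```

For example, the syllables "sinh viên" tagged B I come out as the single word "sinh_viên".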
A user-friendly, open-source Java toolkit for visualizing and analyzing the behaviour of users in the ActiveWorlds family of 3D virtual worlds by mapping them onto a 2D plane.
Evidence-based Guideline and Decision Support System. It provides patient-specific, point-of-care reminders to help physicians provide high-quality care. Input and output take the form of HL7 CDA Level 2 documents, and knowledge is encoded in Arden Syntax.
K-automaton is a new parsing (syntactic analysis) machine that is isomorphic to the language it parses. It is implemented in Java and can generate Java code from grammars described in EBNF.
This project is a simulator for robot A.I. It aims to compare how efficiently different robot intelligences perform movement tasks between fixed checkpoints in a logical world.
The Internet Soccer Database aims to build a database structure that can hold all fixtures, results, statistics, and odds information for any soccer league or competition. Once the structure is defined, the data will be populated and made available for analysis.
A set of Ant filters for gathering statistics from files or resources, mainly used for log file analysis. The filters can count inputs, count the occurrences of each input, and calculate the average, maximum, and minimum of the float values in the input.
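The three statistics listed above can be sketched as a single accumulator. This is a plain-Java illustration with hypothetical class and method names, not the project's actual Ant filter classes (which would plug into Ant's filter-reader mechanism). It assumes one input per line and skips lines that do not parse as numbers when computing the average, maximum, and minimum.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical accumulator for the statistics the filters collect. */
final class InputStats {
    private long count;
    private final Map<String, Long> occurrences = new LinkedHashMap<>();
    private long numericCount;
    private double sum;
    private double max = Double.NEGATIVE_INFINITY; // stays -Infinity if no numeric input is seen
    private double min = Double.POSITIVE_INFINITY; // stays +Infinity if no numeric input is seen

    void accept(String input) {
        count++;                                   // count every input
        occurrences.merge(input, 1L, Long::sum);   // count occurrences of each distinct input
        try {
            double v = Double.parseDouble(input.trim());
            numericCount++;
            sum += v;
            max = Math.max(max, v);
            min = Math.min(min, v);
        } catch (NumberFormatException e) {
            // non-numeric input still counts toward count and occurrences
        }
    }

    long count() { return count; }
    long occurrences(String input) { return occurrences.getOrDefault(input, 0L); }
    double average() { return numericCount == 0 ? Double.NaN : sum / numericCount; }
    double max() { return max; }
    double min() { return min; }
}
```

Keeping a running sum and count, rather than storing every value, lets the average be computed in a single pass over arbitrarily large log files.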
Bitnets instantiates and operates on graphs and subgraphs of large complex networks, such as kinship networks. It consists mainly of a Java library, a number of usage examples, and an interactive interpreted-language interface.
OBOES (Open Biomedical Ontology-Based Enrichment and Search) is an information-theory-based platform that embeds new integrative methods allowing biologists to evaluate new hypotheses.
Crawl-By-Example runs a crawl that classifies the processed pages by subject and finds the best pages according to examples provided by the operator. It is a plugin for the Heritrix crawler and was developed as part of the Google Summer of Code 2006 (GSoC06) program.
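One simple way to rank crawled pages against operator-supplied examples is bag-of-words cosine similarity. The sketch below is an assumption for illustration only: the description does not say which classifier Crawl-By-Example uses, the class name is hypothetical, and Heritrix's plugin API is not involved. Each page is scored by its best match among the example pages.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

/** Hypothetical scorer: best cosine similarity between a page and any example page. */
final class ExampleSimilarity {
    private ExampleSimilarity() {}

    /** Term-frequency vector built from lowercase runs of letters and digits. */
    static Map<String, Integer> termFrequencies(String text) {
        Map<String, Integer> tf = new HashMap<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!token.isEmpty()) {
                tf.merge(token, 1, Integer::sum);
            }
        }
        return tf;
    }

    static double cosine(Map<String, Integer> a, Map<String, Integer> b) {
        double dot = 0, normA = 0, normB = 0;
        for (Map.Entry<String, Integer> e : a.entrySet()) {
            normA += (double) e.getValue() * e.getValue();
            Integer other = b.get(e.getKey());
            if (other != null) {
                dot += (double) e.getValue() * other;
            }
        }
        for (int v : b.values()) {
            normB += (double) v * v;
        }
        return (normA == 0 || normB == 0) ? 0.0 : dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    /** Scores a page by its closest example; 1.0 means identical term distributions. */
    static double score(String page, String... examples) {
        Map<String, Integer> p = termFrequencies(page);
        double best = 0.0;
        for (String example : examples) {
            best = Math.max(best, cosine(p, termFrequencies(example)));
        }
        return best;
    }
}
```

A crawler could keep the highest-scoring pages, or prioritize links found on them; a real system would likely add TF-IDF weighting so common words count for less.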
Qualiweb aims to provide semantic web metrics for modeling the needs of a website's visitors according to a given taxonomy or document classification. The web metrics it provides indicate how successful each of the website's topics has been.
A web-based repository for UIMA-compliant information analysis components, with a web-based interface for humans and a plugin interface for IDEs. More information is available at this project's website.
W.H.A.T. is an analysis tool for Wikipedia with two main functionalities: an article network and extensive statistics. It includes a visualization of the article networks and an interface for analyzing the behavior of authors.
JTextPro: A Java-based text processing tool that includes sentence boundary detection (using a maximum entropy classifier), word tokenization (following Penn Treebank conventions), part-of-speech tagging (using CRFTagger), and phrase chunking (using CRFChunker).
RunCC is a new kind of parser generator that generates parsers and lexers at runtime; source generation is optional. It does not use any cryptography. Although intended for small languages, it comes with example parsers for Java and XML.
Azureus plug-in that maps the IP addresses of peers to the country and city they belong to and visualizes that data on a world map or as statistics. This product includes GeoLite data created by MaxMind, available from http://www.maxmind.com/.
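Mapping an IP address to a location usually reduces to converting the address to a number and searching a sorted table of address ranges. The sketch below assumes the location data has been loaded as ascending, non-overlapping IPv4 ranges; the class, its methods, and the table format are hypothetical and are neither the plug-in's code nor MaxMind's API.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical IPv4 range table with binary-search lookup. */
final class IpRangeLookup {
    private static final class Range {
        final long start; // inclusive
        final long end;   // inclusive
        final String location;

        Range(long start, long end, String location) {
            this.start = start;
            this.end = end;
            this.location = location;
        }
    }

    // Ranges must be added in ascending order and must not overlap, so binary search works.
    private final List<Range> ranges = new ArrayList<>();

    void add(String firstIp, String lastIp, String location) {
        ranges.add(new Range(ipToLong(firstIp), ipToLong(lastIp), location));
    }

    /** Converts dotted-quad IPv4 text such as "1.2.3.4" to its unsigned 32-bit value, held in a long. */
    static long ipToLong(String ip) {
        String[] parts = ip.split("\\.");
        if (parts.length != 4) {
            throw new IllegalArgumentException("not an IPv4 address: " + ip);
        }
        long value = 0;
        for (String part : parts) {
            int octet = Integer.parseInt(part);
            if (octet < 0 || octet > 255) {
                throw new IllegalArgumentException("octet out of range: " + part);
            }
            value = (value << 8) | octet;
        }
        return value;
    }

    /** Returns the location of the range containing ip, or null if no range covers it. */
    String lookup(String ip) {
        long x = ipToLong(ip);
        int lo = 0;
        int hi = ranges.size() - 1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            Range r = ranges.get(mid);
            if (x < r.start) {
                hi = mid - 1;
            } else if (x > r.end) {
                lo = mid + 1;
            } else {
                return r.location;
            }
        }
        return null;
    }
}
```

A `long` is used because IPv4 values above 127.255.255.255 do not fit in Java's signed `int`.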
BabyTALK aims to add another brick to the wall of natural language learning. The baby must structure a corpus of texts while its tutor points to and talks about a particular part of the corpus. The baby must also be able to describe any selected part of the corpus.
The Text Annotation Environment (tae) can be used to annotate natural language text with meta-information (tokens, part-of-speech tags, named entities, ...), either manually or automatically with UIMA annotators. Tae is based on Eclipse and IBM's UIMA.
CRFChunker: Conditional Random Fields Phrase Chunker (Phrase Chunking Tool) for English. The model was trained on sections 01..24 of the WSJ corpus, using section 00 as the development test set (F1-score of 95.77). Chunking speed: 700 sentences/s.
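For reference, chunk-level F1 is commonly computed CoNLL-2000 style: a predicted chunk counts as correct only if its start, end, and type all match a gold chunk, and F1 = 2PR/(P+R). The sketch below assumes IOB2 tags (B-NP, I-NP, O); the class and method names are hypothetical, and this is not necessarily how CRFChunker's reported figure was computed.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical chunk-level precision/recall/F1 over IOB2 tag sequences. */
final class ChunkF1 {
    private ChunkF1() {}

    /** Extracts chunks as "start:end:TYPE" strings (end exclusive). */
    static Set<String> chunks(String[] tags) {
        Set<String> out = new HashSet<>();
        int start = -1;
        String type = null;
        for (int i = 0; i <= tags.length; i++) {
            String tag = i < tags.length ? tags[i] : "O"; // sentinel closes the last chunk
            boolean continues = type != null && tag.equals("I-" + type);
            if (!continues) {
                if (type != null) {
                    out.add(start + ":" + i + ":" + type);
                }
                if (tag.startsWith("B-") || tag.startsWith("I-")) {
                    // a stray I- tag without a matching open chunk starts a new chunk
                    start = i;
                    type = tag.substring(2);
                } else {
                    type = null;
                }
            }
        }
        return out;
    }

    static double f1(String[] gold, String[] predicted) {
        Set<String> g = chunks(gold);
        Set<String> p = chunks(predicted);
        Set<String> correct = new HashSet<>(p);
        correct.retainAll(g); // exact matches on span and type
        if (correct.isEmpty()) {
            return 0.0;
        }
        double precision = (double) correct.size() / p.size();
        double recall = (double) correct.size() / g.size();
        return 2 * precision * recall / (precision + recall);
    }
}
```

In the test case below, a tagger that misses one of three gold chunks has precision 1.0 and recall 2/3, giving F1 = 0.8.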
CRFTagger: Conditional Random Fields Part-of-Speech (POS) Tagger for English. The model was trained on sections 01..24 of the WSJ corpus, using section 00 as the development test set (accuracy of 97.00%). Tagging speed: 500 sentences/s.