The Fiber project seeks to create a modular open source text mining tool that provides a contextual foundation for analysis in the dissemination of large quantities of text data.
NeuroEvolver is a tool for training and experimenting with one particular kind of recurrent neural network, the Long Short-Term Memory (LSTM) network, using an evolutionary learning method.
The main purpose of AMATOOL is to provide an application for semiautomatic markup of text using XML tags. Typical texts are archaeological reports or medieval text scripts.
It is a semiautomatic editor.
KNeTS (Knowledge Elicitation Tools) is a survey tool for creating multi-agent models based on local knowledge. It uses pattern analysis to identify rules, which are iteratively validated with the informant. The final output is a knowledge-based multi-agent model.
Similarity Evaluator is a tool for analysing similarity function implementations and algorithms. It makes it possible to compare several APIs on performance, best result, similarity, and discernibility values.
Visualization of finite state machines as a network graph. Accepted input files at the moment are: net files exported from xfst (Xerox Finite-State Tool) and lexc files (Finite-State Lexicon Compiler).
JWebPro: A Java tool that can interact with Google search and then process the returned Web documents in a couple of ways. The outputs can serve as inputs for NLP, IR, information extraction, Web mining, and online social network extraction/analysis applications.
The Word Vector Tool is a simple but flexible Java library for creating word vector representations of text documents. Word vectors can be used for various text processing tasks, such as text classification, text clustering, or information retrieval.
Market Advisor is a project that offers a tool to help improve the performance of your investments in the stock market. It focuses on the Italian stock market but can easily be extended to other markets.
JVnSegmenter is a Java-based, open-source Vietnamese word segmentation tool. The segmentation model was trained on about 8,000 sentences using Conditional Random Fields (FlexCRFs). This tool should be useful for the Vietnamese NLP community.
W.H.A.T. is an analytic tool for Wikipedia with two main functionalities: an article network and extensive statistics. It contains a visualization of the article networks and a powerful interface to analyze the behavior of authors.
vyasa is a digital library application that incorporates the functions of digital asset and document management systems. It facilitates information retrieval and knowledge discovery by providing comprehensive metadata generation and semantic analysis.
JTextPro: A Java-based text processing tool that includes sentence boundary detection (using a maximum entropy classifier), word tokenization (following Penn conventions), part-of-speech tagging (using CRFTagger), and phrase chunking (using CRFChunker).
CRFChunker: Conditional Random Fields Phrase Chunker (Phrase Chunking Tool) for English. The model was trained on sections 01 to 24 of the WSJ corpus, using section 00 as the development test set (F1-score of 95.77). Chunking speed: 700 sentences/s.
SurveyForge is a survey definition and execution tool for statisticians that runs on the JEE platform, with special emphasis on easy data entry, use of existing standards (Triple-S, Metanet, DDI), and reuse of standard or non-standard classifications.
AmiGram is the AMI Graphical Representation and Annotation Module. It is a general-purpose tool for multimodal corpus annotation and allows the timeline-based annotation of NXT corpora in a layer-based environment.
WikiVis is a tool for analyzing Wikipedia from several angles. Its main objective is to visualize the results of this analysis, which focuses on the editing frequency and relevance of articles and categories as well as the activity of users.
SENTENSA Knowledge Miner is a platform-independent tool for searching any text. SENTENSA uses robust methods of indexing and searching text, drawing on more than 20 years of experience in information retrieval.
The Information Visualization Cyberinfrastructure is a graphical tool with diverse modeling, analysis, and visualization algorithms for education and research. This tool is built on CiShell: Cyberinfrastructure Shell.
JudoScript is a general-purpose, multi-domain scripting tool and language for Java. It combines the power of declarative scripting for many modern tasks with general object and procedural programming. It is simple, intuitive, practical, and powerful.
MicroArray Genome Imaging and Clustering Tool (MAGIC tool) is a platform-independent Java program for analyzing MicroArray data (.tiff scans & .txt godlists) via graphs and clustering operations (including QT-clustering). http://www.bio.davidson.edu/magic
CloneAnalyzer is a tool for software quality analysis. It allows you to find, display and inspect clones, which are fragments of duplicated source code resulting from lack of proper reuse. It can be used as a plugin for Eclipse and on the command line.
A mechanism for identifying, acquiring, transforming, and serving data from multiple geographically diverse, heterogeneous data sets through a common data portal. This work is funded by a NASA Advanced Information Systems Technology (AIST) grant.