The project includes software that uses AI to sail a small sailing boat autonomously; the boat is equipped with various sensors and GPS. It provides various methods for evaluating and improving sensor data, for route planning, and for collision avoidance.
KNIME (http://www.knime.org) nodes for sequence bioinformatics. Sequime is an Eclipse plug-in for the KNIME data mining platform that provides additional nodes for reading, processing, and visualizing sequence information.
Parsers for biological data based on scanner generators such as Flex (C), Re2c (C), JFlex (Java), and Ifickle (Tcl). These scanner generators offer easier maintenance and development, and higher speed, than hand-written scanners. Scanner output is SQL.
Enrich and query corpora in the TEI-XML vocabulary. CorpusReader manages very large corpora and corpora containing milestone annotation. It provides tools for enriching corpora with the output of linguistic parsers and for extracting quantitative information.
The system searches for synonyms (and related words) in Wikipedia. WikIDF generates an index database of Wikipedia (for Russian, English, and German). This project is continued as "wikokit" at code.google.com.
Cougar Squared is a new Java library for machine learning and data mining research, supporting research needs of the community. It is written by researchers for researchers. It extends the WEKA and YALE machine learning frameworks.
Data mining tool for sequences (e.g. trajectories on a map, visited web pages, etc.) that creates a succinct description of the sequences, given a taxonomy (e.g. regions and sub-regions in the map, categories and sub-categories of pages, etc.).
Siafu simulates individual agents and their context, from home to city-wide scenarios. As a developer, you use the API to write your simulation for the purposes of data-set generation, testing, or visualization, optionally hooking it to your own application.
A regexp testing tool that applies a group of regexps to huge arrays of data (millions or so) in order to investigate the search or search-and-replace possibilities of the regexp group.
Library for capturing, storing, and visualizing time series data
JTimeSeries has moved to GitHub.
Please go to https://github.com/JTimeSeries/jtimeseries
The SourceForge copy has not been maintained since Sep 2012.
A Java library to assist with capturing and storing time series data and metrics. Provides facilities to publish time series data across a network, a lightweight server to persist series data, and client user interface components for real-time visualization.
JGraph is the most powerful, lightweight, feature-rich, and thoroughly documented open-source graph component available for Java. See the project homepage at www.jgraph.com for information and downloads.
Ontea - Pattern-based Semantic Annotation Platform. Ontea searches for or creates semantic metadata from text or documents using pattern-based approaches. The implementation currently includes regular expression (regex) patterns.
Contextor is a lightweight, simple-to-use Java-based library that helps developers and researchers work with the general concept of a resource; for example, resources can be text resources, web resources, images, and videos.
Annotate data sets that are stored in a Tranche repository. Administrators can maintain multiple versions of annotation standards (structured sets of requested information). Supports controlled vocabularies.
Open data mining platform. Provides a common architecture for algorithms of various types. Processes arbitrarily large volumes of data efficiently thanks to data streaming. Weka and Rseslib are partially integrated. (www.debellor.org)
Example-based Modeling (EMO) is a tool for creating data models, with examples, using a web interface. You interactively create a web-accessible database of models and samples for those models. A white paper describes the underlying assumptions.
ClimateTrends is an information and analysis tool on global climate change developed by the World Resources Institute. ClimateTrends provides a comprehensive and comparable database of greenhouse gas emissions data.
Cyberinfrastructure Shell (CIShell) is an open source, community-driven framework/application for the integration and utilization of datasets, algorithms, tools, and computing resources. Algorithms can be integrated using most programming languages.
ViSBARD (Visual System for Browsing, Analysis, and Retrieval of Data) is an interactive visualization and analysis tool for space physics data. It provides an integrated 3-D/2-D environment to analyze measurements across many spacecraft and MHD models.
RILA is machine learning software for relational data. It can find frequent patterns in a set of connected tables stored in relational database management systems.