Cougar Squared is a new Java library for machine learning and data mining research, supporting research needs of the community. It is written by researchers for researchers. It extends the WEKA and YALE machine learning frameworks.
Spy4j is a framework implemented in java which will provide APIs to support Java developers to perform data mining operations on the end user's behavior data using various collective intelligence algorithms.
Data mining tool for sequences (e.g. trajectories on a map, visited web pages, etc.) that creates a succinct description of the sequences, given a taxonomy (e.g. regions and sub-regions in the map, categories and sub-categories of pages, etc.).
"poco" (Spanish & Italian for "little") OLAP provides a web-based, crosstab reporting tool for your datawarehouse. While it's not an OLAP server or full fledged data mining solution, pocOLAP makes your data easy to use and understand ... for free!
KeplerWeka adds the functionality of the open-source machine learning and data mining workbench WEKA to the free and open-source, scientific workflow application, Kepler.
Contextor is a light-weight simple-to-use Java based library to help developers and researchers working with the general concept of a resource; as examples, resources can be text resources, web resources, images and videos.
This library offers an API to useful tools that can be utilized for any text mining project. More details will be mentioned in the project's future documentation
Wikipedia Concept Association Map (WCAM) is new approach for textual knowledge representation and understanding. All concepts and associations are stored in a graph database for better performance and easy distribution.
Open data mining platform. Provides common architecture for algorithms of various types. Efficient processing of arbitrarily large volumes of data thanks to data streaming. Weka and Rseslib partially integrated. (www.debellor.org)
SPIDR (Space Physics Interactive Data Resource) is a distributed database and application server network, built to select, visualize and model historical space weather data. SPIDR is a web-application and a grid of data mining web-services.
Moara is a biological text mining tool and consists of a Java library and some auxiliary MySQL databases for gene/protein training and extraction of mentions and its further normalization and disambiguation.
WSRF-compliant tools and services for data mining in grid computing environments, based on: Globus Toolkit 4, Condor and Triana workflow system. Learn more at: http://www.datamininggrid.org Copyright (c) 2008 DataMiningGrid Consortium.
A group a subprojects for Data Cleaning projects, mainly as a step of a Data Mining Project. Visit www.datacleaningopensource.com to review our current applications or if you want to add yours. NOTE: PROGRAMMING SKILLS ARE REQUIRED.
A java tool for anytime and interactive sequence mining. Aims at providing users with a way of analyzing her activity traces and extract activity schemes from them.
The Minervan project aims at aiding intelligent software development. It integrates reporting, analysis and data mining to support better decision making.
This project is a compilation of tools/libraries to help with tasks related to Text Analytics mainly in Java. These tools range from simple wrappers to sophisticated mining tasks that can improve the productivity of researchers and engineers.
OpenDMAP (Open Source Direct Memory Access Parser) is a natural language processing (text mining) application: a semantic parser for information extraction.
Webstats Solr is an attempt to make Apache Access log easier to Data Mine. By adding a powerful Search Engine (SOLR) as a Backend and using Java Script and HTML and maybe PHP I hope to out date AWStats.