Machine learning software to solve data mining problems
Weka is a collection of machine learning algorithms for solving real-world data mining problems. It is written in Java and runs on almost any platform. The algorithms can either be applied directly to a dataset or called from your own Java code.
Fast C++ library for linear algebra (matrix maths) and scientific computing. Easy to use functions and syntax, deliberately similar to Matlab. Uses template meta-programming techniques. Also provides efficient wrappers for LAPACK, BLAS, ATLAS, ARPACK and SuperLU libraries, including high-performance versions such as OpenBLAS and Intel MKL. Useful for machine learning, pattern recognition, signal processing, bioinformatics, statistics, finance, etc. For more details, see http://arma.sourceforge.net
Dlib is a C++ toolkit containing machine learning algorithms and tools for creating complex software in C++ to solve real world problems.
Java Neural Network Framework
Neuroph is a lightweight Java neural network framework that can be used to develop common neural network architectures. It has a small number of basic classes that correspond to basic NN concepts, and a GUI editor, which together make it easy to learn and use.
A Multi-label Extension to Weka
Multi-label classifiers and evaluation procedures using the Weka machine learning framework.
Machine learning algorithms for advanced analytics
OpenNN is a software library written in C++ for advanced analytics. It implements neural networks, one of the most successful machine learning methods. Some typical applications of OpenNN are business intelligence (customer segmentation, churn prevention…), health care (early diagnosis, microarray analysis…) and engineering (performance optimization, predictive maintenance…). OpenNN does not deal with computer vision or natural language processing. The main advantage of OpenNN is its high performance. The library stands out in terms of execution speed and memory allocation. It is constantly optimized and parallelized in order to maximize its efficiency. The documentation consists of tutorials and examples that give a complete overview of the library. OpenNN is developed by Artelnics, a company specializing in artificial intelligence.
Open source software for training neural networks
Multiple Back-Propagation is an open source software application for training neural networks with the backpropagation and the multiple back propagation algorithms. Currently this project is also hosted at http://code.google.com/p/multiplebackpropagation
Speech recognition software for English & Polish languages
Software for speech recognition in English & Polish languages.
Basic versions of SkryBot:
1. SkryBot Home Speech (English Language) - https://sourceforge.net/projects/skrybotdomowy/files/ReleasesEnglish/InstalatorSkryBotHomeSpeechDemo-22.214.171.12492.exe/download
2. SkryBot DoMowy (Polish Language) - https://sourceforge.net/projects/skrybotdomowy/files/Releases/InstalatorSkryBotDoMowyDemo-126.96.36.19982.exe/download
More help: https://sourceforge.net/p/skrybotdomowy/wiki/
Domain advanced versions (Polish Language):
1. SkryBot Prawo - for judicial professionals.
2. SkryBot Administracyjny - for civil and government administration.
3. SkryBot Medycyna Rodzinna - for doctors, hospitals.
Professional version of SkryBot (commercial) offers you:
1. Audio conversion and cutting sound files into smaller ones.
2. Searching for words or phrases in sound files (recognised by SkryBot).
3. Editing sound files and automatic cutting off long silence parts in the recording.
It's possible for machines to become self-aware.
We believe that it's possible for machines to become self-aware, but that they may not exhibit human-like thought processes. This project is a quest for conscious artificial intelligence. We will develop prototypes as we work towards our main goal. Our steps will be 1) Develop a Learning/Predictive Module. 2) Develop a Planning Module based on the learning/predictive module. 3) Develop a Plan Optimization Module so plans built in the previous module can be optimized. 4) Develop a Decision Making Engine based on previous planning. 5) Develop prototypes of the artificial creature. 6) Publish some academic papers. A video is available: http://www.youtube.com/watch?v=qH-IQgYy9zg The video shows a Popperian agent collecting ore from 3 mining sites and bringing it to the base. When the agent is born, it doesn't know how to walk, nor does it know that it feels pleasure from mining. It has only the sense of touch (it is a blind agent). The video shows learning, planning, executing and optimizing plans.
Weka wrapper for the SGM toolkit for text classification and modeling.
Weka wrapper for the SGM toolkit for text classification and modeling. Provides Sparse Generative Models for scalable and accurate text classification and modeling for use in high-speed and large-scale text mining. Has lower time complexity of classification than comparable software due to inference based on sparse model representation and use of an inverted index. The provided .zip file is in the Weka package format, giving access to text classification. Other functions are usable through either Java command-line commands or class inclusion into Java projects.
Scientific computing, machine learning and computer vision for .NET
The Accord.NET Framework provides machine learning, mathematics, statistics, computer vision, computer audition, and several scientific computing related methods and techniques to .NET. The project is compatible with the .NET Framework, .NET Standard, .NET Core, and Mono.
Converting text to a structured representation
VecText is an application that converts raw text to a structured format suitable for various data mining software. The application is written in Perl, an interpreted programming language. Part of the functionality is provided by external modules (e.g., Lingua::Stem::Snowball for stemming). The graphical user interface makes the software usable without specialized technical skills or knowledge of a particular programming language, library names and their functions, etc. All preprocessing actions are specified using common graphical elements organized into logically related blocks. The graphical user interface is implemented in Perl/Tk. In the command-line interface mode, all options need to be specified using command-line parameters. This non-interactive mode allows the application to be incorporated into a more complicated data mining process that integrates several software packages or performs multiple conversions in a batch.
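As a rough illustration of what converting raw text to a structured representation can mean, here is a minimal bag-of-words sketch in Python. It is not VecText's implementation (VecText is written in Perl), and the function names `tokenize` and `bag_of_words` are hypothetical, chosen only for this example.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def bag_of_words(documents):
    """Return (vocabulary, matrix): one row of term counts per document.

    Illustrative only: real tools typically add stemming, stop-word
    removal and term weighting on top of raw counts.
    """
    counts = [Counter(tokenize(doc)) for doc in documents]
    vocabulary = sorted(set().union(*counts))
    matrix = [[c[term] for term in vocabulary] for c in counts]
    return vocabulary, matrix


vocabulary, matrix = bag_of_words(["The cat sat.", "The cat saw the dog."])
```

Each row of `matrix` lines up with `vocabulary`, so the result can be written out as a table for data mining software to read.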
An open source optical flow algorithm framework for scientists and engineers alike.
Turku Event Extraction System
Turku Event Extraction System (TEES) is a free and open source natural language processing system developed for the extraction of events and relations from biomedical text. It is written mostly in Python, and should work in generic Unix/Linux environments. Currently, the TEES source code repository still remains on GitHub at http://jbjorne.github.com/TEES/ where there is also a wiki with more information.
GPU Machine Learning Library. This library aims to provide machine learning researchers and practitioners with a high-performance library by taking advantage of the GPU's enormous computational power. The library is developed in C++ and CUDA.
Music research software
jMIR is an open-source software suite implemented in Java for use in music information retrieval (MIR) research. It can be used to study music in the form of audio recordings, symbolic encodings and lyrical transcriptions, and can also mine cultural information from the Internet. It also includes tools for managing and profiling large music collections and for checking audio for production errors. jMIR includes software for extracting features, applying machine learning algorithms, applying heuristic error checkers, mining metadata and analyzing metadata.
GNAT recognizes gene names in text and maps them to NCBI Entrez Gene
GNAT is a BioNLP/text mining tool to recognize and identify gene/protein names in natural language text. It will detect mentions of genes in text, such as PubMed/Medline abstracts, and disambiguate them to remove false positives and map them to the correct entry in the NCBI Entrez Gene database by gene ID. March 2017: We started to upload GNAT output on Medline. See files/results/medline/.
Chat bot and free roaming AI in batch
Included in this project is a simple chat bot, a battle AI, and a swarm based free roaming AI.
ADAMS is a workflow engine for building complex knowledge workflows.
ADAMS is a flexible workflow engine aimed at quickly building and maintaining data-driven, reactive workflows that are easily integrated into business processes. Instead of placing operators on a canvas and manually connecting them, a tree structure and flow control operators determine how data is processed (sequentially or in parallel). This allows rapid development and easy maintenance of large workflows with hundreds or thousands of operators. Operators include machine learning (WEKA, MOA, MEKA, deeplearning4j) and image processing (ImageJ, JAI, BoofCV, OpenImaJ, LIRE, ImageMagick and Gnuplot). R is available through Rserve. A WEKA webservice allows other frameworks to use WEKA models. Fast prototyping is possible with Groovy and Jython. Read/write support is provided for various databases and spreadsheet applications.
A Python module for hyperspectral image processing
Spectral Python (SPy) is a Python package for reading, viewing, manipulating, and classifying hyperspectral image (HSI) data. SPy includes functions for clustering, dimensionality reduction, supervised classification, and more.
Discovering clusters with varying densities
This site provides the source code of two approaches for density-ratio based clustering, used for discovering clusters with varying densities. One approach is to modify a density-based clustering algorithm to do density-ratio based clustering by using its density estimator to compute the density ratio. The other approach involves rescaling the given dataset only. An existing density-based clustering algorithm, applied to the rescaled dataset, can then find all clusters with varying densities, which would otherwise be impossible had the same algorithm been applied to the unscaled dataset. Reference: Zhu, Y., Ting, K. M., & Carman, M. J. (2016). Density-ratio based clustering for discovering clusters with varying densities. Pattern Recognition. http://www.sciencedirect.com/science/article/pii/S0031320316301571
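To make the density-ratio idea more concrete, here is a minimal Python sketch, not the project's code. It assumes a neighbourhood-count density estimator and defines the ratio as the density in a small neighbourhood divided by the density in a larger one. The function names and the parameters `eps` and `eta` are assumptions made for this example; consult the referenced paper for the actual formulation.

```python
import math


def neighbour_count(points, i, radius):
    """Count the points (including points[i]) within `radius` of points[i]."""
    return sum(1 for q in points if math.dist(points[i], q) <= radius)


def density_ratio(points, i, eps, eta):
    """Density in the eps-neighbourhood divided by density in the eta-neighbourhood.

    Assumption: density is estimated as count / radius**d, where d is the
    dimensionality, so constant volume factors cancel in the ratio.
    Expects eta > eps > 0.
    """
    d = len(points[i])
    dens_eps = neighbour_count(points, i, eps) / eps ** d
    dens_eta = neighbour_count(points, i, eta) / eta ** d
    return dens_eps / dens_eta


# Five evenly spaced 1-D points; the middle point has 3 neighbours within
# eps=1 and 5 within eta=2, giving a ratio of (3/1) / (5/2).
points = [(0.0,), (1.0,), (2.0,), (3.0,), (4.0,)]
ratio = density_ratio(points, 2, eps=1.0, eta=2.0)
```

Because the ratio compares a point's density with that of its wider surroundings, it is a local measure rather than a global one, which is the property the description relies on for finding clusters of varying densities.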
CIntruder - OCR Bruteforcing Toolkit
Captcha Intruder is an automatic pentesting tool to bypass captchas.
DSTK - DataScience ToolKit for All of Us
DSTK - DataScience ToolKit is free, open-source software for statistical analysis, data visualization, text analysis, and predictive analytics. It is designed to be straightforward and easy to use, and familiar to SPSS users. While JASP offers more statistical features, DSTK aims to be a broad workbench that also includes text analysis and predictive analytics features. Under settings, you can specify the software path to use for advanced prediction modeling, data transformation/editing, and a Python IDE. For example, you may specify JASP for advanced data editing and RapidMiner for advanced prediction modeling. DSTK is written in C#, Java and Python to interface with R, NLTK, and Weka. It can be expanded with plugins using R scripts. We have also created plugins for more statistical functions and for Big Data analytics with Microsoft Azure HDInsights (Spark Server) with Livy. License: R, RStudio, NLTK, SciPy, SKLearn, MatPlotLib, Weka, ... each have their own licenses.
Scientific Visualisation Made Easy
The Simple Medical Imaging Library Interface (SMILI), pronounced 'smilie', is an open-source, lightweight and easy-to-use medical imaging viewer and library for all major operating systems. The main sMILX application has features for viewing n-D images, vector images, DICOMs and models/surfaces, as well as for anonymizing and shape analysis, all with easy drag-and-drop functions. It also features a number of standard processing algorithms for smoothing, thresholding, masking, etc. of images and models, available through graphical user interfaces and/or the command line. See our YouTube channel, linked from the homepage, for tutorial videos. The applications are all built on a uniform user-interface framework that provides a very high-level (Qt) interface to powerful image processing and scientific visualisation algorithms from the Insight Toolkit (ITK) and Visualisation Toolkit (VTK). The framework allows one to build stand-alone medical imaging applications quickly and easily.
Three different software tools for phenotyping plant root images
RootAnalyzer is a fully automated tool for efficiently extracting and analyzing anatomical traits from root cross-section images. RootAnalyzer segments the plant root from the image's background, classifies and characterizes the cortex, stele, endodermis and metaxylem, and produces statistics about the morphological properties of the root cells and tissues. RTipC is a system for the fully automated detection and classification of root tips in root images obtained either by 2D flatbed scanning or by 3D digital camera imaging. The software provides a robust, efficient and accurate means of phenotyping roots by detecting individual root tips and classifying them as belonging to a primary or lateral root. RootGraph is a novel, fully automated and robust approach for the detailed characterization of root traits, based on a graph optimization process. The scheme first distinguishes primary roots from lateral roots and then quantifies a broad spectrum of root traits.