A cross-platform statistical package for econometric analysis
gretl is a cross-platform software package for econometric analysis, written in the C programming language.
Unicode-XML-TEI text/corpus analysis platform
TXM is a free and open-source, cross-platform, Unicode- and XML-based text/corpus analysis environment and graphical client, supporting Windows, Linux and Mac OS X. It can also be used online as a J2EE-compliant web portal (GWT based) with built-in access control. Download the latest version of TXM: http://perso.ens-lyon.fr/serge.heiden/txm/files/software/TXM/0.7.7. TXM offers a comprehensive range of analysis tools (concordances, collocate search, frequency lists, etc.) based on the powerful CQP full-text search engine (http://cwb.sourceforge.net) and a range of statistical functions (factorial analysis, classification, co-occurrence analysis, etc.) based on R packages (http://www.r-project.org). Read the scientific background at the Textométrie project web site http://textometrie.ens-lyon.fr/?lang=en. Read a full description at the TEI Tools wiki http://wiki.tei-c.org/index.php/TXM.
Machine Learning Python
mlpy is a Python module for Machine Learning built on top of NumPy/SciPy and GSL. mlpy provides high-level functions and classes that allow rich workflows for classification, regression, clustering and feature selection to be designed in a few lines of code. mlpy is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License version 3. mlpy is available for both Python >=2.6 and Python 3.X.
Model-based Analysis of Genome-wide CRISPR-Cas9 Knockout
Model-based Analysis of Genome-wide CRISPR-Cas9 Knockout (MAGeCK) is a computational tool for identifying important genes from genome-scale CRISPR-Cas9 knockout screens. For instructions and documentation, please refer to the wiki page. MAGeCK is developed and maintained by Wei Li and Han Xu from Dr. Xiaole Shirley Liu's lab at the Department of Biostatistics and Computational Biology, Dana-Farber Cancer Institute and Harvard School of Public Health. We thank the Claudia Adams Barr Program in Innovative Basic Cancer Research for supporting the development of MAGeCK.
a Small (Matlab/Octave) Toolbox for Kriging
The STK is a (not so) Small Toolbox for Kriging. Its primary focus is on the interpolation / regression technique known as kriging, which is very closely related to Splines and Radial Basis Functions, and can be interpreted as a non-parametric Bayesian method using a Gaussian Process (GP) prior. The STK also provides tools for the sequential and non-sequential design of experiments. Even though it is currently mostly geared towards the Design and Analysis of Computer Experiments (DACE), the STK can be useful for other application areas (such as Geostatistics, Machine Learning, Non-parametric Regression, etc.).
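To illustrate the idea of kriging as prediction under a Gaussian Process prior, here is a minimal Python sketch of zero-mean simple kriging in one dimension. It is not STK code (STK is a Matlab/Octave toolbox); the squared-exponential kernel, the length scale, and the function names sq_exp_kernel, solve and krige are assumptions chosen for illustration. It computes only the predicted mean, not the prediction variance.

```python
import math


def sq_exp_kernel(a, b, length_scale=1.0):
    """Squared-exponential covariance between two scalar inputs (assumed kernel)."""
    return math.exp(-0.5 * ((a - b) / length_scale) ** 2)


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Build the augmented matrix [A | b] without modifying the inputs.
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    # Back substitution.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def krige(xs, ys, x_new, length_scale=1.0):
    """Predicted mean at x_new: k(x_new, xs) applied to K^{-1} ys."""
    K = [[sq_exp_kernel(a, b, length_scale) for b in xs] for a in xs]
    weights = solve(K, ys)
    return sum(w * sq_exp_kernel(x_new, xi, length_scale)
               for w, xi in zip(weights, xs))


xs = [0.0, 1.0, 2.0]
ys = [1.0, 3.0, 2.0]
prediction = krige(xs, ys, 1.5)
```

With noise-free observations the predictor interpolates: evaluating krige at one of the observed inputs returns the observed value up to rounding error.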
Maximal Information-based Nonparametric Exploration
The minepy homepage has moved to http://minepy.readthedocs.io. The download page is now at https://github.com/minepy/minepy/releases.
Massively Parallel Graph processing on GPUs -- now part of Blazegraph
Mapgraph is SYSTAP’s disruptive new technology to exploit the main memory bandwidth advantages of GPUs. The early work was co-developed with the University of Utah SCI Institute and has its pedigree in the UINTAH software running on over 750M cores on the TITAN Super Computer. Today, SYSTAP has commercialized this technology into its Blazegraph Accelerator and Blazegraph HPC products. Check out our options for GPU acceleration of graphs or contact us to learn more: https://www.blazegraph.com/product/gpu-accelerated/. The early work was released under the Apache 2 open source license and is available here on SourceForge. This work was (partially) funded by the DARPA XDATA program under AFRL Contract #FA8750-13-C-0002 and DARPA Contract #D14PC00029.
A population-based method for DNA copy number analysis: recurrent copy number aberration identification in multiple samples (with no need for single-sample calling). Developed for quick analysis of high-resolution, large-population data.
Handling and basic analysis of hyperspectral data in R
The hsdar package contains classes and functions to manage, analyse and simulate hyperspectral data. These might be either spectrometer measurements or hyperspectral images through the interface of rgdal.
R package for hierarchical species distribution models
hSDM is an R package for hierarchical species distribution models. Such models allow the observations (occurrence and abundance of a species) to be interpreted as the result of several hierarchical processes, including ecological processes (habitat suitability, spatial dependence and anthropogenic disturbance) and observation processes (species detectability). Hierarchical species distribution models are essential for accurately characterizing the environmental response of species, predicting their probability of occurrence, and assessing uncertainty in the model results.
FastPval is multi-stage p-value computing software that computes empirical p-values from a large set of permuted/resampled background data.
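As a rough illustration of what an empirical p-value is, the following Python sketch counts how many background values are at least as extreme as the observed statistic. It is not FastPval's multi-stage algorithm; the function name empirical_p_value and the add-one correction (which keeps the estimate above zero) are assumptions made for this example.

```python
def empirical_p_value(observed, background):
    """One-sided empirical p-value of `observed` against background values.

    Uses the common (exceed + 1) / (n + 1) estimator, an assumption here,
    so the result is never exactly zero.
    """
    exceed = sum(1 for value in background if value >= observed)
    return (exceed + 1) / (len(background) + 1)


# Hypothetical background of nine permuted statistics.
background = [1, 2, 3, 4, 5, 6, 7, 8, 9]
p = empirical_p_value(8, background)  # two values (8 and 9) are >= 8
```

A single pass like this scales linearly with the size of the background set, which is the cost that a staged approach on very large background data aims to reduce.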
GPU based Parallel Gene-Gene Interaction Analysis
Gene-gene interaction analysis in genetic association studies is computationally intensive when a large number of SNPs are involved. Most recent Central Processing Units (CPUs) have multiple cores, and Graphics Processing Units (GPUs) have hundreds of cores and have recently been used to implement faster scientific software. However, there are currently no genetic analysis software packages that allow users to fully utilize the computing power of these multi-core devices for genetic interaction analysis of binary traits. Here we present a novel software package, GENIE, which utilizes the power of multiple GPU or CPU processor cores to parallelize the interaction analysis. Citation: Chikkagoudar, S., Wang, K., & Li, M. (2011). GENIE: a software package for gene-gene interaction analysis in genetic association studies using multiple GPU or CPU cores. BMC Research Notes, 4(1), 158.
GpaNom is a simple command line GPA calculator, written in C, with the goal of being fast and precise.
Unix/Linux math calculator
An easy, small and handy math calculator for Unix/Linux systems. It can calculate easy and complex mathematical expressions passed as command line arguments.
Statistics modules in Perl Data Language, with a quick-start guide for non-PDL people. They make the PDL shell work like R, but with PDL threading (fast automatic iteration) of procedures including t-test, linear regression, and k-means clustering.
A Univariate Time Series Analysis package in ANSI C
CTSA is a C software package for univariate time series analysis. This is a work in progress. ARIMA and Seasonal ARIMA models have been added so far. Other functionality will be added soon. Documentation: https://github.com/rafat/ctsa/wiki
Creates a data density plot of a 2 dimensional data distribution.
Shows the data density of a 2 dimensional distribution. The problem of showing data density visually is not mathematically well defined, and there are several methods. The program uses the sum of reciprocal squared distances to calculate the density at each point, with a smear factor to prevent values going to infinity. The smear factor also controls the amount of clustering. There are several options for colour output. Input is via a csv (comma-separated values) file. There is now also a GUI built in Baby X for Linux and Windows.
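The density measure described above can be sketched in a few lines of Python. This is an illustration only, not the program's code: the exact way the smear factor enters the formula (here, added to the squared distance in the denominator) and the names density and density_grid are assumptions.

```python
def density(px, py, points, smear=1.0):
    """Sum of reciprocal squared distances from (px, py) to every data point.

    The smear term (assumed form) keeps the value finite when (px, py)
    coincides with a data point; larger values blur clusters together.
    """
    return sum(1.0 / ((px - x) ** 2 + (py - y) ** 2 + smear)
               for x, y in points)


def density_grid(points, width, height, smear=1.0):
    """Evaluate the density at every integer cell of a width x height grid."""
    return [[density(col, row, points, smear) for col in range(width)]
            for row in range(height)]


# Hypothetical data: two points, evaluated at the first point's location.
points = [(0.0, 0.0), (3.0, 4.0)]
value = density(0.0, 0.0, points, smear=1.0)  # 1/1 + 1/(25 + 1)
grid = density_grid(points, width=5, height=5)
```

Each grid value could then be mapped to a colour to produce the plot; the brute-force evaluation costs one distance computation per data point per cell.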
R interface to the Corpus Query Protocol
Implements the Corpus Query Protocol as a package for the R statistical environment. It allows users to query linguistic corpora and manipulate the data as native R objects. It is based on the CWB software.