Computing f2 bootstrap CI (BCa)
Computes the bootstrap bias-corrected and accelerated (BCa) confidence interval for the similarity factor (f2).
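As a rough illustration of the idea (not this project's code), the sketch below computes f2 with the standard formula, f2 = 50 * log10(100 / sqrt(1 + mean squared difference of the mean profiles)), and wraps it in a BCa bootstrap. The function names, the example data, the resampling of dosage units within each batch, and the pooled leave-one-unit-out jackknife for the acceleration constant are all assumptions made for this example.

```python
import math
import random
from statistics import NormalDist, mean

def f2(ref_units, test_units):
    """Similarity factor; rows are dosage units, columns are time points (% dissolved)."""
    ref_mean = [mean(col) for col in zip(*ref_units)]
    test_mean = [mean(col) for col in zip(*test_units)]
    msd = mean((r - t) ** 2 for r, t in zip(ref_mean, test_mean))
    return 50.0 * math.log10(100.0 / math.sqrt(1.0 + msd))

def bca_interval(ref_units, test_units, n_boot=2000, alpha=0.10, seed=0):
    """Return (estimate, lower, upper) for f2 using a BCa bootstrap (hypothetical helper)."""
    rng = random.Random(seed)
    norm = NormalDist()
    theta_hat = f2(ref_units, test_units)

    # Bootstrap replicates: resample units with replacement within each batch.
    boots = sorted(
        f2(rng.choices(ref_units, k=len(ref_units)),
           rng.choices(test_units, k=len(test_units)))
        for _ in range(n_boot)
    )

    # Bias correction z0 from the share of replicates below the estimate
    # (clamped so the inverse normal CDF stays finite).
    p = sum(b < theta_hat for b in boots) / n_boot
    p = min(max(p, 1.0 / (n_boot + 1)), n_boot / (n_boot + 1))
    z0 = norm.inv_cdf(p)

    # Acceleration from a leave-one-unit-out jackknife over both batches (assumption).
    jack = [f2(ref_units[:i] + ref_units[i + 1:], test_units)
            for i in range(len(ref_units))]
    jack += [f2(ref_units, test_units[:i] + test_units[i + 1:])
             for i in range(len(test_units))]
    jack_mean = mean(jack)
    num = sum((jack_mean - j) ** 3 for j in jack)
    den = 6.0 * sum((jack_mean - j) ** 2 for j in jack) ** 1.5
    accel = num / den if den > 0 else 0.0

    def adjusted(q):
        # Map a nominal quantile level to its BCa-adjusted level.
        z = norm.inv_cdf(q)
        return norm.cdf(z0 + (z0 + z) / (1.0 - accel * (z0 + z)))

    def pick(q):
        return boots[min(n_boot - 1, max(0, int(q * n_boot)))]

    return theta_hat, pick(adjusted(alpha / 2)), pick(adjusted(1 - alpha / 2))

# Made-up example data: 4 units per batch, 4 time points each.
reference = [[35.0, 58.0, 80.0, 92.0], [33.0, 60.0, 78.0, 95.0],
             [37.0, 56.0, 82.0, 93.0], [34.0, 59.0, 79.0, 91.0]]
test_batch = [[30.0, 52.0, 75.0, 90.0], [28.0, 55.0, 73.0, 88.0],
              [32.0, 50.0, 77.0, 91.0], [29.0, 53.0, 74.0, 89.0]]
estimate, lower, upper = bca_interval(reference, test_batch)
```

With `alpha=0.10` the result is a 90% interval; whether that level and this resampling scheme match the project's defaults is not stated in the description.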
An R package implementation of a consensus clustering methodology. This package allows users to perform re-sampling statistics based clustering using multiple clustering algorithms to assess the robustness of both clusters and members of clusters.
An application to perform statistical clustering analysis
Software tool for Research in Computational Population Genetics
Development of exact and approximate methods (importance sampling and MCMC based) for computing likelihoods under the standard population genetic models of mutation, migration and recombination. Project issues are maintained at https://freecode4susant.atlassian.net/browse/COALESCENT
Creates a data density plot of a 2 dimensional data distribution.
Shows the data density of a 2 dimensional distribution. The problem of showing data density visually is not mathematically well defined, and there are several methods. The program uses the sum of reciprocal squared distances to calculate density at each point, with a smear factor to prevent values going to infinity at the data points. The smear factor also controls the amount of clustering. There are several options for colour output. Input is via a CSV (comma-separated values) file. There is now a GUI built in Baby X for Linux and Windows.
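A minimal sketch of the density measure described above, assuming the smear factor is simply added to each squared distance (the description does not say exactly how the program applies it). The function names and sample points are hypothetical.

```python
def density(px, py, points, smear=1.0):
    """Sum of reciprocal squared distances from (px, py) to every data point.

    `smear` is added to each squared distance (an assumption), so the sum
    stays finite even when (px, py) coincides with a data point.
    """
    return sum(1.0 / ((px - x) ** 2 + (py - y) ** 2 + smear) for x, y in points)

def density_grid(points, width, height, smear=1.0):
    """Evaluate the density at every integer grid cell; rows are y, columns are x."""
    return [[density(col, row, points, smear) for col in range(width)]
            for row in range(height)]

# Made-up points: two close together, one far away.
points = [(1.0, 1.0), (1.5, 1.2), (4.0, 3.0)]
grid = density_grid(points, width=6, height=5, smear=0.5)
```

A larger `smear` flattens the peaks around individual points and blends neighbouring points together, which is one way to read the remark that it controls the amount of clustering.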
The final build of this software is now distributed in R, embedded in "RedeR", an R/Bioconductor package for hierarchical and nested network analysis. More about RedeR: http://bioconductor.org/packages/2.9/bioc/html/RedeR.html
The fantail machine learning toolkit (Moved)
Moved to https://github.com/quansun/fantail-ml
A Python package for estimating the statistical impact of features
This package lets you compute the statistical impact of features given a scikit-learn estimator. The computation is based on the mean variation of the difference between quantile and original predictions. The impact is reliable for regressors and binary classifiers. Currently, all features must consist only of purely numerical, non-categorical values.
A software framework to build maps for the Neurospora crassa genome based on probabilistic models of meiotic recombination. A NetBeans Platform application is built to incorporate the computations. Project issues are maintained at https://freecode4susant.atlassian.net/browse/GENOMEMAP
A Java library to model and fit ARTA processes.
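For readers unfamiliar with ARTA (AutoRegressive To Anything) processes, the sketch below shows the core construction in Python rather than the library's Java: a Gaussian autoregressive base process is pushed through the normal CDF and then through the inverse CDF of the desired marginal. It only generates samples; it does not fit anything and does not use the library's API. The AR(1) base, the exponential marginal, and all names are assumptions chosen for brevity.

```python
import math
import random
from statistics import NormalDist

def arta_exponential(n, phi=0.8, rate=1.0, seed=0):
    """Generate n autocorrelated values with an exponential(rate) marginal.

    Base process: stationary AR(1) with standard normal marginals,
    z[t] = phi * z[t-1] + sqrt(1 - phi^2) * noise.
    """
    rng = random.Random(seed)
    norm = NormalDist()
    z = rng.gauss(0.0, 1.0)  # draw the start from the stationary distribution
    out = []
    for _ in range(n):
        u = min(norm.cdf(z), 1.0 - 1e-12)      # uniform(0, 1) via the normal CDF
        out.append(-math.log(1.0 - u) / rate)  # inverse CDF of the exponential
        z = phi * z + math.sqrt(1.0 - phi ** 2) * rng.gauss(0.0, 1.0)
    return out

series = arta_exponential(20000)
```

The autocorrelation of the output is not equal to `phi`; choosing the base-process correlation that yields a target output correlation is the fitting step this sketch leaves out.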
Massively Parallel Graph processing on GPUs -- now part of Blazegraph
Mapgraph is SYSTAP’s disruptive new technology to exploit the main memory bandwidth advantages of GPUs. The early work was co-developed with the University of Utah SCI Institute and has its pedigree in the UINTAH software running on over 750M cores on the TITAN Super Computer. Today, SYSTAP has commercialized this technology into its Blazegraph Accelerator and Blazegraph HPC products. Check out our options for GPU acceleration of graphs or contact us to learn more: https://www.blazegraph.com/product/gpu-accelerated/. The early work was released under the Apache 2 open source license and is available here on SourceForge. This work was (partially) funded by the DARPA XDATA program under AFRL Contract #FA8750-13-C-0002 and DARPA Contract #D14PC00029.
Tools to analyse and use passport data for biological collections.
Python module to track the overall median of a stream of values "on-line" in reasonably efficient fashion.
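One common way to track a running median is with two heaps, sketched below. This is not necessarily how the module itself works; the class and method names are hypothetical.

```python
import heapq

class RunningMedian:
    """Track the median of all values seen so far using two balanced heaps."""

    def __init__(self):
        self._low = []   # max-heap (stored negated) holding the lower half
        self._high = []  # min-heap holding the upper half

    def add(self, x):
        # Route the value to the half it belongs to.
        if self._low and x > -self._low[0]:
            heapq.heappush(self._high, x)
        else:
            heapq.heappush(self._low, -x)
        # Rebalance so the lower half has the same size as the upper half,
        # or exactly one element more.
        if len(self._low) > len(self._high) + 1:
            heapq.heappush(self._high, -heapq.heappop(self._low))
        elif len(self._high) > len(self._low):
            heapq.heappush(self._low, -heapq.heappop(self._high))

    def median(self):
        if not self._low:
            raise ValueError("no values seen yet")
        if len(self._low) > len(self._high):
            return -self._low[0]
        return (-self._low[0] + self._high[0]) / 2

tracker = RunningMedian()
for value in [5, 1, 3]:
    tracker.add(value)
```

Each `add` costs O(log n) and `median` is O(1), at the price of keeping every value in memory.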
External plugins for modnlp/teccli
This is a general project for modnlp/teccli plugins, with a focus on text visualization.
R package for modelling anthropogenic deforestation
phcfM is an R package for modelling anthropogenic deforestation. It was named after the REDD+ pilot-project 'programme holistique de conservation des forêts à Madagascar'. phcfM includes two main functions: (i) demography(), to model the population growth with time in a hierarchical Bayesian framework using population census data and Gaussian linear mixed models and (ii) deforestation(), to model the deforestation process in a hierarchical Bayesian framework using land-cover change data and Binomial logistic regression models with variable time-intervals between land-cover observations. The two functions use embedded Gibbs samplers written in C++ with the Scythe statistical library to reduce computational time.
A small Python script to demonstrate the Monty Hall problem
The game works as follows: Monty Hall (the host) showed contestants 3 doors, knowing that behind one of them is a car (the good prize) and that the others hide prizes of little value. In the first stage, the contestant chooses a door (which is not yet opened). Next, Monty opens one of the other two doors that the contestant did not choose, knowing beforehand that the car is not behind it. Now, in the second stage, only two doors remain to choose from, since one has already been shown not to hold the prize, and the car is known to be behind one of them. The contestant must decide whether to stay with the door chosen at the start of the game and open it, or to switch to the other door that is still closed and open that one instead. What is the most logical strategy? Stay with the initially chosen door or switch doors? With which of the two still-closed doors is the contestant more likely to win? Why?
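The game can be checked empirically with a short simulation. The sketch below is not the project's script; the function names are made up for this example. Switching wins whenever the first pick was wrong, which happens 2 times out of 3, so the estimated win rates should come out near 2/3 for switching and 1/3 for staying.

```python
import random

def play(switch, rng):
    """Play one round; return True if the contestant ends up with the car."""
    doors = [0, 1, 2]
    car = rng.choice(doors)
    pick = rng.choice(doors)
    # The host opens a door that is neither the contestant's pick nor the car.
    opened = rng.choice([d for d in doors if d != pick and d != car])
    if switch:
        # Move to the only door that is neither picked nor opened.
        pick = next(d for d in doors if d != pick and d != opened)
    return pick == car

def win_rate(switch, trials=100_000, seed=0):
    """Estimate the probability of winning for a fixed strategy."""
    rng = random.Random(seed)
    return sum(play(switch, rng) for _ in range(trials)) / trials

stay_rate = win_rate(switch=False)
switch_rate = win_rate(switch=True)
```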
R interface to the Corpus Query Protocol
Implements the Corpus Query Protocol as a package for the R statistical environment. It lets users query linguistic corpora and manipulate the data as native R objects. It is based on the CWB software.
Predicting ribosome footprint profile shapes from transcript sequences
Riboshape is a suite of algorithms to predict ribosome footprint profile shapes from transcript sequences. It applies kernel smoothing to codon sequences to build predictive features, and uses these features to build a sparse regression model that predicts the ribosome footprint profile shapes. Reference: Liu, T.-Y. and Song, Y.S. Prediction of ribosome footprint profile shapes from transcript sequences. Proceedings of ISMB 2016, Bioinformatics, Vol. 32 No. 12 (2016) i183-i191.
Apply the Viterbi algorithm to a data stream
It is often necessary to assign a series of discrete values to continuously variable data sequenced by time, position, etc., thereby parsing the data into fewer and larger segments of variable width. The 'segment' utility treats an input data stream as the output of a Hidden Markov Model and applies the Viterbi algorithm to find the most likely segmentation path through the data.
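To make the segmentation idea concrete, here is a small log-space Viterbi sketch. It is not the 'segment' utility's code or interface: the two states, their Gaussian emission means, and the "sticky" transition probabilities are invented for illustration.

```python
import math

def viterbi(obs, states, log_start, log_trans, log_emit):
    """Return the most likely state sequence for the observations `obs`."""
    # best[t][s]: log-probability of the best path that ends in state s at step t.
    best = [{s: log_start[s] + log_emit(s, obs[0]) for s in states}]
    back = []
    for x in obs[1:]:
        prev = best[-1]
        row, ptr = {}, {}
        for s in states:
            p = max(states, key=lambda r: prev[r] + log_trans[r][s])
            row[s] = prev[p] + log_trans[p][s] + log_emit(s, x)
            ptr[s] = p
        best.append(row)
        back.append(ptr)
    # Trace back from the best final state.
    path = [max(states, key=lambda s: best[-1][s])]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return path[::-1]

# Hypothetical model: two levels with unit-variance Gaussian noise.
states = ["low", "high"]
means = {"low": 0.0, "high": 5.0}
log_start = {s: math.log(0.5) for s in states}
log_trans = {r: {s: math.log(0.9 if r == s else 0.1) for s in states}
             for r in states}

def log_emit(state, x):
    # Log-density of a normal distribution with mean means[state] and sd 1.
    return -0.5 * (x - means[state]) ** 2 - 0.5 * math.log(2 * math.pi)

data = [0.1, -0.3, 2.9, 0.4, 5.2, 4.8, 5.1]
segments = viterbi(data, states, log_start, log_trans, log_emit)
```

The value 2.9 lies slightly closer to the "high" mean, but the cost of switching state twice outweighs that, so isolated noisy points tend to be absorbed into the surrounding segment.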
Web-based data science analysis and visualization platform.
This is Slycat - a web-based data science analysis and visualization platform, created at Sandia National Laboratories. The goal of the Slycat project is to develop processes, tools and techniques to support data science, particularly analysis of large, high-dimensional data.
A Matlab toolbox for interfacing with the pure Java numerical library Snifflib. This toolbox provides convenience m-files for interoperability with Snifflib from within an active Matlab session running a Java virtual machine.
snlanalytic is a small Python script that takes a stem-and-leaf plot as input and returns basic statistics (sum, mean, median, mode) to the user.
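A sketch of the same kind of calculation, assuming a text input in which each line reads `stem | leaf leaf ...` with single-digit leaves. The script's real input format is not described, so the format, function names, and sample plot here are hypothetical.

```python
from statistics import mean, median, multimode

def parse_stem_and_leaf(text):
    """Turn lines such as '1 | 2 5 5' into the values 12, 15, 15."""
    values = []
    for line in text.strip().splitlines():
        stem, _, leaves = line.partition("|")
        for leaf in leaves.split():
            values.append(int(stem) * 10 + int(leaf))
    return values

def summarize(values):
    """Basic statistics; 'mode' is a list because there may be ties."""
    return {
        "sum": sum(values),
        "mean": mean(values),
        "median": median(values),
        "mode": multimode(values),
    }

plot = """
1 | 2 5 5
2 | 0 3
3 | 1
"""
stats = summarize(parse_stem_and_leaf(plot))
```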
Python module for statistics built on top of NumPy/SciPy
C++ Statistical ToolKit
STK++ (http://www.stkpp.org) is a versatile, fast, reliable and elegant collection of C++ classes for statistics, clustering, linear algebra, arrays (with an Eigen-like API), regression, dimension reduction, etc. Some functionalities provided by the library are available in the R environment as R functions (http://cran.at.r-project.org/web/packages/rtkore/index.html). For convenience, the source packages are also provided on SourceForge. The library offers a dense set of (mostly) template classes in C++ and is suitable for projects ranging from small one-off projects to complete data mining application suites.
Stata command for evaluating seasonality
tcsi is a Stata command for evaluating seasonality according to the transportation cost approach by G. L. Lo Magno, M. Ferrante and S. De Cantis.