World's first open-source data quality & data preparation project
MapReduce-based tool to remove duplicate DNA reads
Hadoop spliced read aligner for RNA-seq data
DSTK - DataScience ToolKit for All of Us
Sparse and dense matrices, linear algebra, visualization, big data
Log-linear analysis (data modelling) for high-dimensional data