statistics free download

Showing 136 open source projects for "statistics"

1

Bayesian Statistics

Slides and code for a full Bayesian statistics graduate course

This repository holds slides and code for a full Bayesian statistics graduate course. Bayesian statistics is an approach to inferential statistics based on Bayes' theorem, where available knowledge about parameters in a statistical model is updated with the information in observed data. The background knowledge is expressed as a prior distribution and combined with observational data in the form of a likelihood function to determine the posterior distribution. ...
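The prior-to-posterior update described above can be illustrated with a conjugate Beta-Binomial model. This is a minimal sketch and is not taken from the course repository; the function names and the example counts are hypothetical.

```python
from fractions import Fraction


def beta_binomial_update(alpha, beta, successes, failures):
    """Combine a Beta(alpha, beta) prior with binomial data.

    For this conjugate pair the posterior is again a Beta distribution,
    so the update reduces to adding the observed counts.
    """
    return alpha + successes, beta + failures


def beta_mean(alpha, beta):
    """Mean of a Beta(alpha, beta) distribution as an exact fraction."""
    return Fraction(alpha, alpha + beta)


# Hypothetical example: a uniform Beta(1, 1) prior and 7 successes in 10 trials.
prior = (1, 1)
posterior = beta_binomial_update(*prior, successes=7, failures=3)
print(posterior, beta_mean(*posterior))
```

With a uniform prior the posterior mean is pulled slightly from the observed rate of 7/10 toward the prior mean of 1/2.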

Downloads: 3 This Week

Last Update: 2025-08-17
2

StatsBase.jl

Basic statistics for Julia

StatsBase.jl is a Julia package that provides basic support for statistics. Particularly, it implements a variety of statistics-related functions, such as scalar statistics, high-order moment computation, counting, ranking, covariances, sampling, and empirical density estimation.

Downloads: 3 This Week

Last Update: 2026-01-12
3

OnlineStats.jl

Single-pass algorithms for statistics

OnlineStats does statistics and data visualization for big/streaming data via online algorithms. Its high-performance single-pass algorithms are updated one observation at a time and use O(1) memory.
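As an illustration of what a single-pass, O(1)-memory statistic looks like, here is a sketch of Welford's running mean and variance. It is written in Python rather than Julia and does not reproduce the OnlineStats.jl API; the class and method names are hypothetical.

```python
class RunningStats:
    """Running mean and sample variance via Welford's algorithm.

    Only three numbers are stored, regardless of how many
    observations have been seen.
    """

    def __init__(self):
        self.n = 0        # number of observations seen so far
        self.mean = 0.0   # running mean
        self._m2 = 0.0    # running sum of squared deviations from the mean

    def fit(self, x):
        """Update the statistics with one new observation."""
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self._m2 += delta * (x - self.mean)
        return self

    @property
    def variance(self):
        """Unbiased sample variance (NaN until two observations are seen)."""
        return self._m2 / (self.n - 1) if self.n > 1 else float("nan")


stats = RunningStats()
for value in [2, 4, 4, 4, 5, 5, 7, 9]:
    stats.fit(value)
print(stats.n, stats.mean, stats.variance)
```

Because each update touches only the stored summary, the same object can consume a stream that never fits in memory.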

Downloads: 1 This Week

Last Update: 2025-12-01
4

MultivariateStats.jl

A Julia package for multivariate statistics and data analysis

A Julia package for multivariate statistics and data analysis (e.g. dimensionality reduction).

Downloads: 2 This Week

Last Update: 2024-06-05
5

Panda-Helper

Panda-Helper: Data profiling utility for Pandas DataFrames and Series

Panda-Helper is a simple data-profiling utility for Pandas DataFrames and Series. Assess data quality and usefulness with minimal effort. Quickly perform initial data exploration, so you can move on to more in-depth analysis.

Downloads: 5 This Week

Last Update: 2025-02-05
6

Datumaro

Dataset Management Framework, a Python library and a CLI tool to build

...Datumaro makes it easy to merge datasets, split them into training/validation/test subsets, filter or transform annotations, and validate annotation quality — all while preserving metadata and supporting detailed statistics. It’s especially useful when you’re dealing with heterogeneous data sources or need to prepare complex datasets for machine learning workflows, freeing you from writing custom scripts for every format conversion.

Downloads: 6 This Week

Last Update: 2026-01-07
7

pandas

Fast, flexible and powerful Python data analysis toolkit

pandas is a Python data analysis library that provides high-performance, user friendly data structures and data analysis tools for the Python programming language. It enables you to carry out entire data analysis workflows in Python without having to switch to a more domain specific language. With pandas, performance, productivity and collaboration in doing data analysis in Python can significantly increase. pandas is continuously being developed to be a fundamental high-level building...

Downloads: 120 This Week

Last Update: 2026-03-30
8

whylogs

The open standard for data logging

...With whylogs, users can generate summaries of their datasets (called whylogs profiles), which they can use to track changes in their datasets, create data constraints to check whether their data looks the way it should, and quickly visualize key summary statistics. whylogs profiles are the core of the whylogs library. They capture key statistical properties of data, such as the distribution (far beyond simple mean, median, and standard deviation measures), the number of missing values, and a wide range of configurable custom metrics. Capturing these summary statistics makes it possible to represent the data accurately and to support all of the use cases described in the introduction.

Downloads: 3 This Week

Last Update: 2024-12-03
9

Metaflow

A framework for real-life data science

Metaflow is a human-friendly Python library that helps scientists and engineers build and manage real-life data science projects. Metaflow was originally developed at Netflix to boost productivity of data scientists who work on a wide variety of projects from classical statistics to state-of-the-art deep learning.

Downloads: 3 This Week

Last Update: 2026-03-21
10

ydata-profiling

Create HTML profiling reports from pandas DataFrame objects

The primary goal of ydata-profiling is to provide a one-line Exploratory Data Analysis (EDA) experience in a consistent and fast solution. Like the handy pandas df.describe() function, ydata-profiling delivers an extended analysis of a DataFrame and allows the analysis to be exported in different formats such as HTML and JSON.

Downloads: 7 This Week

Last Update: 2 days ago
11

Pandas Profiling

Create HTML profiling reports from pandas DataFrame objects

pandas-profiling generates profile reports from a pandas DataFrame. The pandas df.describe() function is handy yet a little basic for exploratory data analysis. pandas-profiling extends pandas DataFrame with df.profile_report(), which automatically generates a standardized univariate and multivariate report for data understanding. The report includes high correlation warnings, based on different correlation metrics (Spearman, Pearson, Kendall, Cramér’s V, Phik), and the most common categories (uppercase, lowercase,...

Downloads: 0 This Week

Last Update: 2 days ago
12

CausalityTools.jl

Algorithms for detecting associations, dynamical influences

CausalityTools.jl is a package for quantifying associations and dynamical coupling between datasets, independence testing, and causal inference. It provides association measures from conventional statistics, information theory, and dynamical systems theory, for example distance correlation, mutual information, transfer entropy, and convergent cross mapping. A dedicated API for independence testing is automatically compatible with every measure-estimator combination. For example, the package offers the generic SurrogateTest, which is fully compatible with TimeseriesSurrogates.jl, and the LocalPermutationTest for conditional independence testing.

Downloads: 4 This Week

Last Update: 2025-08-17
13

Riemann

A network event stream processing system, in Clojure

Riemann aggregates events from your servers and applications with a powerful stream processing language. Send an email for every exception in your app. Track the latency distribution of your web app. See the top processes on any host, by memory and CPU. Combine statistics from every Riak node in your cluster and forward to Graphite. Track user activity from second to second. Riemann streams are just functions which accept an event. Events are just structs with some common fields like :host and :service. You can use dozens of built-in streams for filtering, altering, and combining events, or write your own. ...
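The idea that a stream is just a function accepting an event can be sketched as follows. Riemann itself is configured in Clojure; this Python version only mirrors the concept, and the helper names `where` and `collect` as well as the sample events are hypothetical.

```python
def where(predicate, *children):
    """Build a stream that forwards an event to its children if it matches."""
    def stream(event):
        if predicate(event):
            for child in children:
                child(event)
    return stream


def collect(sink):
    """Build a stream that appends every event it receives to a list."""
    def stream(event):
        sink.append(event)
    return stream


# Events are plain dicts with a few common fields, here host, service, metric.
alerts = []
pipeline = where(
    lambda e: e["service"] == "api" and e["metric"] > 0.5,
    collect(alerts),
)

events = [
    {"host": "web1", "service": "api", "metric": 0.2},
    {"host": "web2", "service": "api", "metric": 0.9},
    {"host": "db1", "service": "disk", "metric": 0.95},
]
for event in events:
    pipeline(event)

print(alerts)
```

Since every stream is an ordinary function, filtering and combining steps compose by nesting calls.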

Downloads: 10 This Week

Last Update: 2025-05-26
14

Coverage.jl

Take Julia code coverage and memory allocation results, do useful thin

Julia can track how many times, if any, each line of your code is run. This is useful for measuring how much of your code base your tests actually test, and can reveal the parts of your code that are not tested and might be hiding a bug. You can use Coverage.jl to summarize the results of this tracking or to send them to a service like Coveralls.io or Codecov.io. Julia can track how much memory is allocated by each line of your code. This can reveal problems like type instability, or...

Downloads: 2 This Week

Last Update: 2025-10-31
15

clusterProfiler

A universal enrichment tool for interpreting omics data

clusterProfiler is an R/Bioconductor package that provides a unified workflow for functional enrichment analysis to interpret high-throughput omics results. It supports both over-representation analysis and gene set enrichment analysis, letting you work with unranked gene lists or ranked statistics from differential pipelines. The package connects to multiple knowledge bases—such as Gene Ontology, KEGG, Reactome, Disease Ontology, MeSH and others—through a consistent interface so you can query different biological lenses without rewriting code. It is designed for breadth, covering coding and non-coding features and thousands of organisms by leveraging continuously updated annotations. ...

Downloads: 2 This Week

Last Update: 2026-04-01
16

Java Tablesaw

Java dataframe and visualization library

Tablesaw is a dataframe and visualization library that supports loading, cleaning, transforming, filtering, and summarizing data. If you work with data in Java, it may save you time and effort. Tablesaw also supports descriptive statistics and can be used to prepare data for working with machine learning libraries like Smile, Tribuo, H2O.ai, and DL4J. Import data from RDBMS, Excel, CSV, TSV, JSON, HTML, or Fixed Width text files, whether they are local or remote (http, S3, etc.). Tablesaw supports data visualization by providing a wrapper for the Plot.ly JavaScript plotting library. ...

Downloads: 3 This Week

Last Update: 2025-06-27
17

forecast

Forecasting Functions for Time Series and Linear Models

...It provides functions for building, assessing, and using univariate forecasting models (e.g. ARIMA and exponential smoothing), along with tools for automatic model selection, diagnostics, plotting, and forecasting future values. It is widely used in statistics, economics, business forecasting, and environmental science. Features include exponential smoothing state space models (ETS) with seasonal components, residual checks, model accuracy and forecast error measures, and plots.
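To make the exponential smoothing idea concrete, here is a sketch of its simplest form, with a fixed smoothing weight and no trend or seasonal component. It is written in Python and is not the forecast package's R interface or its ETS estimation procedure; the function name, the choice of the first observation as the initial level, and the sample series are assumptions.

```python
def simple_exponential_smoothing(series, alpha):
    """Return the one-step-ahead forecast from simple exponential smoothing.

    The level is a weighted average of the newest observation and the
    previous level, so older observations fade out geometrically.
    """
    if not series:
        raise ValueError("series must contain at least one observation")
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in the interval (0, 1]")

    level = series[0]  # assumption: initialize the level at the first value
    for y in series[1:]:
        level = alpha * y + (1 - alpha) * level
    return level


# Hypothetical series; the forecast for the next period is the final level.
forecast_value = simple_exponential_smoothing([10, 12, 14], alpha=0.5)
print(forecast_value)
```

A larger `alpha` tracks recent observations more closely, while a smaller one smooths more heavily.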

Downloads: 1 This Week

Last Update: 2026-03-17
18

SDGym

Benchmarking synthetic data generation methods

The Synthetic Data Gym (SDGym) is a benchmarking framework for modeling and generating synthetic data. Measure performance and memory usage across different synthetic data modeling techniques – classical statistics, deep learning and more! The SDGym library integrates with the Synthetic Data Vault ecosystem. You can use any of its synthesizers, datasets or metrics for benchmarking. You can also customize the process to include your own work. Select any of the publicly available datasets from the SDV project, or input your own data. Choose from any of the SDV synthesizers and baselines. ...

Downloads: 7 This Week

Last Update: 1 day ago
19

collapse

Advanced and Fast Data Transformation in R

...It operates on base R data structures like data frames and vectors and uses highly optimized C++ code under the hood to deliver significant speed improvements. collapse also includes tools for grouped operations, weighted statistics, and time series manipulation, making it a compact yet powerful utility for data scientists and researchers working in R.

Downloads: 0 This Week

Last Update: 2025-12-22
20

targets

Function-oriented Make-like declarative workflows for R

The targets package is a pipeline / workflow management tool in R, designed to coordinate multi‐step computational workflows in data science / statistics. It tracks dependencies between “targets” (computational steps), skips steps whose upstream data or code hasn’t changed, supports parallel computation, branching (dynamic generation of sub‐targets), file format abstractions, and encourages reproducible and efficient analyses. It’s something like GNU Make for R, but more integrated. ...
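The skip-if-unchanged behavior can be sketched with a small in-memory cache. This is a Python illustration of the general Make-like idea, not the targets API (which is R); the `Pipeline` class, its `target` method, and the fingerprinting scheme based on a function's bytecode and constants are all assumptions made for the example.

```python
import hashlib


class Pipeline:
    """Cache each target and rerun it only when its code or inputs change."""

    def __init__(self):
        self._cache = {}  # target name -> (fingerprint, value)
        self.runs = []    # names of targets that were actually executed

    def target(self, name, func, *inputs):
        # Fingerprint the step from its bytecode, constants, and input values.
        code = func.__code__
        raw = repr((code.co_code, code.co_consts, inputs)).encode()
        fingerprint = hashlib.sha256(raw).hexdigest()

        cached = self._cache.get(name)
        if cached is not None and cached[0] == fingerprint:
            return cached[1]  # upstream unchanged: skip the step

        value = func(*inputs)
        self._cache[name] = (fingerprint, value)
        self.runs.append(name)
        return value


def double(xs):
    out = []
    for x in xs:
        out.append(2 * x)
    return out


def total(xs):
    return sum(xs)


pipe = Pipeline()


def run(raw):
    doubled = pipe.target("doubled", double, raw)
    return pipe.target("total", total, doubled)


first = run([1, 2, 3])   # both steps execute
second = run([1, 2, 3])  # nothing changed, both steps are skipped
third = run([1, 2, 4])   # the input changed, both steps execute again
print(first, second, third, pipe.runs)
```

A persistent version would store the fingerprints and values on disk so that skipped steps survive across sessions.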

Downloads: 0 This Week

Last Update: 2026-02-09
21

Apache Hudi

Upserts, Deletes And Incremental Processing on Big Data

Apache Hudi (pronounced Hoodie) stands for Hadoop Upserts Deletes and Incrementals. Hudi manages the storage of large analytical datasets on DFS (Cloud stores, HDFS or any Hadoop FileSystem compatible storage). Apache Hudi is a transactional data lake platform that brings database and data warehouse capabilities to the data lake. Hudi reimagines slow old-school batch data processing with a powerful new incremental processing framework for low latency minute-level analytics. Hudi provides...

Downloads: 0 This Week

Last Update: 2025-12-18
22

HStreamDB

HStreamDB is an open-source, cloud-native streaming database

...HStreamDB provides built-in support for event time-based stream processing. You can use your familiar SQL to perform basic filtering and transformation operations, statistics and aggregation based on multiple kinds of time windows and even joining between multiple streams. With connectors provided, you can easily integrate HStreamDB with other external systems, such as MQTT Broker, MySQL, Redis and ElasticSearch. More connectors will be added.

Downloads: 0 This Week

Last Update: 2024-04-26
23

Emerge

Browser-based interactive codebase and dependency visualization tool

Emerge (or emerge-viz) is an interactive code analysis tool to gather insights about source code structure, metrics, dependencies, and complexity of software projects. You can scan the source code of a project, calculate metric results and statistics, generate an interactive web app with graph structures (e.g. a dependency graph or a filesystem graph), and export the results in some file formats. Emerge currently has parsing support for the following languages: C, C++, Groovy, Java, JavaScript, TypeScript, Kotlin, ObjC, Ruby, Swift, Python, and Go. The structure, coloring, and clustering are calculated by combining a force-directed graph simulation with Louvain modularity. Emerge is mainly written in Python 3 and is tested on macOS, Linux, and modern web browsers (i.e., the latest Safari, Chrome, Firefox, and Edge).

Downloads: 0 This Week

Last Update: 2024-07-12
24

LabPlot

Data Visualization and Analysis

LabPlot is a FREE, open source and cross-platform Data Visualization and Analysis software accessible to everyone.

4 Reviews

Downloads: 44 This Week

Last Update: 2025-08-18
25

seaborn

Statistical data visualization in Python

Seaborn is a Python data visualization library based on matplotlib. It provides a high-level interface for drawing attractive and informative statistical graphics. Seaborn helps you explore and understand your data. Its plotting functions operate on dataframes and arrays containing whole datasets and internally perform the necessary semantic mapping and statistical aggregation to produce informative plots. Its dataset-oriented, declarative API lets you focus on what the different elements of...

Downloads: 7 This Week

Last Update: 2024-01-25