Showing 136 open source projects for "statistics"

  • 1
    Bayesian Statistics

    This repository holds slides and code for a full Bayesian statistics graduate course

    This repository holds slides and code for a full Bayesian statistics graduate course. Bayesian statistics is an approach to inferential statistics based on Bayes' theorem, where available knowledge about parameters in a statistical model is updated with the information in observed data. The background knowledge is expressed as a prior distribution and combined with observational data in the form of a likelihood function to determine the posterior distribution. ...
    Downloads: 3 This Week
  • 2
    StatsBase.jl

    Basic statistics for Julia

    StatsBase.jl is a Julia package that provides basic support for statistics. Particularly, it implements a variety of statistics-related functions, such as scalar statistics, high-order moment computation, counting, ranking, covariances, sampling, and empirical density estimation.
    Downloads: 3 This Week
  • 3
    OnlineStats.jl

    Single-pass algorithms for statistics

    OnlineStats does statistics and data visualization for big/streaming data via online algorithms. Its high-performance single-pass algorithms are updated one observation at a time and use O(1) memory.
    Downloads: 1 This Week
  • 4
    MultivariateStats.jl

    A Julia package for multivariate statistics and data analysis

    A Julia package for multivariate statistics and data analysis (e.g. dimensionality reduction).
    Downloads: 2 This Week
  • 5
    Panda-Helper

    Panda-Helper: Data profiling utility for Pandas DataFrames and Series

    Panda-Helper is a simple data-profiling utility for Pandas DataFrames and Series. Assess data quality and usefulness with minimal effort. Quickly perform initial data exploration, so you can move on to more in-depth analysis.
    Downloads: 5 This Week
  • 6
    Datumaro

    Dataset Management Framework, a Python library and a CLI tool to build

    ...Datumaro makes it easy to merge datasets, split them into training/validation/test subsets, filter or transform annotations, and validate annotation quality — all while preserving metadata and supporting detailed statistics. It’s especially useful when you’re dealing with heterogeneous data sources or need to prepare complex datasets for machine learning workflows, freeing you from writing custom scripts for every format conversion.
    Downloads: 6 This Week
  • 7
    pandas

    Fast, flexible and powerful Python data analysis toolkit

    pandas is a Python data analysis library that provides high-performance, user-friendly data structures and data analysis tools for the Python programming language. It enables you to carry out entire data analysis workflows in Python without having to switch to a more domain-specific language. With pandas, performance, productivity and collaboration in doing data analysis in Python can significantly increase. pandas is continuously being developed to be a fundamental high-level building...
    Downloads: 120 This Week
  • 8
    whylogs

    The open standard for data logging

    ...With whylogs, users can generate summaries of their datasets (called whylogs profiles), which they can use to track changes in their dataset, create data constraints to know whether their data looks the way it should, and quickly visualize key summary statistics about their datasets. whylogs profiles are the core of the whylogs library. They capture key statistical properties of data, such as the distribution (far beyond simple mean, median, and standard deviation measures), the number of missing values, and a wide range of configurable custom metrics. By capturing these summary statistics, we are able to accurately represent the data and enable all of the use cases described in the introduction.
    Downloads: 3 This Week
  • 9
    Metaflow

    A framework for real-life data science

    Metaflow is a human-friendly Python library that helps scientists and engineers build and manage real-life data science projects. Metaflow was originally developed at Netflix to boost productivity of data scientists who work on a wide variety of projects from classical statistics to state-of-the-art deep learning.
    Downloads: 3 This Week
  • 10
    ydata-profiling

    Create HTML profiling reports from pandas DataFrame objects

    ydata-profiling's primary goal is to provide a one-line Exploratory Data Analysis (EDA) experience in a consistent and fast solution. Like the handy pandas df.describe() function, ydata-profiling delivers an extended analysis of a DataFrame while allowing the analysis to be exported in different formats such as HTML and JSON.
    Downloads: 7 This Week
  • 11
    Pandas Profiling

    Create HTML profiling reports from pandas DataFrame objects

    pandas-profiling generates profile reports from a pandas DataFrame. The pandas df.describe() function is handy yet a little basic for exploratory data analysis. pandas-profiling extends pandas DataFrame with df.profile_report(), which automatically generates a standardized univariate and multivariate report for data understanding. The report includes high correlation warnings, based on different correlation metrics (Spearman, Pearson, Kendall, Cramér’s V, Phik), and the most common categories (uppercase, lowercase,...
    Downloads: 0 This Week
  • 12
    CausalityTools.jl

    Algorithms for detecting associations, dynamical influences

    CausalityTools.jl is a package for quantifying associations and dynamical coupling between datasets, independence testing, and causal inference. It provides association measures from conventional statistics, information theory, and dynamical systems theory, such as distance correlation, mutual information, transfer entropy, and convergent cross mapping, among many others. It also provides a dedicated API for independence testing, which is automatically compatible with every measure-estimator combination. For example, we offer the generic SurrogateTest, which is fully compatible with TimeseriesSurrogates.jl, and the LocalPermutationTest for conditional independence testing.
    Downloads: 4 This Week
  • 13
    Riemann

    A network event stream processing system, in Clojure

    Riemann aggregates events from your servers and applications with a powerful stream processing language. Send an email for every exception in your app. Track the latency distribution of your web app. See the top processes on any host, by memory and CPU. Combine statistics from every Riak node in your cluster and forward to Graphite. Track user activity from second to second. Riemann streams are just functions which accept an event. Events are just structs with some common fields like :host and :service. You can use dozens of built-in streams for filtering, altering, and combining events, or write your own. ...
    Downloads: 10 This Week
  • 14
    Coverage.jl

    Take Julia code coverage and memory allocation results, do useful thin

    Julia can track how many times, if any, each line of your code is run. This is useful for measuring how much of your code base your tests actually test, and can reveal the parts of your code that are not tested and might be hiding a bug. You can use Coverage.jl to summarize the results of this tracking or to send them to a service like Coveralls.io or Codecov.io. Julia can track how much memory is allocated by each line of your code. This can reveal problems like type instability, or...
    Downloads: 2 This Week
  • 15
    clusterProfiler

    A universal enrichment tool for interpreting omics data

    clusterProfiler is an R/Bioconductor package that provides a unified workflow for functional enrichment analysis to interpret high-throughput omics results. It supports both over-representation analysis and gene set enrichment analysis, letting you work with unranked gene lists or ranked statistics from differential pipelines. The package connects to multiple knowledge bases—such as Gene Ontology, KEGG, Reactome, Disease Ontology, MeSH and others—through a consistent interface so you can query different biological lenses without rewriting code. It is designed for breadth, covering coding and non-coding features and thousands of organisms by leveraging continuously updated annotations. ...
    Downloads: 2 This Week
  • 16
    Java Tablesaw

    Java dataframe and visualization library

    Tablesaw is a dataframe and visualization library that supports loading, cleaning, transforming, filtering, and summarizing data. If you work with data in Java, it may save you time and effort. Tablesaw also supports descriptive statistics and can be used to prepare data for working with machine learning libraries like Smile, Tribuo, H2O.ai, and DL4J. Import data from RDBMS, Excel, CSV, TSV, JSON, HTML, or Fixed Width text files, whether they are local or remote (http, S3, etc.). Tablesaw supports data visualization by providing a wrapper for the Plot.ly JavaScript plotting library. ...
    Downloads: 3 This Week
  • 17
    forecast

    Forecasting Functions for Time Series and Linear Models

    ...It provides functions for building, assessing, and using univariate forecasting models (such as ARIMA and exponential smoothing), along with tools for automatic model selection, diagnostics, plotting, and forecasting future values. It's widely used in statistics, economics, business forecasting, and environmental science, among other fields. Features include exponential smoothing state space models (ETS) with seasonal components, residual checks, model accuracy and forecast error measures, and plots.
    Downloads: 1 This Week
  • 18
    SDGym

    Benchmarking synthetic data generation methods

    The Synthetic Data Gym (SDGym) is a benchmarking framework for modeling and generating synthetic data. Measure performance and memory usage across different synthetic data modeling techniques – classical statistics, deep learning and more! The SDGym library integrates with the Synthetic Data Vault ecosystem. You can use any of its synthesizers, datasets or metrics for benchmarking. You can also customize the process to include your own work. Select any of the publicly available datasets from the SDV project, or input your own data. Choose from any of the SDV synthesizers and baselines. ...
    Downloads: 7 This Week
  • 19
    collapse

    Advanced and Fast Data Transformation in R

    ...It operates on base R data structures like data frames and vectors and uses highly optimized C++ code under the hood to deliver significant speed improvements. collapse also includes tools for grouped operations, weighted statistics, and time series manipulation, making it a compact yet powerful utility for data scientists and researchers working in R.
    Downloads: 0 This Week
  • 20
    targets

    Function-oriented Make-like declarative workflows for R

    The targets package is a pipeline and workflow management tool in R, designed to coordinate multi-step computational workflows in data science and statistics. It tracks dependencies between “targets” (computational steps), skips steps whose upstream data or code hasn’t changed, supports parallel computation, branching (dynamic generation of sub-targets), and file format abstractions, and encourages reproducible and efficient analyses. It’s something like GNU Make for R, but more integrated. ...
    Downloads: 0 This Week
  • 21
    Apache Hudi

    Upserts, Deletes And Incremental Processing on Big Data

    Apache Hudi (pronounced Hoodie) stands for Hadoop Upserts Deletes and Incrementals. Hudi manages the storage of large analytical datasets on DFS (Cloud stores, HDFS or any Hadoop FileSystem compatible storage). Apache Hudi is a transactional data lake platform that brings database and data warehouse capabilities to the data lake. Hudi reimagines slow old-school batch data processing with a powerful new incremental processing framework for low latency minute-level analytics. Hudi provides...
    Downloads: 0 This Week
  • 22
    HStreamDB

    HStreamDB is an open-source, cloud-native streaming database

    ...HStreamDB provides built-in support for event time-based stream processing. You can use familiar SQL to perform basic filtering and transformation operations, statistics and aggregation over multiple kinds of time windows, and even joins between multiple streams. With the connectors provided, you can easily integrate HStreamDB with external systems such as MQTT Broker, MySQL, Redis and ElasticSearch. More connectors will be added.
    Downloads: 0 This Week
  • 23
    Emerge

    Browser-based interactive codebase and dependency visualization tool

    Emerge (or emerge-viz) is an interactive code analysis tool to gather insights about source code structure, metrics, dependencies, and complexity of software projects. You can scan the source code of a project, calculate metric results and statistics, generate an interactive web app with graph structures (e.g. a dependency graph or a filesystem graph), and export the results in some file formats. Emerge currently has parsing support for the following languages: C, C++, Groovy, Java, JavaScript, TypeScript, Kotlin, ObjC, Ruby, Swift, Python, and Go. The structure, coloring, and clustering are calculated by combining a force-directed graph simulation and Louvain modularity. Emerge is mainly written in Python 3 and is tested on macOS, Linux, and modern web browsers (i.e., the latest Safari, Chrome, Firefox, and Edge).
    Downloads: 0 This Week
  • 24
    LabPlot

    Data Visualization and Analysis

    LabPlot is a FREE, open source and cross-platform Data Visualization and Analysis software accessible to everyone.
    Downloads: 44 This Week
  • 25
    seaborn

    Statistical data visualization in Python

    Seaborn is a Python data visualization library based on matplotlib. It provides a high-level interface for drawing attractive and informative statistical graphics. Seaborn helps you explore and understand your data. Its plotting functions operate on dataframes and arrays containing whole datasets and internally perform the necessary semantic mapping and statistical aggregation to produce informative plots. Its dataset-oriented, declarative API lets you focus on what the different elements of...
    Downloads: 7 This Week