Showing 129 open source projects for "statistical"

  • 1
    statsmodels

    Statsmodels, statistical modeling and econometrics in Python

    statsmodels is a Python module that provides classes and functions for estimating many different statistical models, conducting statistical tests, and exploring statistical data. An extensive list of result statistics is available for each estimator. The results are tested against existing statistical packages to ensure that they are correct. The package is released under the open source Modified BSD (3-clause) license.
    Downloads: 8 This Week
  • 2
    Book5_Essentials-Probability-Statistics

    The book 5 of statistics in simplicity

    Book5_Essentials-of-Probability-and-Statistics is a Visualize-ML educational volume that introduces the statistical and probabilistic concepts underpinning modern data analysis and machine learning. The repository explains topics such as distributions, sampling, inference, and uncertainty using visual demonstrations and intuitive narratives. Its teaching philosophy prioritizes conceptual clarity over heavy formalism, making statistical thinking more approachable for beginners. ...
    Downloads: 0 This Week
  • 3
    pmdarima

    Statistical library designed to fill the void in Python's time series

    A statistical library designed to fill the void in Python's time series analysis capabilities, including the equivalent of R's auto.arima function.
    Downloads: 0 This Week
  • 4
    Synthetic Data Generator

    SDG is a specialized framework

    ...The system supports multiple generation methods including statistical models, generative adversarial networks, and large language model–based synthesis. It also includes a data processing module capable of handling different data types, preprocessing columns, managing missing values, and converting formats automatically before model training.
    Downloads: 11 This Week
  • 5
    spaCy

    Industrial-strength Natural Language Processing (NLP)

    spaCy is a library built on the very latest research for advanced Natural Language Processing (NLP) in Python and Cython. Since its inception it has been designed for real-world applications: building real products and gathering real insights. It comes with pretrained statistical models and word vectors, convolutional neural network models, easy deep learning integration and so much more. spaCy is the fastest syntactic parser in the world according to independent benchmarks, with an accuracy within 1% of the best available. It's blazing fast, easy to install and comes with a simple and productive API.
    Downloads: 74 This Week
  • 6
    StatsForecast

    Fast forecasting with statistical and econometric models

    StatsForecast is a Python library for time-series forecasting that delivers a suite of classical statistical and econometric forecasting models optimized for high performance and scalability. It is designed not just for academic experiments but for production-level time-series forecasting, meaning it handles forecasting for many series at once, efficiently, reliably, and with minimal overhead. The library implements a broad set of models, including AutoARIMA, ETS, CES, Theta, plus a battery of benchmarking and baseline methods, giving users flexibility in selecting forecasting approaches depending on data characteristics (trend, seasonality, intermittent demand, etc.). ...
    Downloads: 6 This Week
  • 7
    PaperBanana

    Extension of Google Research’s PaperBanana

    PaperBanana is an open-source agentic framework designed to automatically generate publication-quality academic diagrams and statistical plots directly from text descriptions. The project focuses on helping researchers, educators, and data scientists transform conceptual descriptions of figures into structured visual outputs suitable for research papers, presentations, and technical reports. Instead of manually designing charts or diagrams using traditional visualization tools, users can describe the desired figure in natural language and allow the system to generate the visual representation automatically. ...
    Downloads: 4 This Week
  • 8
    NVIDIA Earth2Studio

    Open-source deep-learning framework

    ...The toolkit makes it easy to run deterministic and ensemble forecasts, swap models interchangeably, and process large geophysical datasets with Xarray structures, enabling experimentation with state-of-the-art deep learning models for climate and atmospheric prediction. Users can extend Earth2Studio with optional model packs, advanced data interfaces, statistical operators, and backend integrations that support flexible workflows from simple tests to large-scale operational inference.
    Downloads: 3 This Week
  • 9
    Natural Language Toolkit
    The Natural Language Toolkit (NLTK) is a widely used open-source Python library designed for working with human language data and building natural language processing (NLP) applications. It provides a comprehensive suite of modules, datasets, and tutorials that support both symbolic and statistical approaches to language processing. The toolkit includes implementations of many foundational NLP algorithms and utilities, enabling developers to perform tasks such as tokenization, stemming, parsing, classification, and semantic reasoning. NLTK was originally developed to support research and teaching in computational linguistics and artificial intelligence, and it has become one of the most influential educational platforms for learning NLP in Python. ...
    Downloads: 0 This Week
  • 10
    plotly.py

    The interactive graphing library for Python

    plotly.py is a browser-based, open source graphing library for Python that lets you create beautiful, interactive, publication-quality graphs. Built on top of plotly.js, it is a high-level, declarative charting library that ships with more than 30 chart types. From statistical and scientific charts through to maps, 3D graphs and animations, plotly.py lets you create them all. Graphs made with plotly.py can be viewed in Jupyter notebooks, standalone HTML files, or hosted online using Chart Studio Cloud.
    Downloads: 11 This Week
  • 11
    TensorFlow Probability

    Probabilistic reasoning and statistical analysis in TensorFlow

    TensorFlow Probability is a library for probabilistic reasoning and statistical analysis. TensorFlow Probability (TFP) is a Python library built on TensorFlow that makes it easy to combine probabilistic models and deep learning on modern hardware (TPU, GPU). It's for data scientists, statisticians, ML researchers, and practitioners who want to encode domain knowledge to understand data and make predictions.
    Downloads: 0 This Week
  • 12
    Orange Data Mining

    Orange: Interactive data analysis

    Open source machine learning and data visualization. Build data analysis workflows visually, with a large, diverse toolbox. Perform simple data analysis with clever data visualization. Explore statistical distributions, box plots and scatter plots, or dive deeper with decision trees, hierarchical clustering, heatmaps, MDS and linear projections. Even your multidimensional data can become sensible in 2D, especially with clever attribute ranking and selections. Interactive data exploration for rapid qualitative analysis with clean visualizations. ...
    Downloads: 48 This Week
  • 13
    PyMC

    Bayesian Modeling and Probabilistic Programming in Python

    PyMC is a Python library for probabilistic programming focused on Bayesian statistical modeling and machine learning. Built on top of computational tools like Aesara and NumPy, PyMC allows users to define models using intuitive syntax and perform inference using MCMC, variational inference, and other advanced algorithms. It’s widely used in scientific research, data science, and decision modeling.
    Downloads: 2 This Week
  • 14
    Potpie

    Create custom engineering agents for your codebase

    Potpie is an AI-powered data analysis tool that automates the exploration and visualization of datasets, assisting users in uncovering insights without extensive coding.
    Downloads: 2 This Week
  • 15
    NBA Sports Betting Machine Learning

    NBA sports betting using machine learning

    ...Machine learning models are then trained to estimate the probability that a team will win a game as well as whether the total score will fall above or below the sportsbook’s predicted total. In addition to predicting outcomes, the project evaluates expected value to determine whether a potential bet offers a statistical advantage compared with sportsbook odds.
    Downloads: 16 This Week
  • 16
    WeasyPrint

    The awesome document factory

    WeasyPrint is a smart solution helping people to create PDF documents. You can generate gorgeous statistical reports, invoices, tickets, and anything you want as long as you have some web design skills! Design your documents just as you design your websites! WeasyPrint follows the widely used HTML and CSS specifications from the W3C. You can use your usual web tools, languages and frameworks, but for print. Creating high-quality digital documents requires features that you love to use as readers: tables of contents, links, annotations, optimized images, and attachments. WeasyPrint provides many features out of the box, and even gives you the possibility to add your own ways to customize your PDF files. ...
    Downloads: 24 This Week
  • 17
    Edit Banana

    Edit Banana: A framework for converting statistical figures

    Edit Banana is an innovative web application designed to simplify image editing by merging intuitive user interfaces with powerful generative AI capabilities, enabling users to quickly enhance, manipulate, or transform photos without needing advanced design skills. It provides a smooth, browser-based experience where users can upload images, make precise edits such as background removal or inpainting, and apply stylistic transformations or corrections through AI prompts. The tool focuses on...
    Downloads: 10 This Week
  • 18
    AutoResearchClaw

    Autonomous research from idea to paper. Chat an Idea. Get a Paper 🦞

    ...The system retrieves real academic references from sources such as arXiv and Semantic Scholar to ensure credible citations. It can automatically generate code for experiments, run them in a sandbox environment, and analyze the results with statistical methods. The platform also uses multi-agent debate and automated peer review processes to refine research findings and improve paper quality. By combining literature discovery, experimentation, and writing automation, AutoResearchClaw aims to turn research ideas into conference-ready papers with minimal human intervention.
    Downloads: 31 This Week
  • 19
    CodeChecker

    CodeChecker is an analyzer tooling, defect database

    CodeChecker is a static analysis infrastructure built on the LLVM/Clang Static Analyzer toolchain, replacing scan-build in a Linux or macOS (OS X) development environment. It executes Clang-Tidy and Clang Static Analyzer with Cross-Translation Unit analysis and Statistical Analysis (when checkers are available). It creates the JSON compilation database by wiretapping any build process (e.g., CodeChecker log -b "make"). It automatically analyzes GCC cross-compiled projects by detecting the GCC or Clang compiler configuration and forming the corresponding clang analyzer invocations. Incremental analysis: only the changed files and their dependencies need to be reanalyzed. ...
    Downloads: 8 This Week
  • 20
    MiniSom

    MiniSom is a minimalistic implementation of the Self Organizing Maps

    MiniSom is a minimalistic, NumPy-based implementation of Self-Organizing Maps (SOM). A SOM is a type of artificial neural network able to convert complex, nonlinear statistical relationships between high-dimensional data items into simple geometric relationships on a low-dimensional display. MiniSom is designed to allow researchers to easily build on top of it and to give students the ability to quickly grasp its details. The project initially aimed for a minimalistic implementation of the Self-Organizing Map (SOM) algorithm, focusing on simplicity in features, dependencies, and code style. ...
    Downloads: 3 This Week
  • 21
    DataProfiler

    Extract schema, statistics and entities from datasets

    DataProfiler is an AI-powered tool for automatic data analysis and profiling, designed to detect patterns, anomalies, and schema inconsistencies in structured and unstructured datasets. It is a Python library designed to make data analysis, monitoring, and sensitive data detection easy. Loading data: with a single command, the library automatically formats and loads files into a DataFrame. Profiling the data: the library identifies the schema, statistics, entities (PII / NPI), and...
    Downloads: 4 This Week
  • 22
    ydata-profiling

    Create HTML profiling reports from pandas DataFrame objects

    ydata-profiling's primary goal is to provide a one-line Exploratory Data Analysis (EDA) experience in a consistent and fast solution. Like the handy pandas df.describe() function, ydata-profiling delivers an extended analysis of a DataFrame while allowing the data analysis to be exported in different formats such as HTML and JSON.
    Downloads: 5 This Week
  • 23
    Copulas

    A library to model multivariate data using copulas

    Copulas is a Python library for modeling multivariate distributions and sampling from them using copula functions. Given a table of numerical data, use Copulas to learn the distribution and generate new synthetic data following the same statistical properties. Choose from a variety of univariate distributions and copulas, including Archimedean Copulas, Gaussian Copulas and Vine Copulas. Compare real and synthetic data visually after building your model. Visualizations are available as 1D histograms, 2D scatterplots and 3D scatterplots. Access and manipulate learned parameters. ...
    Downloads: 5 This Week
  • 24
    whylogs

    The open standard for data logging

    ...With whylogs, users are able to generate summaries of their datasets (called whylogs profiles), which they can use to track changes in their dataset, create data constraints to know whether their data looks the way it should, and quickly visualize key summary statistics about their datasets. whylogs profiles are the core of the whylogs library. They capture key statistical properties of data, such as the distribution (far beyond simple mean, median, and standard deviation measures), the number of missing values, and a wide range of configurable custom metrics. By capturing these summary statistics, we are able to accurately represent the data and enable all of the use cases described in the introduction.
    Downloads: 5 This Week
  • 25
    NeuralForecast

    Scalable and user friendly neural forecasting algorithms.

    ...Unfortunately, available implementations and published research have yet to realize neural networks' potential. They are hard to use and continuously fail to improve over statistical methods while being computationally prohibitive. For this reason, we created NeuralForecast, a library that favors proven, accurate, and efficient models with a focus on usability.
    Downloads: 7 This Week