Showing 729 open source projects for "data science"

  • 1
    Book5_Essentials-Probability-Statistics

    Book 5: statistics made simple

    ...Like the other books in the series, it blends mathematical explanation with Python-based experimentation. Overall, the project provides a practical statistical foundation for students advancing into AI and data science.
    Downloads: 0 This Week
  • 2
    Shapash

    Explainability and Interpretability to Develop Reliable ML models

    Shapash is a Python library dedicated to the interpretability of Data Science models. It provides several types of visualization that display explicit labels that everyone can understand. Data Scientists can more easily understand their models, share their results and easily document their projects in an HTML report. End users can understand the suggestion proposed by a model using a summary of the most influential criteria.
    Downloads: 0 This Week
  • 3
    Kaggle CLI

    The official CLI to interact with Kaggle

    ...Its main value is turning Kaggle’s web-based data science platform into a scriptable developer workflow.
    Downloads: 2 This Week
  • 4
    skfolio

    Python library for portfolio optimization built on top of scikit-learn

    skfolio is a Python library designed for portfolio optimization and financial risk management that integrates closely with the scikit-learn ecosystem. The project provides a unified machine learning-style framework for building, validating, and comparing portfolio allocation strategies using financial data. By following the familiar scikit-learn API design, the library allows quantitative researchers and developers to apply techniques such as model selection, cross-validation, and...
    Downloads: 2 This Week
  • 5
    kagglehub

    Python library to access Kaggle resources

    ...The library is designed to work both inside and outside Kaggle Notebooks, with native behavior that can adapt when it runs in Kaggle’s hosted notebook environment. It is useful for machine learning workflows where data, models, and notebook artifacts need to be pulled into scripts, experiments, or pipelines. kagglehub also supports authentication so users can access private or restricted resources when their account has permission. Its main value is making Kaggle assets easier to consume programmatically in Python-first data science and AI development workflows.
    Downloads: 0 This Week
  • 6
    papermill

    Parameterize, execute, and analyze notebooks

    ...Instead of manually opening and running a notebook inside JupyterLab or Notebook every time, Papermill lets you inject new values into a specially tagged parameters cell and execute the entire notebook automatically via a script or automation pipeline, which enables robust automation of data analysis, reports, and experiments. This capability is particularly useful in data science and analytics, where a template notebook might be reused for batching reports across dates, customers, or other variables without rewriting code or duplicating notebooks. Papermill supports both Python API usage and a command-line interface, making it flexible for integration with CI/CD systems, shells, and workflow orchestration tools like Airflow.
    Downloads: 0 This Week
  • 7
    D4RL

    Collection of reference environments, offline reinforcement learning

    ...Researchers can load a dataset for a given task (e.g., maze navigation, manipulation) and apply their algorithm without the need to collect fresh transitions, which accelerates experimentation and comparison. The API is based on Gymnasium (via gym.make) and each environment also exposes a method get_dataset() that returns the offline data to learn from. The repository emphasizes open science, reproducibility, and benchmarking at scale, making it easier to compare algorithms on equal footing.
    Downloads: 0 This Week
  • 8
    PyMC

    Bayesian Modeling and Probabilistic Programming in Python

    ...Built on top of computational tools like Aesara and NumPy, PyMC allows users to define models using intuitive syntax and perform inference using MCMC, variational inference, and other advanced algorithms. It’s widely used in scientific research, data science, and decision modeling.
    Downloads: 0 This Week
  • 9
    AutoMLOps

    Build MLOps Pipelines in Minutes

    AutoMLOps is a service that generates, provisions, and deploys CI/CD integrated MLOps pipelines, bridging the gap between Data Science and DevOps. AutoMLOps provides a repeatable process that dramatically reduces the time required to build MLOps pipelines. The service generates a containerized MLOps codebase, provides infrastructure-as-code to provision and maintain the underlying MLOps infra, and provides deployment functionalities to trigger and run MLOps pipelines. ...
    Downloads: 0 This Week
  • 10
    openbench

    Provider-agnostic, open-source evaluation infrastructure

    openbench is an open-source, provider-agnostic evaluation infrastructure designed to run standardized, reproducible benchmarks on large language models (LLMs), enabling fair comparison across different model providers. It bundles dozens of evaluation suites (covering knowledge, reasoning, math, code, science, reading comprehension, long-context recall, graph reasoning, and more), so users don’t need to assemble disparate datasets themselves. With a simple CLI (e.g. bench eval <benchmark> --model <model-id>), you can quickly evaluate any model supported by Groq or other providers (OpenAI, Anthropic, HuggingFace, local models, etc.). openbench also supports private/local evaluations: you can integrate your own custom benchmarks or data (e.g. internal test suites, domain-specific tasks) to evaluate models in a privacy-preserving way.
    Downloads: 0 This Week
  • 11
    Kaggle Solutions

    Collection of Kaggle Solutions and Ideas

    Kaggle Solutions is an open-source repository that compiles winning solutions, insights, and educational resources from hundreds of Kaggle data science competitions. The repository acts as a knowledge base for competitive machine learning by collecting solution write-ups, discussion threads, code notebooks, and tutorial resources shared by top Kaggle participants. Each competition entry typically includes information about the dataset, evaluation metrics, modeling strategies, and techniques used by high-ranking competitors. ...
    Downloads: 0 This Week
  • 12
    Gradio

    Create UIs for your machine learning model in Python in 3 minutes

    ...Hugging Face Spaces will host the interface on its servers and provide you with a link you can share. One of the best ways to share your machine learning model, API, or data science workflow with others is to create an interactive demo that your users or colleagues can try out in their browsers.
    Downloads: 5 This Week
  • 13
    Book4_Power-of-Matrix

    Book_4_Matrix Power | The Iris Book: From Addition, Subtraction

    ...The repository is continuously updated and intended to accompany the broader Visualize-ML learning ecosystem. Overall, it serves as a visually driven mathematical foundation for students preparing for data science and machine learning work.
    Downloads: 0 This Week
  • 14
    Python Code Tutorials

    The Python Code Tutorials

    Python Code Tutorials is a large educational repository that aggregates programming tutorials from the “The Python Code” website into a structured collection of Python projects and learning materials. The repository covers a wide range of programming topics including cybersecurity, networking, web scraping, machine learning, GUI development, and automation scripts. Each tutorial typically includes complete Python code examples and explanations that demonstrate how to build real tools and...
    Downloads: 0 This Week
  • 15
    The Missing Semester

    The Missing Semester of Your CS Education

    The Missing Semester is a course and repository that teaches the engineering skills often skipped in traditional computer science curricula: command-line fluency, shell scripting, editors, version control, debugging, data wrangling, and automation. It includes lecture notes, exercises, and sample solutions that encourage hands-on practice rather than passive reading. The curriculum demystifies tools like bash, vim, git, and make, showing how to combine them into efficient workflows that scale from homework to production systems. ...
    Downloads: 0 This Week
  • 16
    FairChem

    FAIR Chemistry's library of machine learning methods for chemistry

    FAIRChem is a unified library for machine learning in chemistry and materials, consolidating data, pretrained models, demos, and application code into a single, versioned toolkit. Version 2 modernizes the stack with a cleaner core package and breaking changes relative to V1, focusing on simpler installs and a stable API surface for production and research. The centerpiece models (e.g., UMA variants) plug directly into the ASE ecosystem via a FAIRChem calculator, so users can run relaxations,...
    Downloads: 0 This Week
  • 17
    Otter-Grader

    A Python and R autograding solution

    Otter Grader is a lightweight, modular open-source autograder developed by the Data Science Education Program at UC Berkeley. It is designed to work with classes at any scale by abstracting away the autograding internals in a way that is compatible with any instructor's assignment distribution and collection pipeline. Otter supports local grading through parallel Docker containers, grading using the autograder platforms of third-party learning management systems (LMSs), the deployment of an Otter-managed grading virtual machine, and a client package that allows students to run public checks on their own machines. ...
    Downloads: 0 This Week
  • 18
    Airborne Data Processing and Analysis

    Software for processing and analyzing airborne measurements.

    ...The software methodology used in ADPAA is described in the peer-reviewed publication: Delene, D. J., Airborne Data Processing and Analysis Software Package, Earth Science Informatics, 4(1), 29-44, 2011, DOI: 10.1007/s12145-010-0061-4 (http://dx.doi.org/10.1007/s12145-010-0061-4).
    Downloads: 0 This Week
  • 19
    Gwyddion

    Scanning probe microscopy data visualisation and analysis

    A data visualization and processing tool for scanning probe microscopy (SPM, i.e. AFM, STM, MFM, SNOM/NSOM, ...) and profilometry data, useful also for general image and 2D data analysis.
    Downloads: 1,306 This Week
  • 20
    PyMca
    Stand-alone application and Python tools for interactive and/or batch processing analysis of X-Ray Fluorescence Spectra. Graphical user interface (GUI) and batch processing capabilities provided.
    Downloads: 144 This Week
  • 21
    HEALPix

    Data Analysis, Simulations and Visualization on the Sphere

    Software for pixelization, hierarchical indexation, synthesis, analysis, and visualization of data on the sphere. Please acknowledge HEALPix by quoting the web page http://healpix.sourceforge.net (or https://healpix.sourceforge.io) and publication: K.M. Gorski et al., 2005, Ap.J., 622, p.759. Full software documentation is available at https://healpix.sourceforge.io/documentation.php. Wiki pages: https://sourceforge.net/p/healpix/wiki/Home. Exchanging Data with HEALPix (in FITS files):...
    Downloads: 345 This Week
  • 22
    Avogadro

    An intuitive molecular editor and visualization tool

    Avogadro is an advanced molecular editor designed for cross-platform use in computational chemistry, molecular modeling, bioinformatics, materials science and related areas. It offers a flexible rendering framework and a powerful plugin architecture.
    Downloads: 729 This Week
  • 23
    GMAT

    General Mission Analysis Tool

    The General Mission Analysis Tool (GMAT) is an open-source tool for space mission design and navigation. GMAT is developed by a team of NASA, private industry, and public and private contributors. The GMAT development team is pleased to announce the release of GMAT version R2026a. For a complete list of new features, compatibility changes, and bug fixes, see the R2026a Release Notes in the Users Guide.
    Downloads: 954 This Week
  • 24
    Asymptote

    2D & 3D TeX-Aware Vector Graphics Language

    Asymptote is a powerful descriptive vector graphics language for technical drawing, inspired by MetaPost but with an improved C++-like syntax. Asymptote provides for figures the same high-quality typesetting that LaTeX does for scientific text.
    Downloads: 132 This Week
  • 25
    PyRx - Virtual Screening Tool

    Virtual Screening software for Computational Drug Discovery

    PyRx is Virtual Screening software for Computational Drug Discovery that can be used to screen libraries of compounds against potential drug targets. PyRx enables Medicinal Chemists to run Virtual Screening from any platform and helps users in every step of this process, from data preparation to job submission and analysis of the results. While it is true that there is no magic button in the drug discovery process, PyRx includes a docking wizard with an easy-to-use user interface which makes...
    Downloads: 1,788 This Week
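
The papermill entry above describes injecting new values into a notebook's tagged parameters cell so one template can be reused across dates or customers. The sketch below illustrates that mechanism using only the Python standard library. It is not papermill's implementation: `inject_parameters`, the `template` notebook, and the sample values are hypothetical names made up for this illustration, and the fallback of placing the new cell at the top when no tagged cell exists is an assumption. With papermill itself, the equivalent step would typically be a call to `papermill.execute_notebook(input_path, output_path, parameters=...)`, which also runs the notebook.

```python
import copy


def inject_parameters(notebook, parameters):
    """Return a copy of `notebook` (an nbformat-style dict) with a new code
    cell holding `parameters`, placed right after the first cell tagged
    "parameters". Hypothetical helper, for illustration only."""
    nb = copy.deepcopy(notebook)  # leave the template untouched
    source = "\n".join(f"{name} = {value!r}" for name, value in parameters.items())
    injected = {
        "cell_type": "code",
        "metadata": {"tags": ["injected-parameters"]},
        "source": source,
        "outputs": [],
        "execution_count": None,
    }
    for index, cell in enumerate(nb["cells"]):
        if "parameters" in cell.get("metadata", {}).get("tags", []):
            # Later assignments override the defaults in the tagged cell.
            nb["cells"].insert(index + 1, injected)
            break
    else:
        # Assumption: with no tagged cell, put the values at the top.
        nb["cells"].insert(0, injected)
    return nb


# A tiny in-memory template notebook with default parameter values.
template = {
    "nbformat": 4,
    "nbformat_minor": 5,
    "metadata": {},
    "cells": [
        {
            "cell_type": "code",
            "metadata": {"tags": ["parameters"]},
            "source": "report_date = '2024-01-01'\ncustomer = 'demo'",
            "outputs": [],
            "execution_count": None,
        },
        {
            "cell_type": "code",
            "metadata": {},
            "source": "print(report_date, customer)",
            "outputs": [],
            "execution_count": None,
        },
    ],
}

# One notebook per report date, all derived from the same template.
batch = [
    inject_parameters(template, {"report_date": date, "customer": "acme"})
    for date in ("2024-02-01", "2024-03-01")
]
```

Each notebook in `batch` keeps the template's default cell and adds the overriding values directly after it, so the template never needs to be edited or duplicated by hand.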