Showing 729 open source projects for "data science"

  • 1
    Liger Kernel

    Efficient Triton Kernels for LLM Training

    Liger Kernel is a collection of Triton kernels developed by LinkedIn for efficient large language model (LLM) training. It provides fused, memory-efficient implementations of common training operations such as RMSNorm, RoPE, SwiGLU, and cross-entropy, aimed at raising training throughput and lowering GPU memory use.
    Downloads: 0 This Week
  • 2
    LIFELINES

    Survival analysis in Python

    LIFELINES is a pure Python library for survival analysis, a statistical field focused on modeling time until an event occurs. It can be used for traditional cases like medical survival time, but also for business and product questions such as churn, subscription length, equipment failure, and customer retention. The library includes estimators such as Kaplan-Meier, Nelson-Aalen, and regression-based survival models. It is designed to be accessible to Python users and works well with common...
    Downloads: 0 This Week
  • 3
    SageMaker Training Toolkit

    Train machine learning models within Docker containers

    Train machine learning models within a Docker container using Amazon SageMaker. Amazon SageMaker is a fully managed service for data science and machine learning (ML) workflows. You can use Amazon SageMaker to simplify the process of building, training, and deploying ML models. To train a model, you can include your training script and dependencies in a Docker container that runs your training code. A container provides an effectively isolated environment, ensuring a consistent runtime and reliable training process. ...
    Downloads: 0 This Week
  • 4
    DocArray

    The data structure for multimodal data

    ...The foundation data structure of Jina, CLIP-as-service, DALL·E Flow, DiscoArt etc. Data science powerhouse: greatly accelerate data scientists’ work on embedding, k-NN matching, querying, visualizing, evaluating via Torch/TensorFlow/ONNX/PaddlePaddle on CPU/GPU. Data in transit: optimized for network communication, ready-to-wire at anytime with fast and compressed serialization in Protobuf, bytes, base64, JSON, CSV, DataFrame.
    Downloads: 0 This Week
  • 5
    MindsDB

    Making Enterprise Data Intelligent and Responsive for AI

    MindsDB is an AI data solution that enables humans, AI, agents, and applications to query data in natural language and SQL, and get highly accurate answers across disparate data sources and types. MindsDB connects to diverse data sources and applications, and unifies petabyte-scale structured and unstructured data. Powered by an industry-first cognitive engine that can operate anywhere (on-prem, VPC, serverless), it empowers both humans and AI with highly informed decision-making...
    Downloads: 0 This Week
  • 6
    AI-Tutorials/Implementations Notebooks

    Codes/Notebooks for AI Projects

    AI-Tutorials/Implementations Notebooks repository is a comprehensive collection of artificial intelligence tutorials and implementation examples intended for developers, students, and researchers who want to learn by building practical AI projects. The repository contains numerous Jupyter notebooks and code samples that demonstrate modern techniques in machine learning, deep learning, data science, and large language model workflows. It includes implementations for a wide range of AI topics such as computer vision, agent systems, federated learning, distributed systems, adversarial attacks, and generative AI. Many of the tutorials focus on building AI agents, multi-agent systems, and workflows that integrate language models with external tools or APIs. ...
    Downloads: 4 This Week
  • 7
    Mlxtend

    A library of extension and helper modules for Python's data analysis

    Mlxtend (machine learning extensions) is a Python library of useful tools for day-to-day data science tasks.
    Downloads: 0 This Week
  • 8
    Scientific Agent Skills

    A set of ready-to-use Agent Skills for research, science, and engineering

    Scientific Agent Skills is an open-source collection of ready-to-use agent skills designed to turn AI coding assistants into stronger research, science, engineering, and analysis partners. It supports any AI agent compatible with the Agent Skills standard, including tools such as Cursor, Claude Code, Codex, and Gemini CLI. The repository includes 135 skills across scientific domains such as genomics, cheminformatics, clinical research, medical imaging, machine learning, physics, materials science, geospatial analysis, and scientific writing. ...
    Downloads: 10 This Week
  • 9
    C3

    The goal of CLAIMED is to enable low-code/no-code rapid prototyping

    C3 is the component compiler of CLAIMED, an open-source framework designed to simplify the development and deployment of data science and machine learning workflows through reusable components and low-code development techniques. The framework focuses on enabling rapid prototyping while maintaining a path to production through automated CI/CD integration. CLAIMED provides a component-based architecture in which data processing steps, models, and workflows can be packaged into reusable operators.
    Downloads: 0 This Week
  • 10
    PythonPark

    Python open source project "The Road to Self-Study Programming"

    ...For someone self-teaching Python (or transitioning into coding/data science), the repository presents a one-stop “home base” of content, saving them from hunting scattered tutorials across the internet.
    Downloads: 0 This Week
  • 11
    MinerU

    A high-quality tool for converting PDF to Markdown and JSON

    MinerU is an open-source, high-quality document extraction toolkit focused on converting PDFs (and other document formats) into structured Markdown and JSON. It leverages OCR and layout analysis to preserve semantic structure and metadata, ideal for research and data science workflows.
    Downloads: 22 This Week
  • 12
    Jupyter Docker Stacks

    Ready-to-run Docker images containing Jupyter applications

    Jupyter Docker Stacks provides a curated set of ready-to-run Docker container images that bundle Jupyter applications with popular data science and computing tools, enabling users to quickly start working in a reproducible environment. These stacks support a range of use cases, from lightweight base notebook images to full featured environments that include scientific computing libraries, machine learning tools, and IDE-like notebook interfaces, all within Docker containers that run consistently across machines. ...
    Downloads: 3 This Week
  • 13
    omegaml

    MLOps simplified. From ML Pipeline ⇨ Data Product without the hassle

    omega|ml is the innovative Python-native MLOps platform that provides a scalable development and runtime environment for your Data Products. Works from laptop to cloud.
    Downloads: 0 This Week
  • 14
    Book1_Python-For-Beginners

    The Iris Book: Addition, Subtraction, Multiplication, and Division

    ...It integrates visual aids and annotated code examples to help learners understand not just how Python works but why certain patterns are used. The material is structured to support self-paced learning, making it suitable for students, career switchers, and hobbyists. Because the book is part of a larger data science pathway, it also prepares readers for later work in visualization and machine learning. Overall, it serves as an accessible on-ramp into Python within a broader analytical learning journey.
    Downloads: 0 This Week
  • 15
    Book3_Elements-of-Mathematics

    From Addition, Subtraction, Multiplication, and Division to ML

    Book3_Elements-of-Mathematics is an open learning resource in the Visualize-ML collection that introduces core mathematical foundations required for modern data science and AI. The repository presents topics such as algebra, calculus fundamentals, and mathematical reasoning using a highly visual and beginner-friendly approach. Its goal is to reduce the intimidation barrier often associated with formal mathematics by combining diagrams, structured explanations, and applied examples. The content is organized progressively so learners can build confidence before moving into more advanced quantitative subjects. ...
    Downloads: 0 This Week
  • 16
    TPOT

    A Python Automated Machine Learning tool that optimizes ML

    Consider TPOT your Data Science Assistant. TPOT is a Python Automated Machine Learning tool that optimizes machine learning pipelines using genetic programming. TPOT stands for Tree-based Pipeline Optimization Tool.
    Downloads: 0 This Week
  • 17
    Robyn

    Experimental, AI/ML-powered and open sourced Marketing Mix Modeling

    Robyn is an open-source, AI/ML-powered Marketing Mix Modeling (MMM) toolkit developed by Meta Marketing Science under the “facebookexperimental” GitHub umbrella. Its goal is to democratize rigorous MMM: what traditionally required expert statisticians and expensive consulting becomes accessible to any company with data. Robyn takes in historical data (spends on different marketing channels, conversions, or revenue, and optional context or organic-media variables) and uses a combination of techniques, regularized regression (Ridge), time-series decomposition (trend, seasonality, holiday effects), and hyperparameter optimization (via evolutionary algorithms), to estimate the incremental impact of each marketing channel. ...
    Downloads: 0 This Week
  • 18
    Kaggle Python Docker

    Kaggle Python docker image

    ...The project helps users understand, reproduce, and test against the same Python environment that powers Kaggle’s cloud notebooks. It includes a large curated package set for data science, machine learning, visualization, notebooks, and scientific computing. The images are useful for developers who want local or CI environments that closely match Kaggle’s runtime before submitting notebooks or sharing work. Its main value is making Kaggle’s managed notebook environment more transparent, reproducible, and portable through Docker.
    Downloads: 0 This Week
  • 19
    MLE-bench

    AI multi-agent framework for automating data-driven R&D workflows

    RD-Agent is an open source AI framework designed to automate research and development workflows in data-driven domains. It uses large language models and multiple collaborating agents to simulate the typical cycle of research, experimentation, and improvement that human data scientists follow. It separates the process into two core phases: a research stage that proposes hypotheses and ideas, and a development stage that implements and evaluates them through code execution and experiments. By...
    Downloads: 0 This Week
  • 20
    Ploomber

    The fastest way to build data pipelines

    Ploomber is an open-source framework designed to simplify the development and deployment of data science and machine learning pipelines. It allows developers to transform exploratory data analysis workflows into production-ready pipelines without rewriting large portions of code. The system integrates with common development environments such as Jupyter Notebook, VS Code, and PyCharm, enabling data scientists to continue working with familiar tools while building scalable workflows. ...
    Downloads: 0 This Week
  • 21
    Anomaly Detection Learning Resources

    Anomaly detection related books, papers, videos, and toolboxes

    Anomaly Detection Learning Resources is a curated open-source repository that collects educational materials, tools, and academic references related to anomaly detection and outlier analysis in data science. The project serves as a centralized index for researchers and practitioners who want to explore algorithms, datasets, and publications associated with detecting unusual patterns in data. The repository organizes resources into structured categories such as books, tutorials, academic papers, datasets, benchmark frameworks, and open-source toolkits. ...
    Downloads: 0 This Week
  • 22
    machine learning tutorials

    machine learning tutorials (mainly in Python3)

    machine-learning is a continuously updated repository documenting the author’s learning journey through data science and machine learning topics using practical tutorials and experiments. The project presents educational notebooks that combine mathematical explanations with code implementations using Python’s scientific computing ecosystem. Topics covered include classical machine learning algorithms, deep learning models, reinforcement learning, model deployment, and time-series analysis. ...
    Downloads: 0 This Week
  • 23
    List of Free Learning Resources

    Freely available programming books

    ...Maintained by the community, it organizes materials by topic, language, and skill level, making it easy to discover learning resources. The repository includes content on software development, computer science, data science, and more. It is continuously updated with new resources contributed by developers worldwide. The project emphasizes accessibility and open education, providing high-quality materials without cost. It serves as a central hub for self-learners and professionals alike. Its structured organization makes it a widely used reference for learning programming.
    Downloads: 0 This Week
  • 24
    NVIDIA Merlin

    Library providing end-to-end GPU-accelerated recommender systems

    NVIDIA Merlin is an open-source library that accelerates recommender systems on NVIDIA GPUs. The library enables data scientists, machine learning engineers, and researchers to build high-performing recommenders at scale. Merlin includes tools to address common feature engineering, training, and inference challenges. Each stage of the Merlin pipeline is optimized to support hundreds of terabytes of data, which is all accessible through easy-to-use APIs. For more information, see NVIDIA...
    Downloads: 0 This Week
  • 25
    Kaggle CLI

    The official CLI to interact with Kaggle

    ...Its main value is turning Kaggle’s web-based data science platform into a scriptable developer workflow.
    Downloads: 2 This Week