Showing 493 open source projects for "data science"

View related business solutions
  • Go from Code to Production URL in Seconds Icon
    Go from Code to Production URL in Seconds

    Cloud Run deploys apps in any language instantly. Scales to zero. Pay only when code runs.

    Skip the Kubernetes configs. Cloud Run handles HTTPS, scaling, and infrastructure automatically. Two million requests free per month.
    Try it free
  • $300 Free Credits to Build on Google Cloud Icon
    $300 Free Credits to Build on Google Cloud

    New to Google Cloud? Get $300 in credits to explore Compute Engine, BigQuery, Cloud Run, Gemini Enterprise Agent Platform, and more.

    Start your next project with $300 in free Google Cloud credit. Spin up VMs, run containers, query petabytes in BigQuery, or build agents with Gemini Enterprise Agent Platform. Once your credits are used, keep building with 20+ always-free tier products including Compute Engine, Cloud Storage, GKE, and Cloud Run functions. No commitment required—just sign up and start building.
    Claim $300 Free
  • 1
    Computer Science Flash Cards

    Computer Science Flash Cards

    Mini website for testing both general CS knowledge and enforce coding

    This repository collects concise flash cards that cover the core ideas of a traditional computer science curriculum with a focus on interview readiness. The cards distill topics like time and space complexity, classic data structures, algorithmic paradigms, operating systems, networking, and databases into short, testable prompts. They are designed for spaced-repetition style study so you can cycle frequently through fundamentals until recall feels automatic.
    Downloads: 1 This Week
    Last Update:
    See Project
  • 2
    Computer Science courses video lectures

    Computer Science courses video lectures

    List of Computer Science courses with video lectures

    This repository is a curated list of full-length computer science video lecture series across many universities and MOOC platforms, helping learners assemble their own curriculum. The list spans foundational topics like algorithms, data structures, operating systems, computer networks, machine learning, and more, all delivered via lectures rather than just textual tutorials. The contributor guidelines encourage adding high-quality courses (not just casual tutorials) so the list remains academically oriented. ...
    Downloads: 7 This Week
    Last Update:
    See Project
  • 3
    NYC Taxi Data

    NYC Taxi Data

    Import public NYC taxi and for-hire vehicle (Uber, Lyft)

    ...It also contains example analyses—spatial and temporal visualizations like maps, time-series plots, and hotspot detection—highlighting insights such as patterns of demand, peak times, and geospatial distributions. The repository is often used as a benchmark dataset and example for teaching, benchmarking, and demonstration purposes in the data science and urban analytics communities.
    Downloads: 1 This Week
    Last Update:
    See Project
  • 4
    Metaflow

    Metaflow

    A framework for real-life data science

    Metaflow is a human-friendly Python library that helps scientists and engineers build and manage real-life data science projects. Metaflow was originally developed at Netflix to boost productivity of data scientists who work on a wide variety of projects from classical statistics to state-of-the-art deep learning.
    Downloads: 3 This Week
    Last Update:
    See Project
  • Custom VMs From 1 to 96 vCPUs With 99.95% Uptime Icon
    Custom VMs From 1 to 96 vCPUs With 99.95% Uptime

    General-purpose, compute-optimized, or GPU/TPU-accelerated. Built to your exact specs.

    Live migration and automatic failover keep workloads online through maintenance. One free e2-micro VM every month.
    Try Free
  • 5
    Positron

    Positron

    Positron, a next-generation data science IDE

    Positron is a next-generation integrated development environment (IDE) created by Posit PBC (formerly RStudio Inc) specifically tailored for data science workflows in Python, R, and multi-language ecosystems. It aims to unify exploratory data analysis, production code, and data-app authoring in a single environment so that data scientists move from “question → insight → application” without switching tools. Built on the open-source Code-OSS foundation, Positron provides a familiar coding experience along with specialized panes and tooling for variable inspection, data-frame viewing, plotting previews, and interactive consoles designed for analytical work. ...
    Downloads: 4 This Week
    Last Update:
    See Project
  • 6
    Awesome Fraud Detection Research Papers

    Awesome Fraud Detection Research Papers

    A curated list of data mining papers about fraud detection

    A curated list of data mining papers about fraud detection from several conferences.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 7
    XGBoost

    XGBoost

    Scalable and Flexible Gradient Boosting

    ...XGBoost works by implementing machine learning algorithms under the Gradient Boosting framework. It also offers parallel tree boosting (GBDT, GBRT or GBM) that can quickly and accurately solve many data science problems. XGBoost can be used for Python, Java, Scala, R, C++ and more. It can run on a single machine, Hadoop, Spark, Dask, Flink and most other distributed environments, and is capable of solving problems beyond billions of examples.
    Downloads: 2 This Week
    Last Update:
    See Project
  • 8
    Kedro

    Kedro

    A Python framework for creating reproducible, maintainable code

    Kedro is an open sourced Python framework for creating maintainable and modular data science code. Provides the scaffolding to build more complex data and machine-learning pipelines. In addition, there's a focus on spending less time on the tedious "plumbing" required to maintain data science code; this means that you have more time to solve new problems. Standardises team workflows; the modular structure of Kedro facilitates a higher level of collaboration when teams solve problems together. ...
    Downloads: 2 This Week
    Last Update:
    See Project
  • 9
    RStudio

    RStudio

    RStudio is an integrated development environment (IDE) for R

    RStudio is a powerful, full-featured integrated development environment (IDE) tailored primarily for the R programming language but increasingly supportive of other languages like Python and Julia. It brings together console, editor, plotting, workspace, history, and file-management panes into a unified interface, helping data scientists, statisticians, and analysts to work more productively. The IDE is cross-platform: there are desktop versions for Windows, macOS and Linux, as well as a...
    Downloads: 49 This Week
    Last Update:
    See Project
  • Build Agents and Models on One Platform Icon
    Build Agents and Models on One Platform

    Everything you need to build production-ready agents and models. Access 200+ Google and third-party AI models and tools.

    Gemini Enterprise Agent Platform is Google Cloud's comprehensive platform for developers to build, scale, govern, and optimize agents and models. Choose from Google's most advanced models and third-party models like Anthropic's Claude Model Family.
    Try It Free
  • 10
    Nuclio

    Nuclio

    High-Performance Serverless event and data processing platform

    Nuclio is an open source and managed serverless platform used to minimize development and maintenance overhead and automate the deployment of data-science-based applications. Real-time performance running up to 400,000 function invocations per second. Portable across low laptops, edge, on-prem and multi-cloud deployments. The first serverless platform supporting GPUs for optimized utilization and sharing. Automated deployment to production in a few clicks from Jupyter notebook. ...
    Downloads: 1 This Week
    Last Update:
    See Project
  • 11
    SIT742

    SIT742

    SIT742: Modern Data Science

    SIT742: Modern Data Science.
    Downloads: 2 This Week
    Last Update:
    See Project
  • 12
    DearPyGui

    DearPyGui

    Graphical User Interface Toolkit for Python with minimal dependencies

    ...DPG is well suited for creating simple user interfaces as well as developing complex and demanding graphical interfaces. DPG offers a solid framework for developing scientific, engineering, gaming, data science and other applications that require fast and interactive interfaces. The Tutorials will provide a great overview and links to each topic in the API Reference for more detailed reading. Complete theme and style control. GPU-based rendering and efficient C/C++ code.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 13
    tsfresh

    tsfresh

    Automatic extraction of relevant features from time series

    ...With tsfresh this process is automated and all your features can be calculated automatically. Further tsfresh is compatible with pythons pandas and scikit-learn APIs, two important packages for Data Science endeavours in python. The extracted features can be used to describe or cluster time series based on the extracted characteristics. Further, they can be used to build models that perform classification/regression tasks on the time series. Often the features give new insights into time series and their dynamics.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 14
    Synapse Machine Learning

    Synapse Machine Learning

    Simple and distributed Machine Learning

    ...SynapseML builds on Apache Spark and SparkML to enable new kinds of machine learning, analytics, and model deployment workflows. SynapseML adds many deep learning and data science tools to the Spark ecosystem, including seamless integration of Spark Machine Learning pipelines with the Open Neural Network Exchange (ONNX), LightGBM, The Cognitive Services, Vowpal Wabbit, and OpenCV. These tools enable powerful and highly-scalable predictive and analytical models for a variety of data sources. SynapseML also brings new networking capabilities to the Spark Ecosystem. ...
    Downloads: 1 This Week
    Last Update:
    See Project
  • 15
    DocArray

    DocArray

    The data structure for multimodal data

    ...The foundation data structure of Jina, CLIP-as-service, DALL·E Flow, DiscoArt etc. Data science powerhouse: greatly accelerate data scientists’ work on embedding, k-NN matching, querying, visualizing, evaluating via Torch/TensorFlow/ONNX/PaddlePaddle on CPU/GPU. Data in transit: optimized for network communication, ready-to-wire at anytime with fast and compressed serialization in Protobuf, bytes, base64, JSON, CSV, DataFrame.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 16
    SageMaker Training Toolkit

    SageMaker Training Toolkit

    Train machine learning models within Docker containers

    Train machine learning models within a Docker container using Amazon SageMaker. Amazon SageMaker is a fully managed service for data science and machine learning (ML) workflows. You can use Amazon SageMaker to simplify the process of building, training, and deploying ML models. To train a model, you can include your training script and dependencies in a Docker container that runs your training code. A container provides an effectively isolated environment, ensuring a consistent runtime and reliable training process. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 17
    RStudio Cheatsheets

    RStudio Cheatsheets

    Curated collection of official cheat sheets for data science tools

    The cheatsheets repository from RStudio is a curated collection of official cheat sheets for R, RStudio, the tidyverse, Shiny, and related data science tools. Each cheat sheet is a single (or double) page PDF that condenses important syntax, functions, workflows, and best practices into a visually organized format ideal for quick reference. The repository contains source files (R Markdown or LaTeX) that generate the cheat sheets, version history, and metadata (title, author, description) for each. ...
    Downloads: 2 This Week
    Last Update:
    See Project
  • 18
    PSLab Android App

    PSLab Android App

    PSLab Android App

    Repository for the PSLab Android App for performing experiments with the Pocket Science Lab open-hardware platform. This repository holds the Android App for performing experiments with PSLab. PSLab is a tiny pocket science lab that provides an array of equipment for doing science and engineering experiments. It can function like an oscilloscope, waveform generator, frequency counter, programmable voltage and current source and also as a data logger. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 19
    dplyr

    dplyr

    dplyr: A grammar of data manipulation

    dplyr is an R package that provides a consistent and intuitive grammar for data manipulation, enabling users to filter, arrange, summarize, and transform data efficiently. Part of the tidyverse ecosystem, dplyr simplifies complex data operations through a clear and readable syntax, whether working with data frames, tibbles, or databases. It is widely used in data science and statistical analysis workflows.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 20
    Apache Spark

    Apache Spark

    A unified analytics engine for large-scale data processing

    ...With Spark Streaming (microbatches) and Structured Streaming, it delivers low-latency event processing suitable for real-time analytics. The built-in MLlib library provides scalable machine learning algorithms, while GraphX enables graph computations integrated with data pipelines. Spark supports multiple languages—Scala, Java, Python, R—and connects with many storage systems like HDFS, S3, Cassandra, and streaming platforms like Kafka, making it a versatile choice for big data workloads in analytics, ETL, and data science.
    Downloads: 8 This Week
    Last Update:
    See Project
  • 21
    GeoStats.jl

    GeoStats.jl

    An extensible framework for geospatial data science

    GeoStats.jl is a Julia framework for geospatial data science and geostatistical modeling. It’s fully implemented in Julia and designed to provide an extensible, high-performance stack that handles spatial domains, interpolation, simulation, learning, and visualization. The package is modular: it breaks out geometry, spatial domains, transforms, variograms, covariance models, and modeling into subpackages (e.g., GeoStatsBase, GeoStatsModels, GeoStatsTransforms).
    Downloads: 3 This Week
    Last Update:
    See Project
  • 22
    XState

    XState

    State machines and statecharts for the modern web

    JavaScript and TypeScript finite state machines and statecharts for the modern web. Statecharts are a formalism for modeling stateful, reactive systems. This is useful for declaratively describing the behavior of your application, from the individual components to the overall application logic. XState is a library for creating, interpreting, and executing finite state machines and statecharts, as well as managing invocations of those machines as actors. The following fundamental computer...
    Downloads: 1 This Week
    Last Update:
    See Project
  • 23
    Jupyter Docker Stacks

    Jupyter Docker Stacks

    Ready-to-run Docker images containing Jupyter applications

    Jupyter Docker Stacks provides a curated set of ready-to-run Docker container images that bundle Jupyter applications with popular data science and computing tools, enabling users to quickly start working in a reproducible environment. These stacks support a range of use cases, from lightweight base notebook images to full featured environments that include scientific computing libraries, machine learning tools, and IDE-like notebook interfaces, all within Docker containers that run consistently across machines. ...
    Downloads: 3 This Week
    Last Update:
    See Project
  • 24
    InterviewGuide

    InterviewGuide

    Repository that collects extensive computer science

    ...The repository contains curated algorithm and data structure explanations, collections of interview questions, high-frequency topics, and practical problem-solving guides to help users brush up on skills that are often tested in technical interviews.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 25
    Coder

    Coder

    Provision remote development environments via Terraform

    Onboard developers to fully configured cloud development environments with Coder, the only open-source platform you can self-host and manage for complete security and control. Coder is an open-source cloud development environment (CDE) that you host in your cloud or on-premises. With Coder, you can deploy environments that provide the infrastructure, IDEs, and tools your developers need. Upgrade to Coder Premium to gain enhanced security, governance, and observability for your platform teams.
    Downloads: 7 This Week
    Last Update:
    See Project
  • Previous
  • You're on page 1
  • 2
  • 3
  • 4
  • 5
  • Next
Auth0 Logo