Showing 228 open source projects for "data"

View related business solutions
  • Ship AI Apps Faster with Vertex AI Icon
    Ship AI Apps Faster with Vertex AI

    Go from idea to deployed AI app without managing infrastructure. Vertex AI offers one platform for the entire AI development lifecycle.

    Ship AI apps and features faster with Vertex AI—your end-to-end AI platform. Access Gemini 3 and 200+ foundation models, fine-tune for your needs, and deploy with enterprise-grade MLOps. Build chatbots, agents, or custom models. New customers get $300 in free credit.
    Try Vertex AI Free
  • $300 in Free Credit for Your Google Cloud Projects Icon
    $300 in Free Credit for Your Google Cloud Projects

    Build, test, and explore on Google Cloud with $300 in free credit. No hidden charges. No surprise bills.

    Launch your next project with $300 in free Google Cloud credit—no hidden charges. Test, build, and deploy without risk. Use your credit across the Google Cloud platform to find what works best for your needs. After your credits are used, continue building with free monthly usage products. Only pay when you're ready to scale. Sign up in minutes and start exploring.
    Start Free Trial
  • 1
    MLJAR Studio

    MLJAR Studio

    Python package for AutoML on Tabular Data with Feature Engineering

    ...All running locally on your machine. We are waiting for your feedback. The mljar-supervised is an Automated Machine Learning Python package that works with tabular data. It is designed to save time for a data scientist. It abstracts the common way to preprocess the data, construct the machine learning models, and perform hyper-parameter tuning to find the best model. It is no black box, as you can see exactly how the ML pipeline is constructed (with a detailed Markdown report for each ML model).
    Downloads: 1 This Week
    Last Update:
    See Project
  • 2
    Featuretools

    Featuretools

    An open source python library for automated feature engineering

    ...Featuretools automatically creates features from temporal and relational datasets. Featuretools uses DFS for automated feature engineering. You can combine your raw data with what you know about your data to build meaningful features for machine learning and predictive modeling. Featuretools provides APIs to ensure only valid data is used for calculations, keeping your feature vectors safe from common label leakage problems. You can specify prediction times row-by-row. Featuretools come with a library of low-level functions that can be stacked to create features. ...
    Downloads: 1 This Week
    Last Update:
    See Project
  • 3
    Mlxtend

    Mlxtend

    A library of extension and helper modules for Python's data analysis

    Mlxtend (machine learning extensions) is a Python library of useful tools for day-to-day data science tasks.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 4
    tslearn

    tslearn

    The machine learning toolkit for time series analysis in Python

    ...The three dimensions correspond to the number of time series, the number of measurements per time series and the number of dimensions respectively (n_ts, max_sz, d). In order to get the data in the right format.
    Downloads: 0 This Week
    Last Update:
    See Project
  • Our Free Plans just got better! | Auth0 Icon
    Our Free Plans just got better! | Auth0

    With up to 25k MAUs and unlimited Okta connections, our Free Plan lets you focus on what you do best—building great apps.

    You asked, we delivered! Auth0 is excited to expand our Free and Paid plans to include more options so you can focus on building, deploying, and scaling applications without having to worry about your security. Auth0 now, thank yourself later.
    Try free now
  • 5
    MNE-Python

    MNE-Python

    Magnetoencephalography (MEG) and Electroencephalography EEG in Python

    Open-source Python package for exploring, visualizing, and analyzing human neurophysiological data. MNE-Python is an open-source Python package for exploring, visualizing, and analyzing human neurophysiological data such as MEG, EEG, sEEG, ECoG, and more. It includes modules for data input/output, preprocessing, visualization, source estimation, time-frequency analysis, connectivity analysis, machine learning, statistics, and more.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 6
    fastai

    fastai

    Deep learning library

    ...It aims to do both things without substantial compromises in ease of use, flexibility, or performance. This is possible thanks to a carefully layered architecture, which expresses common underlying patterns of many deep learning and data processing techniques in terms of decoupled abstractions. These abstractions can be expressed concisely and clearly by leveraging the dynamism of the underlying Python language and the flexibility of the PyTorch library. fastai is organized around two main design goals: to be approachable and rapidly productive, while also being deeply hackable and configurable. ...
    Downloads: 4 This Week
    Last Update:
    See Project
  • 7
    Weights and Biases

    Weights and Biases

    Tool for visualizing and tracking your machine learning experiments

    ...Focus on the interesting ML. Spend less time manually tracking results in spreadsheets and text files. Capture dataset versions with W&B Artifacts to identify how changing data affects your resulting models. Reproduce any model, with saved code, hyperparameters, launch commands, input data, and resulting model weights. Set wandb.config once at the beginning of your script to save your hyperparameters, input settings (like dataset name or model type), and any other independent variables for your experiments. ...
    Downloads: 3 This Week
    Last Update:
    See Project
  • 8
    Evidently

    Evidently

    Evaluate and monitor ML models from validation to production

    Evidently is an open-source Python library for data scientists and ML engineers. It helps evaluate, test, and monitor ML models from validation to production. It works with tabular, text data and embeddings.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 9
    supabase-py

    supabase-py

    Python Client for Supabase. Query Postgres from Flask, Django

    Python Client for Supabase. Query Postgres from Flask, Django, FastAPI. Python user authentication, security policies, edge functions, file storage, and realtime data streaming. Good first issue.
    Downloads: 1 This Week
    Last Update:
    See Project
  • Deploy Apps in Seconds with Cloud Run Icon
    Deploy Apps in Seconds with Cloud Run

    Host and run your applications without the need to manage infrastructure. Scales up from and down to zero automatically.

    Cloud Run is the fastest way to deploy containerized apps. Push your code in Go, Python, Node.js, Java, or any language and Cloud Run builds and deploys it automatically. Get fast autoscaling, pay only when your code runs, and skip the infrastructure headaches. Two million requests free per month. And new customers get $300 in free credit.
    Try Cloud Run Free
  • 10
    Darts

    Darts

    A python library for easy manipulation and forecasting of time series

    ...The models can all be used in the same way, using fit() and predict() functions, similar to scikit-learn. The library also makes it easy to backtest models, combine the predictions of several models, and take external data into account. Darts supports both univariate and multivariate time series and models. The ML-based models can be trained on potentially large datasets containing multiple time series, and some of the models offer a rich support for probabilistic forecasting. We recommend to first setup a clean Python environment for your project with at least Python 3.7 using your favorite tool (conda, venv, virtualenv with or without virtualenvwrapper).
    Downloads: 2 This Week
    Last Update:
    See Project
  • 11
    DeepVariant

    DeepVariant

    DeepVariant is an analysis pipeline that uses a deep neural networks

    DeepVariant is an analysis pipeline that uses a deep neural network to call genetic variants from next-generation DNA sequencing data. DeepVariant is a deep learning-based variant caller that takes aligned reads (in BAM or CRAM format), produces pileup image tensors from them, classifies each tensor using a convolutional neural network, and finally reports the results in a standard VCF or gVCF file. DeepTrio is a deep learning-based trio variant caller built on top of DeepVariant. ...
    Downloads: 6 This Week
    Last Update:
    See Project
  • 12
    Lightly

    Lightly

    A python library for self-supervised learning on images

    ...We, at Lightly, are passionate engineers who want to make deep learning more efficient. That's why - together with our community - we want to popularize the use of self-supervised methods to understand and curate raw image data. Our solution can be applied before any data annotation step and the learned representations can be used to visualize and analyze datasets. This allows selecting the best core set of samples for model training through advanced filtering. We provide PyTorch, PyTorch Lightning and PyTorch Lightning distributed examples for each of the models to kickstart your project. ...
    Downloads: 1 This Week
    Last Update:
    See Project
  • 13
    openTSNE

    openTSNE

    Extensible, parallel implementations of t-SNE

    openTSNE is a modular Python implementation of t-Distributed Stochasitc Neighbor Embedding (t-SNE) [1], a popular dimensionality-reduction algorithm for visualizing high-dimensional data sets. openTSNE incorporates the latest improvements to the t-SNE algorithm, including the ability to add new data points to existing embeddings [2], massive speed improvements [3] [4] [5], enabling t-SNE to scale to millions of data points, and various tricks to improve the global alignment of the resulting visualizations.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 14
    UMAP

    UMAP

    Uniform Manifold Approximation and Projection

    Uniform Manifold Approximation and Projection (UMAP) is a dimension reduction technique that can be used for visualization similarly to t-SNE, but also for general non-linear dimension reduction. It is possible to model the manifold with a fuzzy topological structure. The embedding is found by searching for a low-dimensional projection of the data that has the closest possible equivalent fuzzy topological structure. First of all UMAP is fast. It can handle large datasets and high dimensional data without too much difficulty, scaling beyond what most t-SNE packages can manage. This includes very high dimensional sparse datasets. UMAP has successfully been used directly on data with over a million dimensions. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 15
    TorchIO

    TorchIO

    Medical imaging toolkit for deep learning

    TorchIO is an open-source Python library for efficient loading, preprocessing, augmentation and patch-based sampling of 3D medical images in deep learning, following the design of PyTorch. It includes multiple intensity and spatial transforms for data augmentation and preprocessing. These transforms include typical computer vision operations such as random affine transformations and also domain-specific ones such as simulation of intensity artifacts due to MRI magnetic field inhomogeneity (bias) or k-space motion artifacts. TorchIO is a Python package containing a set of tools to efficiently read, preprocess, sample, augment, and write 3D medical images in deep learning applications written in PyTorch, including intensity and spatial transforms for data augmentation and preprocessing. ...
    Downloads: 1 This Week
    Last Update:
    See Project
  • 16
    Interpretable machine learning

    Interpretable machine learning

    Book about interpretable machine learning

    ...As the programmer of an algorithm you want to know whether you can trust the learned model. Did it learn generalizable features? Or are there some odd artifacts in the training data which the algorithm picked up? This book will give an overview over techniques that can be used to make black boxes as transparent as possible and explain decisions. In the first chapter algorithms that produce simple, interpretable models are introduced together with instructions how to interpret the output. The later chapters focus on analyzing complex models and their decisions. ...
    Downloads: 4 This Week
    Last Update:
    See Project
  • 17
    mlforecast

    mlforecast

    Scalable machine learning for time series forecasting

    mlforecast is a time-series forecasting framework built around machine-learning models, designed to make forecasting both efficient and scalable. It lets you apply any regressor that follows the typical scikit-learn API, for example, gradient-boosted trees or linear models, to time-series data by automating much of the messy feature engineering and data preparation. Instead of writing custom code to build lagged features, rolling statistics, and date-based predictors, mlforecast generates those automatically based on a simple configuration. It supports multi-series forecasting, meaning you can train one model that forecasts many time series at once (common in retail, demand forecasting, etc.), rather than one model per series. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 18
    Materials Discovery: GNoME

    Materials Discovery: GNoME

    AI discovers 520000 stable inorganic crystal structures for research

    Materials Discovery (GNoME) is a large-scale research initiative by Google DeepMind focused on applying graph neural networks to accelerate the discovery of stable inorganic crystal materials. The project centers on Graph Networks for Materials Exploration (GNoME), a message-passing neural network architecture trained on density functional theory (DFT) data to predict material stability and energy formation. Using GNoME, DeepMind identified 381,000 new stable materials, later expanding the dataset to include over 520,000 materials within 1 meV/atom of the convex hull as of August 2024. The repository provides datasets, model definitions, and interactive Colabs for exploring these materials, computing decomposition energies, and visualizing chemical families. ...
    Downloads: 1 This Week
    Last Update:
    See Project
  • 19
    Petastorm

    Petastorm

    Petastorm library enables single machine or distributed training

    ...On top of a Parquet schema, petastorm also stores higher-level schema information that makes multidimensional arrays into a native part of a petastorm dataset. Petastorm supports extensible data codecs. These enable a user to use one of the standard data compressions (jpeg, png) or implement her own.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 20
    Open Notebook

    Open Notebook

    An Open Source implementation of Notebook LM with more flexibility

    Open Notebook is an open-source, privacy-focused alternative to Google’s Notebook LM that gives users full control over their research and AI workflows. Designed to be self-hosted, it ensures complete data sovereignty by keeping your content local or within your own infrastructure. The platform supports 16+ AI providers—including OpenAI, Anthropic, Ollama, Google, and LM Studio—allowing flexible model choice and cost optimization. Open Notebook enables users to organize and analyze multi-modal content such as PDFs, videos, audio files, web pages, and Office documents. ...
    Downloads: 5 This Week
    Last Update:
    See Project
  • 21
    Paperless-ngx

    Paperless-ngx

    A community-supported supercharged version of paperless

    Paperless-ngx is a community-supported open-source document management system that transforms your physical documents into a searchable online archive so you can keep, well, less paper.
    Downloads: 13 This Week
    Last Update:
    See Project
  • 22
    Hamilton DAGWorks

    Hamilton DAGWorks

    Helps scientists define testable, modular, self-documenting dataflow

    Hamilton is a lightweight Python library for directed acyclic graphs (DAGs) of data transformations. Your DAG is portable; it runs anywhere Python runs, whether it's a script, notebook, Airflow pipeline, FastAPI server, etc. Your DAG is expressive; Hamilton has extensive features to define and modify the execution of a DAG (e.g., data validation, experiment tracking, remote execution). To create a DAG, write regular Python functions that specify their dependencies with their parameters. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 23
    TensorFlow

    TensorFlow

    TensorFlow is an open source library for machine learning

    Originally developed by Google for internal use, TensorFlow is an open source platform for machine learning. Available across all common operating systems (desktop, server and mobile), TensorFlow provides stable APIs for Python and C as well as APIs that are not guaranteed to be backwards compatible or are 3rd party for a variety of other languages. The platform can be easily deployed on multiple CPUs, GPUs and Google's proprietary chip, the tensor processing unit (TPU). TensorFlow...
    Downloads: 14 This Week
    Last Update:
    See Project
  • 24
    Gradio

    Gradio

    Create UIs for your machine learning model in Python in 3 minutes

    ...Hugging Face Spaces will host the interface on its servers and provide you with a link you can share. One of the best ways to share your machine learning model, API, or data science workflow with others is to create an interactive demo that allows your users or colleagues to try out the demo in their browsers.
    Downloads: 18 This Week
    Last Update:
    See Project
  • 25
    Qlib

    Qlib

    Qlib is an AI-oriented quantitative investment platform

    Qlib is an AI-oriented quantitative investment platform, which aims to realize the potential, empower the research, and create the value of AI technologies in quantitative investment. With Qlib, you can easily try your ideas to create better Quant investment strategies. An increasing number of SOTA Quant research works/papers are released in Qlib. With Qlib, users can easily try their ideas to create better Quant investment strategies. At the module level, Qlib is a platform that consists of...
    Downloads: 1 This Week
    Last Update:
    See Project
MongoDB Logo MongoDB