Showing 50 open source projects for "training"

View related business solutions
  • $300 in Free Credit Towards Top Cloud Services Icon
    $300 in Free Credit Towards Top Cloud Services

    Build VMs, containers, AI, databases, storage—all in one place.

    Start your project in minutes. After credits run out, 20+ products include free monthly usage. Only pay when you're ready to scale.
    Get Started
  • Forever Free Full-Stack Observability | Grafana Cloud Icon
    Forever Free Full-Stack Observability | Grafana Cloud

    Our generous forever free tier includes the full platform, including the AI Assistant, for 3 users with 10k metrics, 50GB logs, and 50GB traces.

    Built on open standards like Prometheus and OpenTelemetry, Grafana Cloud includes Kubernetes Monitoring, Application Observability, Incident Response, plus the AI-powered Grafana Assistant. Get started with our generous free tier today.
    Create free account
  • 1
    SageMaker Training Toolkit

    SageMaker Training Toolkit

    Train machine learning models within Docker containers

    ...Amazon SageMaker is a fully managed service for data science and machine learning (ML) workflows. You can use Amazon SageMaker to simplify the process of building, training, and deploying ML models. To train a model, you can include your training script and dependencies in a Docker container that runs your training code. A container provides an effectively isolated environment, ensuring a consistent runtime and reliable training process. The SageMaker Training Toolkit can be easily added to any Docker container, making it compatible with SageMaker for training models. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 2
    Diffgram

    Diffgram

    Training data (data labeling, annotation, workflow) for all data types

    From ingesting data to exploring it, annotating it, and managing workflows. Diffgram is a single application that will improve your data labeling and bring all aspects of training data under a single roof. Diffgram is world’s first truly open source training data platform that focuses on giving its users an unlimited experience. This is aimed to reduce your data labeling bills and increase your Training Data Quality. Training Data is the art of supervising machines through data. This includes the activities of annotation, which produces structured data; ready to be consumed by a machine learning model. ...
    Downloads: 3 This Week
    Last Update:
    See Project
  • 3
    NVIDIA Merlin

    NVIDIA Merlin

    Library providing end-to-end GPU-accelerated recommender systems

    ...The library enables data scientists, machine learning engineers, and researchers to build high-performing recommenders at scale. Merlin includes tools to address common feature engineering, training, and inference challenges. Each stage of the Merlin pipeline is optimized to support hundreds of terabytes of data, which is all accessible through easy-to-use APIs. For more information, see NVIDIA Merlin on the NVIDIA developer website. Transform data (ETL) for preprocessing and engineering features. Accelerate your existing training pipelines in TensorFlow, PyTorch, or FastAI by leveraging optimized, custom-built data loaders. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 4
    EvoTrees.jl

    EvoTrees.jl

    Boosted trees in Julia

    A Julia implementation of boosted trees with CPU and GPU support. Efficient histogram-based algorithms with support for multiple loss functions, including various regressions, multi-classification and Gaussian max likelihood.
    Downloads: 0 This Week
    Last Update:
    See Project
  • MongoDB Atlas runs apps anywhere Icon
    MongoDB Atlas runs apps anywhere

    Deploy in 115+ regions with the modern database for every enterprise.

    MongoDB Atlas gives you the freedom to build and run modern applications anywhere—across AWS, Azure, and Google Cloud. With global availability in over 115 regions, Atlas lets you deploy close to your users, meet compliance needs, and scale with confidence across any geography.
    Start Free
  • 5
    JuliaConnectoR

    JuliaConnectoR

    A functionally oriented interface for calling Julia from R

    This R-package provides a functionally oriented interface between R and Julia. The goal is to call functions from Julia packages directly as R functions. Julia functions imported via the JuliaConnectoR can accept and return R variables. It is also possible to pass R functions as arguments in place of Julia functions, which allows callbacks from Julia to R. From a technical perspective, R data structures are serialized with an optimized custom streaming format, sent to a (local) Julia TCP...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 6
    Datumaro

    Datumaro

    Dataset Management Framework, a Python library and a CLI tool to build

    ...It supports importing and exporting annotations and images across a wide variety of standards like COCO, PASCAL VOC, YOLO, ImageNet, Cityscapes, and many more, enabling easy integration with different training pipelines and tools. Datumaro makes it easy to merge datasets, split them into training/validation/test subsets, filter or transform annotations, and validate annotation quality — all while preserving metadata and supporting detailed statistics. It’s especially useful when you’re dealing with heterogeneous data sources or need to prepare complex datasets for machine learning workflows, freeing you from writing custom scripts for every format conversion.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 7
    ReservoirComputing.jl

    ReservoirComputing.jl

    Reservoir computing utilities for scientific machine learning (SciML)

    ReservoirComputing.jl provides an efficient, modular and easy-to-use implementation of Reservoir Computing models such as Echo State Networks (ESNs). For information on using this package please refer to the stable documentation. Use the in-development documentation to take a look at not-yet-released features.
    Downloads: 3 This Week
    Last Update:
    See Project
  • 8
    NeuralOperators.jl

    NeuralOperators.jl

    DeepONets, Neural Operators, Physics-Informed Neural Ops in Julia

    ...It learns an operator, which is a mapping between infinite-dimensional function spaces. It can be used to resolve partial differential equations (PDE). Instead of solving by finite element method, a PDE problem can be resolved by training a neural network to learn an operator mapping from infinite-dimensional space (u, t) to infinite-dimensional space f(u, t). Neural operator learns a continuous function between two continuous function spaces. The kernel can be trained on different geometry, which is learned from a graph. Fourier neural operator learns a neural operator with Dirichlet kernel to form a Fourier transformation. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 9
    XGBoost

    XGBoost

    Scalable and Flexible Gradient Boosting

    XGBoost is an optimized distributed gradient boosting library, designed to be scalable, flexible, portable and highly efficient. It supports regression, classification, ranking and user defined objectives, and runs on all major operating systems and cloud platforms. XGBoost works by implementing machine learning algorithms under the Gradient Boosting framework. It also offers parallel tree boosting (GBDT, GBRT or GBM) that can quickly and accurately solve many data science problems....
    Downloads: 6 This Week
    Last Update:
    See Project
  • Fully Managed MySQL, PostgreSQL, and SQL Server Icon
    Fully Managed MySQL, PostgreSQL, and SQL Server

    Automatic backups, patching, replication, and failover. Focus on your app, not your database.

    Cloud SQL handles your database ops end to end, so you can focus on your app.
    Try Free
  • 10
    DataChain

    DataChain

    AI-data warehouse to enrich, transform and analyze unstructured data

    Datachain enables multimodal API calls and local AI inferences to run in parallel over many samples as chained operations. The resulting datasets can be saved, versioned, and sent directly to PyTorch and TensorFlow for training. Datachain can persist features of Python objects returned by AI models, and enables vectorized analytical operations over them. The typical use cases are data curation, LLM analytics and validation, image segmentation, pose detection, and GenAI alignment. Datachain is especially helpful if batch operations can be optimized – for instance, when synchronous API calls can be parallelized or where an LLM API offers batch processing.
    Downloads: 1 This Week
    Last Update:
    See Project
  • 11
    Surrogates.jl

    Surrogates.jl

    Surrogate modeling and optimization for scientific machine learning

    ...It may be the case we need to solve a PDE for each point or use advanced numerical linear algebra machinery, which is usually costly. The idea is then to develop a surrogate model g which approximates f by training on previous data collected from evaluations of f.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 12
    gramm

    gramm

    Gramm is a complete data visualization toolbox for Matlab

    ...Gramm has been used in many publications from varied fields and is particularily suited for neuroscience, from human movement psychophysics (Morel et al. 2017), to electrophysiology (Morel et al. 2016; Ferrea et al. 2017), human functional imaging (Wan et al. 2017) and animal training (Berger et al. 2017).
    Downloads: 1 This Week
    Last Update:
    See Project
  • 13
    iRODS

    iRODS

    Open Source Data Management Software

    ...The development infrastructure supports exhaustive testing on supported platforms; plugin support for microservices, storage resources, authentication mechanisms, network protocols, rule engines, new API endpoints, and databases; and extensive documentation, training, and support services.
    Downloads: 4 This Week
    Last Update:
    See Project
  • 14
    Blue Whale Configuration Platform

    Blue Whale Configuration Platform

    Blue Whale smart cloud configuration platform

    ...From configuration management to job execution, task scheduling and monitoring self-healing, and then through operation and maintenance big data analysis to assist operational decision-making, it covers the full-cycle assurance management of business operations in a comprehensive manner. The open PaaS has a powerful development framework and scheduling engine, as well as a complete operation and maintenance development training system, which helps the rapid transformation and upgrading of operation and maintenance. Through the Blue Whale intelligent cloud system, it can help enterprises quickly realize the automation of basic operation and maintenance services, thereby accelerating the transformation of DevOps, realizing a tool culture, and maximizing operational efficiency.
    Downloads: 2 This Week
    Last Update:
    See Project
  • 15
    Gorse Recommender System Engine

    Gorse Recommender System Engine

    An open source recommender system service written in Go

    ...Recommend items from Popular, latest, user-based, item-based and collaborative filtering. Search the best recommendation model automatically in the background. Support horizontal scaling in the recommendation stage after single node training. Support Redis, MySQL, Postgres, MongoDB, and ClickHouse as its storage backend. Expose RESTful APIs for data CRUD and recommendation requests. Analyze online recommendation performance from recently inserted feedback. Provide GUI for data management, system monitoring, and cluster status checking. Gorse is an open-source recommendation system written in Go. ...
    Downloads: 1 This Week
    Last Update:
    See Project
  • 16
    Automated Tool for Optimized Modelling

    Automated Tool for Optimized Modelling

    Automated Tool for Optimized Modelling

    During the exploration phase of a machine learning project, a data scientist tries to find the optimal pipeline for his specific use case. This usually involves applying standard data cleaning steps, creating or selecting useful features, trying out different models, etc. Testing multiple pipelines requires many lines of code, and writing it all in the same notebook often makes it long and cluttered. On the other hand, using multiple notebooks makes it harder to compare the results and to...
    Downloads: 1 This Week
    Last Update:
    See Project
  • 17
    Fondant

    Fondant

    Production-ready data processing made easy and shareable

    Fondant is a modular, pipeline-based framework designed to simplify the preparation of large-scale datasets for training machine learning models, especially foundation models. It offers an end-to-end system for ingesting raw data, applying transformations, filtering, and formatting outputs—all while remaining scalable and traceable. Fondant is designed with reproducibility in mind and supports containerized steps using Docker, making it easy to share and reuse data processing components. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 18
    Encord Active

    Encord Active

    The toolkit to test, validate, and evaluate your models and surface

    Encord Active is an open-source toolkit to test, validate, and evaluate your models and surface, curate, and prioritize the most valuable data for labeling to supercharge model performance. Encord Active has been designed as a all-in-one open source toolkit for improving your data quality and model performance. Use the intuitive UI to explore your data or access all the functionalities programmatically. Discover errors, outliers, and edge-cases within your data - all in one open source...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 19
    mapcn

    mapcn

    Beautiful map components, 100% Free, Zero config, one command setup

    mapcn is a research-oriented project centered on mapping continuous control in reinforcement learning to structured policies using neural networks. It explores how high-dimensional action spaces can be decomposed into structured primitives that can be learned, composed, and reused across different tasks. The core idea is to enable agents to generalize learned behavior by representing continuous control policies in a compact, interpretable form that preserves smoothness and controllability....
    Downloads: 0 This Week
    Last Update:
    See Project
  • 20
    whylogs

    whylogs

    The open standard for data logging

    whylogs is an open-source library for logging any kind of data. With whylogs, users are able to generate summaries of their datasets (called whylogs profiles) which they can use to track changes in their dataset Create data constraints to know whether their data looks the way it should. Quickly visualize key summary statistics about their datasets. whylogs profiles are the core of the whylogs library. They capture key statistical properties of data, such as the distribution (far beyond...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 21
    Gretel Synthetics

    Gretel Synthetics

    Synthetic data generators for structured and unstructured text

    ...Synthesize and transform multiple tables or entire relational databases. Mitigate GDPR and CCPA risks, and promote safe data access. Accelerate CI/CD workflows, performance testing, and staging. Augment AI training data, including minority classes and unique edge cases. Amaze prospects with personalized product experiences.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 22
    JuiceFS

    JuiceFS

    JuiceFS is a distributed POSIX file system built on top of Redis

    ...Whether it's a public cloud, private cloud, or hybrid cloud, JuiceFS is available on any cloud of your choice and delivers flexibility, availability, scalability and strong consistency for your data-intensive applications. Purposely built to serve big data scenarios such as self-driving model training, recommendation engine, and Next-generation Gene Sequencing, JuiceFS specializes in high performance and easier management of tens of billion of files management. We bring JuiceFS to developers with the hope that it will be easy to use, reliable, high-performance, and solve all your file storage problems in a cloud environment.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 23
    Recommenders

    Recommenders

    Best practices on recommendation systems

    ...Several utilities are provided in reco_utils to support common tasks such as loading datasets in the format expected by different algorithms, evaluating model outputs, and splitting training/test data. Implementations of several state-of-the-art algorithms are included for self-study and customization in your own applications. Please see the setup guide for more details on setting up your machine locally, on a data science virtual machine (DSVM) or on Azure Databricks. Independent or incubating algorithms and utilities are candidates for the contrib folder. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 24
    Parallel and Distributed Process System

    Parallel and Distributed Process System

    OmniSim simulates parallel and distributed processing systems

    Parallel and Distributed Process OmniSim Computational Neuroscience: Large-scale neural population dynamics, brain-inspired computing architectures, and neuro-symbolic AI systems 🧬 Scientific Overview PDP-OmniSim is an advanced computational framework for simulating parallel and distributed processing systems, with cutting-edge applications in computational neuroscience, distributed computing, and complex systems modeling. The framework provides researchers with robust tools for...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 25
    Sweetviz

    Sweetviz

    Visualize and compare datasets, target values and associations

    ...Output is a fully self-contained HTML application. The system is built around quickly visualizing target values and comparing datasets. Its goal is to help quick analysis of target characteristics, training vs testing data, and other such data characterization tasks. Shows how a target value (e.g. "Survived" in the Titanic dataset) relates to other features. Sweetviz integrates associations for numerical (Pearson's correlation), categorical (uncertainty coefficient) and categorical-numerical (correlation ratio) datatypes seamlessly, to provide maximum information for all data types. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • Previous
  • You're on page 1
  • 2
  • Next
MongoDB Logo MongoDB