Open Source Machine Learning Software - Page 8

Sort By:

Machine Learning Software

View 446 business solutions

Machine Learning Clear Filters

MongoDB Atlas runs apps anywhere
Deploy in 115+ regions with the modern database for every enterprise.

MongoDB Atlas gives you the freedom to build and run modern applications anywhere—across AWS, Azure, and Google Cloud. With global availability in over 115 regions, Atlas lets you deploy close to your users, meet compliance needs, and scale with confidence across any geography.

Start Free
Gemini 3 and 200+ AI Models on One Platform
Access Google's best plus Claude, Llama, and Gemma. Fine-tune and deploy from one console.

Build generative AI apps with Vertex AI. Switch between models without switching platforms.

Start Free
1

Lucid

A collection of infrastructure and tools for research

Lucid is a collection of infrastructure and tools for research in neural network interpretability. Lucid is research code, not production code. We provide no guarantee it will work for your use case. Lucid is maintained by volunteers who are unable to provide significant technical support. Start visualizing neural networks with no setup. The following notebooks run right from your browser, thanks to Collaboratory. It's a Jupyter notebook environment that requires no setup to use and runs entirely in the cloud. You can run the notebooks on your local machine, too. Clone the repository and find them in the notebooks subfolder. You will need to run a local instance of the Jupyter notebook environment to execute them. Feature visualization answers questions about what a network, or parts of a network, are looking for by generating examples.

Downloads: 5 This Week

Last Update: 2021-12-15
See Project
2

MIT Deep Learning Book

MIT Deep Learning Book in PDF format by Ian Goodfellow

The Deep Learning textbook is a resource intended to help students and practitioners enter the field of machine learning in general and deep learning in particular. The online version of the book is now complete and will remain available online for free. MIT Deep Learning Book in PDF format (complete and parts) by Ian Goodfellow, Yoshua Bengio and Aaron Courville. An MIT Press book Ian Goodfellow and Yoshua Bengio and Aaron Courville. Written by three experts in the field, Deep Learning is the only comprehensive book on the subject. This is not available as PDF download. So, I have taken the prints of the HTML content and bound them into a flawless PDF version of the book, as suggested by the website itself. Printing seems to work best printing directly from the browser, using Chrome. Other browsers do not work as well.

Downloads: 5 This Week

Last Update: 2022-07-29
See Project
3

MONAI

AI Toolkit for Healthcare Imaging

The MONAI framework is the open-source foundation being created by Project MONAI. MONAI is a freely available, community-supported, PyTorch-based framework for deep learning in healthcare imaging. It provides domain-optimized foundational capabilities for developing healthcare imaging training workflows in a native PyTorch paradigm. Project MONAI also includes MONAI Label, an intelligent open source image labeling and learning tool that helps researchers and clinicians collaborate, create annotated datasets, and build AI models in a standardized MONAI paradigm. MONAI is an open-source project. It is built on top of PyTorch and is released under the Apache 2.0 license. Aiming to capture best practices of AI development for healthcare researchers, with an immediate focus on medical imaging. Providing user-comprehensible error messages and easy to program API interfaces. Provides reproducibility of research experiments for comparisons against state-of-the-art implementations.

Downloads: 5 This Week

Last Update: 2026-01-27
See Project
4

Machine Learning PyTorch Scikit-Learn

Code Repository for Machine Learning with PyTorch and Scikit-Learn

Initially, this project started as the 4th edition of Python Machine Learning. However, after putting so much passion and hard work into the changes and new topics, we thought it deserved a new title. So, what’s new? There are many contents and additions, including the switch from TensorFlow to PyTorch, new chapters on graph neural networks and transformers, a new section on gradient boosting, and many more that I will detail in a separate blog post. For those who are interested in knowing what this book covers in general, I’d describe it as a comprehensive resource on the fundamental concepts of machine learning and deep learning. The first half of the book introduces readers to machine learning using scikit-learn, the defacto approach for working with tabular datasets. Then, the second half of this book focuses on deep learning, including applications to natural language processing and computer vision.

Downloads: 5 This Week

Last Update: 2022-08-22
See Project
Enterprise-grade ITSM, for every business
Give your IT, operations, and business teams the ability to deliver exceptional services—without the complexity.

Freshservice is an intuitive, AI-powered platform that helps IT, operations, and business teams deliver exceptional service without the usual complexity. Automate repetitive tasks, resolve issues faster, and provide seamless support across the organization. From managing incidents and assets to driving smarter decisions, Freshservice makes it easy to stay efficient and scale with confidence.

Try it Free
5

Machine Learning cheatsheets Stanford

VIP cheatsheets for Stanford's CS 229 Machine Learning

stanford-cs-229-machine-learning is an open-source educational repository that provides illustrated cheat sheets summarizing the key concepts taught in Stanford University’s CS229 machine learning course. The project compiles concise explanations of important topics in machine learning and presents them in an accessible format that helps learners review complex ideas quickly. The repository includes summaries covering areas such as supervised learning, unsupervised learning, deep learning, and optimization techniques. In addition to machine learning algorithms, it also contains refresher materials on mathematical prerequisites including probability theory, statistics, linear algebra, and calculus. These cheat sheets are designed to serve as quick reference guides that students can use while studying or reviewing machine learning material.

Downloads: 5 This Week

Last Update: 2026-03-10
See Project
6

Mars Framework

Mars is a tensor-based unified framework for large-scale data

Mars is a distributed computing framework designed to scale scientific computing and data science workloads across large clusters while preserving the familiar programming interfaces of common Python libraries. The project provides a tensor-based execution model that extends the capabilities of tools such as NumPy, pandas, and scikit-learn so that large datasets can be processed in parallel without rewriting code for distributed environments. Its architecture automatically divides large computational tasks into smaller chunks that can be executed across multiple nodes in a cluster, allowing complex analytics, machine learning workflows, and data transformations to run efficiently at scale. Mars is particularly useful for workloads that exceed the memory capacity of a single machine or require high levels of parallel processing.

Downloads: 5 This Week

Last Update: 2026-03-11
See Project
7

MiniSom

MiniSom is a minimalistic implementation of the Self Organizing Maps

MiniSom is a minimalistic and Numpy-based implementation of the Self Organizing Maps (SOM). SOM is a type of Artificial Neural Network able to convert complex, nonlinear statistical relationships between high-dimensional data items into simple geometric relationships on a low-dimensional display. Minisom is designed to allow researchers to easily build on top of it and to give students the ability to quickly grasp its details. The project initially aimed for a minimalistic implementation of the Self-Organizing Map (SOM) algorithm, focusing on simplicity in features, dependencies, and code style. Although it has expanded in terms of features, it remains minimalistic by relying only on the numpy library and emphasizing vectorization in coding style.

Downloads: 5 This Week

Last Update: 2026-01-14
See Project
8

NSFWDetector

A NSFW detector with CoreML

NSFWDetector is a small (17 kB) CoreML Model to scan images for nudity. It was trained using CreateML to distinguish between porn/nudity and appropriate pictures. With the main focus on distinguishing between Instagram model-like pictures and porn.

Downloads: 5 This Week

Last Update: 2024-08-16
See Project
9

NVIDIA PhysicsNeMo

Open-source deep-learning framework for building and training

NVIDIA PhysicsNeMo is an open-source deep learning framework designed for building artificial intelligence models that incorporate physical laws and scientific knowledge into machine learning workflows. The framework focuses on the emerging field of physics-informed machine learning, where neural networks are used alongside physical equations to model complex scientific systems. PhysicsNeMo provides modular Python components that allow developers to create scalable training and inference pipelines for models that combine data-driven learning with physics-based constraints. It is built on top of the PyTorch ecosystem and integrates with GPU-accelerated computing environments to handle computationally demanding simulations and datasets. The framework supports a wide range of scientific applications, including computational fluid dynamics, climate modeling, weather prediction, and engineering simulations.

Downloads: 5 This Week

Last Update: 2026-03-12
See Project
$300 in Free Credit Towards Top Cloud Services
Build VMs, containers, AI, databases, storage—all in one place.

Start your project in minutes. After credits run out, 20+ products include free monthly usage. Only pay when you're ready to scale.

Get Started
10

NannyML

Detecting silent model failure. NannyML estimates performance

NannyML is an open-source python library that allows you to estimate post-deployment model performance (without access to targets), detect data drift, and intelligently link data drift alerts back to changes in model performance. Built for data scientists, NannyML has an easy-to-use interface, and interactive visualizations, is completely model-agnostic, and currently supports all tabular classification use cases. NannyML closes the loop with performance monitoring and post deployment data science, empowering data scientist to quickly understand and automatically detect silent model failure. By using NannyML, data scientists can finally maintain complete visibility and trust in their deployed machine learning models. When the actual outcome of your deployed prediction models is delayed, or even when post-deployment target labels are completely absent, you can use NannyML's CBPE-algorithm to estimate model performance.

Downloads: 5 This Week

Last Update: 2025-07-12
See Project
11

NeuralPDE.jl

Physics-Informed Neural Networks (PINN) Solvers

NeuralPDE.jl is a Julia library for solving partial differential equations (PDEs) using physics-informed neural networks and scientific machine learning. Built on top of the SciML ecosystem, it provides a flexible and composable interface for defining PDEs and training neural networks to approximate their solutions. NeuralPDE.jl enables hybrid modeling, data-driven discovery, and fast PDE solvers in high dimensions, making it suitable for scientific research and engineering applications.

Downloads: 5 This Week

Last Update: 2026-04-06
See Project
12

OpenBB

Investment Research for Everyone, Everywhere

Customize and speed up your analysis, bring your own data, and create instant reports to gain a competitive edge. Whether it’s a CSV file, a private endpoint, an RSS feed, or even embed an SEC filing directly. Chat with financial data using large language models. Don’t waste time reading, create summaries in seconds and ask how that impacts investments. Create your dashboard with your favorite widgets. Create charts directly from raw data in seconds. Create charts directly from raw data in seconds. Customize your dashboards to build your dream terminal, integrate with your private datasets and bring your own fine-tuned AI copilots.

Downloads: 5 This Week

Last Update: 2026-03-09
See Project
13

OpenCLIP

An open source implementation of CLIP

The goal of this repository is to enable training models with contrastive image-text supervision and to investigate their properties such as robustness to distribution shift. Our starting point is an implementation of CLIP that matches the accuracy of the original CLIP models when trained on the same dataset. Specifically, a ResNet-50 model trained with our codebase on OpenAI's 15 million image subset of YFCC achieves 32.7% top-1 accuracy on ImageNet. OpenAI's CLIP model reaches 31.3% when trained on the same subset of YFCC. For ease of experimentation, we also provide code for training on the 3 million images in the Conceptual Captions dataset, where a ResNet-50x4 trained with our codebase reaches 22.2% top-1 ImageNet accuracy. This codebase is work in progress, and we invite all to contribute in making it more accessible and useful. In the future, we plan to add support for TPU training and release larger models. We hope this codebase facilitates and promotes further research.

Downloads: 5 This Week

Last Update: 2026-02-27
See Project
14

OpenMLDB

OpenMLDB is an open-source machine learning database

OpenMLDB is an open-source machine learning database that provides a feature platform computing consistent features for training and inference. OpenMLDB is an open-source machine learning database that is committed to solving the data and feature challenges. OpenMLDB has been deployed in hundreds of real-world enterprise applications. It prioritizes the capability of feature engineering using SQL for open-source, which offers a feature platform enabling consistent features for training and inference. Real-time features are essential for many machine learning applications, such as real-time personalized recommendations and risk analytics. However, a feature engineering script developed by data scientists (Python scripts in most cases) cannot be directly deployed into production for online inference because it usually cannot meet the engineering requirements, such as low latency, high throughput and high availability.

Downloads: 5 This Week

Last Update: 2025-02-21
See Project
15

PromptSource

Toolkit for creating, sharing and using natural language prompts

PromptSource is a toolkit for creating, sharing and using natural language prompts. Recent work has shown that large language models exhibit the ability to perform reasonable zero-shot generalization to new tasks. For instance, GPT-3 demonstrated that large language models have strong zero- and few-shot abilities. FLAN and T0 then demonstrated that pre-trained language models fine-tuned in a massively multitask fashion yield even stronger zero-shot performance. A common denominator in these works is the use of prompts which has gained interest among NLP researchers and engineers. This emphasizes the need for new tools to create, share and use natural language prompts. Prompts are functions that map an example from a dataset to a natural language input and target output. PromptSource contains a growing collection of prompts (which we call P3: Public Pool of Prompts). As of January 20, 2022, there are ~2'000 English prompts for 170+ English datasets in P3.

Downloads: 5 This Week

Last Update: 2024-08-07
See Project
16

SageMaker Training Toolkit

Train machine learning models within Docker containers

Train machine learning models within a Docker container using Amazon SageMaker. Amazon SageMaker is a fully managed service for data science and machine learning (ML) workflows. You can use Amazon SageMaker to simplify the process of building, training, and deploying ML models. To train a model, you can include your training script and dependencies in a Docker container that runs your training code. A container provides an effectively isolated environment, ensuring a consistent runtime and reliable training process. The SageMaker Training Toolkit can be easily added to any Docker container, making it compatible with SageMaker for training models. If you use a prebuilt SageMaker Docker image for training, this library may already be included. Write a training script (eg. train.py). Define a container with a Dockerfile that includes the training script and any dependencies.

Downloads: 5 This Week

Last Update: 2025-09-22
See Project
17

Semantic Segmentation Editor

Web labeling tool for bitmap images and point clouds

A web-based labeling tool for creating AI training data sets (2D and 3D). The tool has been developed in the context of autonomous driving research. It supports images (.jpg or .png) and point clouds (.pcd). It is a Meteor app developed with React, Paper.js, and three.js.

Downloads: 5 This Week

Last Update: 2024-08-14
See Project
18

Stable Baselines3

PyTorch version of Stable Baselines

Stable Baselines3 (SB3) is a set of reliable implementations of reinforcement learning algorithms in PyTorch. It is the next major version of Stable Baselines. You can read a detailed presentation of Stable Baselines3 in the v1.0 blog post or our JMLR paper. These algorithms will make it easier for the research community and industry to replicate, refine, and identify new ideas, and will create good baselines to build projects on top of. We expect these tools will be used as a base around which new ideas can be added, and as a tool for comparing a new approach against existing ones. We also hope that the simplicity of these tools will allow beginners to experiment with a more advanced toolset, without being buried in implementation details.

Downloads: 5 This Week

Last Update: 2026-04-01
See Project
19

TTS

Deep learning for text to speech

TTS is a library for advanced Text-to-Speech generation. It's built on the latest research, was designed to achieve the best trade-off among ease-of-training, speed, and quality. TTS comes with pre-trained models, tools for measuring dataset quality, and is already used in 20+ languages for products and research projects. Released models in PyTorch, Tensorflow and TFLite. Tools to curate Text2Speech datasets underdataset_analysis. Demo server for model testing. Notebooks for extensive model benchmarking. Modular (but not too much) code base enabling easy testing for new ideas. Text2Spec models (Tacotron, Tacotron2, Glow-TTS, SpeedySpeech). Speaker Encoder to compute speaker embeddings efficiently. Vocoder models (MelGAN, Multiband-MelGAN, GAN-TTS, ParallelWaveGAN, WaveGrad, WaveRNN). If you are only interested in synthesizing speech with the released TTS models, installing from PyPI is the easiest option.

Downloads: 5 This Week

Last Update: 2021-10-18
See Project
20

TensorFlow Serving

Serving system for machine learning models

TensorFlow Serving is a flexible, high-performance serving system for machine learning models, designed for production environments. It deals with the inference aspect of machine learning, taking models after training and managing their lifetimes, providing clients with versioned access via a high-performance, reference-counted lookup table. TensorFlow Serving provides out-of-the-box integration with TensorFlow models, but can be easily extended to serve other types of models and data. The easiest and most straight-forward way of using TensorFlow Serving is with Docker images. We highly recommend this route unless you have specific needs that are not addressed by running in a container. In order to serve a Tensorflow model, simply export a SavedModel from your Tensorflow program. SavedModel is a language-neutral, recoverable, hermetic serialization format that enables higher-level systems and tools to produce, consume, and transform TensorFlow models.

Downloads: 5 This Week

Last Update: 2025-08-19
See Project
21

Tokenizers

Fast State-of-the-Art Tokenizers optimized for Research and Production

Fast State-of-the-art tokenizers, optimized for both research and production. Tokenizers provides an implementation of today’s most used tokenizers, with a focus on performance and versatility. These tokenizers are also used in Transformers. Train new vocabularies and tokenize, using today’s most used tokenizers. Extremely fast (both training and tokenization), thanks to the Rust implementation. Takes less than 20 seconds to tokenize a GB of text on a server’s CPU. Easy to use, but also extremely versatile. Designed for both research and production. Full alignment tracking. Even with destructive normalization, it’s always possible to get the part of the original sentence that corresponds to any token. Does all the pre-processing: Truncation, Padding, add the special tokens your model needs.

Downloads: 5 This Week

Last Update: 2025-12-02
See Project
22

Transformer Reinforcement Learning X

A repo for distributed training of language models with Reinforcement

trlX is a distributed training framework designed from the ground up to focus on fine-tuning large language models with reinforcement learning using either a provided reward function or a reward-labeled dataset. Training support for Hugging Face models is provided by Accelerate-backed trainers, allowing users to fine-tune causal and T5-based language models of up to 20B parameters, such as facebook/opt-6.7b, EleutherAI/gpt-neox-20b, and google/flan-t5-xxl. For models beyond 20B parameters, trlX provides NVIDIA NeMo-backed trainers that leverage efficient parallelism techniques to scale effectively.

Downloads: 5 This Week

Last Update: 2024-08-03
See Project
23

Vearch

A distributed system for embedding-based vector retrieval

Vearch is the vector search infrastructure for deep learning and AI applications. Vearch is a distributed vector storage and retrieval system which can be easily extended to billions scale. Vearch implements a high-performance, lockless real-time vector indexing subsystem that utilizes various optimization techniques to support millisecond vector update and retrieval. End-to-end one-click deployment. Through the module of the plugin, a complete default visual search system can be deployed just with one click. Otherwise, you can easily customize your own image, video, or text feature extraction algorithm plugin. This GIF provides a clear demonstration of the project vearch usage and its internal structure. The use of vearch is mainly divided into three steps. Firstly, create DB and Space, then import your data, and finally, you can search on your own dataset.

Downloads: 5 This Week

Last Update: 2026-02-04
See Project
24

Zygote

21st century AD

Zygote provides source-to-source automatic differentiation (AD) in Julia, and is the next-gen AD system for the Flux differentiable programming framework. For more details and benchmarks of Zygote's technique, see our paper. You may want to check out Flux for more interesting examples of Zygote usage; the documentation here focuses on internals and advanced AD usage.

Downloads: 5 This Week

Last Update: 2025-06-14
See Project
25

audioFlux

A library for audio and music analysis, feature extraction

A library for audio and music analysis, and feature extraction. Can be used for deep learning, pattern recognition, signal processing, bioinformatics, statistics, finance, etc. audioflux is a deep learning tool library for audio and music analysis, feature extraction. It supports dozens of time-frequency analysis transformation methods and hundreds of corresponding time-domain and frequency-domain feature combinations. It can be provided to deep learning networks for training and is used to study various tasks in the audio field such as Classification, Separation, Music Information Retrieval(MIR) ASR, etc.

Downloads: 5 This Week

Last Update: 2024-08-09
See Project