inference free download

Showing 59 open source projects for "inference"

View related business solutions

Software Development Python Clear Filters & Widen Search

MongoDB Atlas runs apps anywhere
Deploy in 115+ regions with the modern database for every enterprise.

MongoDB Atlas gives you the freedom to build and run modern applications anywhere—across AWS, Azure, and Google Cloud. With global availability in over 115 regions, Atlas lets you deploy close to your users, meet compliance needs, and scale with confidence across any geography.

Start Free
Gemini 3 and 200+ AI Models on One Platform
Access Google's best plus Claude, Llama, and Gemma. Fine-tune and deploy from one console.

Build generative AI apps with Vertex AI. Switch between models without switching platforms.

Start Free
1

Mistral Inference

Official inference library for Mistral models

Open and portable generative AI for devs and businesses. We release open-weight models for everyone to customize and deploy where they want it. Our super-efficient model Mistral Nemo is available under Apache 2.0, while Mistral Large 2 is available through both a free non-commercial license, and a commercial license.

Downloads: 9 This Week

Last Update: 2025-03-20
See Project
2

SageMaker Hugging Face Inference Toolkit

Library for serving Transformers models on Amazon SageMaker

SageMaker Hugging Face Inference Toolkit is an open-source library for serving Transformers models on Amazon SageMaker. This library provides default pre-processing, predict and postprocessing for certain Transformers models and tasks. It utilizes the SageMaker Inference Toolkit for starting up the model server, which is responsible for handling inference requests.

Downloads: 5 This Week

Last Update: 2025-04-23
See Project
3

KServe

Standardized Serverless ML Inference Platform on Kubernetes

KServe provides a Kubernetes Custom Resource Definition for serving machine learning (ML) models on arbitrary frameworks. It aims to solve production model serving use cases by providing performant, high abstraction interfaces for common ML frameworks like Tensorflow, XGBoost, ScikitLearn, PyTorch, and ONNX. It encapsulates the complexity of autoscaling, networking, health checking, and server configuration to bring cutting edge serving features like GPU Autoscaling, Scale to Zero, and...

Downloads: 10 This Week

Last Update: 2026-03-13
See Project
4

AIMET

AIMET is a library that provides advanced quantization and compression

Qualcomm Innovation Center (QuIC) is at the forefront of enabling low-power inference at the edge through its pioneering model-efficiency research. QuIC has a mission to help migrate the ecosystem toward fixed-point inference. With this goal, QuIC presents the AI Model Efficiency Toolkit (AIMET) - a library that provides advanced quantization and compression techniques for trained neural network models. AIMET enables neural networks to run more efficiently on fixed-point AI hardware accelerators. ...

Downloads: 23 This Week

Last Update: 2026-04-06
See Project
Custom VMs From 1 to 96 vCPUs With 99.95% Uptime
General-purpose, compute-optimized, or GPU/TPU-accelerated. Built to your exact specs.

Live migration and automatic failover keep workloads online through maintenance. One free e2-micro VM every month.

Try Free
5

bitnet.cpp

Official inference framework for 1-bit LLMs

bitnet.cpp is the official open-source inference framework and ecosystem designed to enable ultra-efficient execution of 1-bit large language models (LLMs), which quantize most model parameters to ternary values (-1, 0, +1) while maintaining competitive performance with full-precision counterparts. At its core is bitnet.cpp, a highly optimized C++ backend that supports fast, low-memory inference on both CPUs and GPUs, enabling models such as BitNet b1.58 to run without requiring enormous compute infrastructure. ...

Downloads: 13 This Week

Last Update: 2026-03-10
See Project
6

BentoML

Unified Model Serving Framework

...Standard .bento format for packaging code, models and dependencies for easy versioning and deployment. Integrate with any training pipeline or ML experimentation platform. Parallelize compute-intense model inference workloads to scale separately from the serving logic. Adaptive batching dynamically groups inference requests for optimal performance. Orchestrate distributed inference graph with multiple models via Yatai on Kubernetes. Easily configure CUDA dependencies for running inference with GPU. Automatically generate docker images for production deployment.

Downloads: 2 This Week

Last Update: 2026-04-02
See Project
7

NNCF

Neural Network Compression Framework for enhanced OpenVINO

NNCF (Neural Network Compression Framework) is an optimization toolkit for deep learning models, designed to apply quantization, pruning, and other techniques to improve inference efficiency.

Downloads: 8 This Week

Last Update: 6 days ago
See Project
8

AWS Deep Learning Containers

A set of Docker images for training and serving models in TensorFlow

...Deep Learning Containers provide optimized environments with TensorFlow and MXNet, Nvidia CUDA (for GPU instances), and Intel MKL (for CPU instances) libraries and are available in the Amazon Elastic Container Registry (Amazon ECR). The AWS DLCs are used in Amazon SageMaker as the default vehicles for your SageMaker jobs such as training, inference, transforms etc. They've been tested for machine learning workloads on Amazon EC2, Amazon ECS and Amazon EKS services as well. This project is licensed under the Apache-2.0 License. Ensure you have access to an AWS account i.e. setup your environment such that awscli can access your account via either an IAM user or an IAM role.

Downloads: 8 This Week

Last Update: 16 hours ago
See Project
9

AWS Neuron

Powering Amazon custom machine learning chips

AWS Neuron is a software development kit (SDK) for running machine learning inference using AWS Inferentia chips. It consists of a compiler, run-time, and profiling tools that enable developers to run high-performance and low latency inference using AWS Inferentia-based Amazon EC2 Inf1 instances. Using Neuron developers can easily train their machine learning models on any popular framework such as TensorFlow, PyTorch, and MXNet, and run it optimally on Amazon EC2 Inf1 instances. ...

Downloads: 2 This Week

Last Update: 5 days ago
See Project
Forever Free Full-Stack Observability | Grafana Cloud
Our generous forever free tier includes the full platform, including the AI Assistant, for 3 users with 10k metrics, 50GB logs, and 50GB traces.

Built on open standards like Prometheus and OpenTelemetry, Grafana Cloud includes Kubernetes Monitoring, Application Observability, Incident Response, plus the AI-powered Grafana Assistant. Get started with our generous free tier today.

Create free account
10

DocTR

Library for OCR-related tasks powered by Deep Learning

DocTR provides an easy and powerful way to extract valuable information from your documents. Seemlessly process documents for Natural Language Understanding tasks: we provide OCR predictors to parse textual information (localize and identify each word) from your documents. Robust 2-stage (detection + recognition) OCR predictors with pretrained parameters. User-friendly, 3 lines of code to load a document and extract text with a predictor. State-of-the-art performances on public document...

Downloads: 9 This Week

Last Update: 2026-02-04
See Project
11

AudioCraft

Audiocraft is a library for audio processing and generation

AudioCraft is a PyTorch library for text-to-audio and text-to-music generation, packaging research models and tooling for training and inference. It includes MusicGen for music generation conditioned on text (and optionally melody) and AudioGen for text-conditioned sound effects and environmental audio. Both models operate over discrete audio tokens produced by a neural codec (EnCodec), which acts like a tokenizer for waveforms and enables efficient sequence modeling. The repo provides inference scripts, checkpoints, and simple Python APIs so you can generate clips from prompts or incorporate the models into applications. ...

Downloads: 9 This Week

Last Update: 2025-10-13
See Project
12

PyMC3

Probabilistic programming in Python

...A Gaussian process (GP) can be used as a prior probability distribution whose support is over the space of continuous functions. PyMC3 provides rich support for defining and using GPs. Variational inference saves computational cost by turning a problem of integration into one of optimization. PyMC3's variational API supports a number of cutting edge algorithms, as well as minibatch for scaling to large datasets.

Downloads: 4 This Week

Last Update: 2026-04-07
See Project
13

SageMaker TensorFlow Training Toolkit

Toolkit for running TensorFlow training scripts on SageMaker

...To use your TensorFlow Serving model on SageMaker, you first need to create a SageMaker Model. After creating a SageMaker Model, you can use it to create SageMaker Batch Transform Jobs for offline inference, or create SageMaker Endpoints for real-time inference. A SageMaker Model contains references to a model.tar.gz file in S3 containing serialized model data, and a Docker image used to serve predictions with that model. A Batch Transform job runs an offline-inference job using your TensorFlow Serving model. Input data in S3 is converted to HTTP requests, and responses are saved to an output bucket in S3.

Downloads: 1 This Week

Last Update: 2025-06-04
See Project
14

ty

An extremely fast Python type checker and language server

...The tool is designed from the ground up to power editor integrations, enabling real-time feedback as developers write code with minimal latency. ty includes advanced type system capabilities such as intersection types, improved type inference, and detailed diagnostics that help identify issues even in partially typed codebases. It supports integration with multiple development environments through its language server implementation, providing features like code navigation, auto-completion, and inline hints.

Downloads: 20 This Week

Last Update: 2026-04-05
See Project
15

Seldon Core

An MLOps framework to package, deploy, monitor and manage models

The de facto standard open-source platform for rapidly deploying machine learning models on Kubernetes. Seldon Core, our open-source framework, makes it easier and faster to deploy your machine learning models and experiments at scale on Kubernetes. Seldon Core serves models built in any open-source or commercial model building framework. You can make use of powerful Kubernetes features like custom resource definitions to manage model graphs. And then connect your continuous integration and...

Downloads: 2 This Week

Last Update: 2026-01-23
See Project
16

Superduper

Superduper: Integrate AI models and machine learning workflows

Superduper is a Python-based framework for building end-2-end AI-data workflows and applications on your own data, integrating with major databases. It supports the latest technologies and techniques, including LLMs, vector-search, RAG, and multimodality as well as classical AI and ML paradigms. Developers may leverage Superduper by building compositional and declarative objects that out-source the details of deployment, orchestration versioning, and more to the Superduper engine. This...

Downloads: 9 This Week

Last Update: 2025-08-26
See Project
17

pytype

A static type analyzer for Python code

...Documentation and a user guide detail configuration, supported features, and how inference interacts with modern Python idioms.

Downloads: 0 This Week

Last Update: 2025-10-09
See Project
18

Anomalib

An anomaly detection library comprising state-of-the-art algorithms

Anomalib is an open-source deep learning library focused on anomaly detection and localization tasks, collecting state-of-the-art algorithms and tools under one modular framework. It provides implementations of leading anomaly detection methods drawn from current research, as well as a full set of utilities for training, evaluating, benchmarking, and deploying these models on both public and private datasets. Anomalib emphasizes flexibility and reproducibility: you can use its simple APIs to...

Downloads: 6 This Week

Last Update: 4 days ago
See Project
19

Meridian

Meridian is an MMM framework

Meridian is a comprehensive, open source marketing mix modeling (MMM) framework developed by Google to help advertisers analyze and optimize the impact of their marketing investments. Built on Bayesian causal inference principles, Meridian enables organizations to evaluate how different marketing channels influence key performance indicators (KPIs) such as revenue or conversions while accounting for external factors like seasonality or economic trends. The framework provides a robust foundation for constructing in-house MMM pipelines capable of handling both national and geo-level data, with built-in support for calibration using experimental data or prior knowledge. ...

Downloads: 4 This Week

Last Update: 23 minutes ago
See Project
20

tinygrad

Deep learning framework

This may not be the best deep learning framework, but it is a deep learning framework. Due to its extreme simplicity, it aims to be the easiest framework to add new accelerators to, with support for both inference and training. If XLA is CISC, tinygrad is RISC.

Downloads: 3 This Week

Last Update: 2026-01-12
See Project
21

Ray

A unified framework for scalable computing

Modern workloads like deep learning and hyperparameter tuning are compute-intensive and require distributed or parallel execution. Ray makes it effortless to parallelize single machine code — go from a single CPU to multi-core, multi-GPU or multi-node with minimal code changes. Accelerate your PyTorch and Tensorflow workload with a more resource-efficient and flexible distributed execution framework powered by Ray. Accelerate your hyperparameter search workloads with Ray Tune. Find the best...

Downloads: 5 This Week

Last Update: 2026-03-20
See Project
22

TensorRT Node for ComfyUI

Enables the best performance on NVIDIA RTX Graphics Cards

ComfyUI_TensorRT is an extension that lets ComfyUI run AI inference through NVIDIA’s TensorRT, aiming to get faster, more efficient execution on supported GPUs. It bridges the gap between ComfyUI’s flexible, node-based workflows and TensorRT’s highly optimized engine format. The result is that complex diffusion or image-processing graphs can be accelerated without the user having to rewrite the pipeline.

Downloads: 0 This Week

Last Update: 2025-10-30
See Project
23

DeepEP

DeepEP: an efficient expert-parallel communication library

...Because MoE architectures require routing inputs to different experts, communication overhead can become a bottleneck — DeepEP addresses that by providing optimized GPU kernels and efficient dispatch/combining logic. The library also supports low-precision operations (such as FP8) to reduce memory and bandwidth usage during communication. DeepEP is aimed at large-scale model inference or training systems where expert parallelism is used to scale model capacity without replicating entire networks.

Downloads: 0 This Week

Last Update: 2025-10-03
See Project
24

pep484 stubs for Django

PEP-484 stubs for Django

This package contains type stubs and a custom mypy plugin to provide more precise static types and type inference for Django framework. Django uses some Python "magic" that makes having precise types for some code patterns problematic. This is why we need this project. The final goal is to be able to get precise types for the most common patterns. We are independent from Django at the moment. There's a proposal to merge our project into the Django itself.

Downloads: 7 This Week

Last Update: 2026-04-01
See Project
25

PyTorch Lightning

The lightweight PyTorch wrapper for high-performance AI research

...It's designed to decouple the science from the engineering in your PyTorch code, simplifying complex network coding and giving you maximum flexibility. PyTorch Lightning can be used for just about any type of research, and was built for the fast inference needed in AI research and production. When you need to scale up things like BERT and self-supervised learning, Lightning responds accordingly by automatically exporting to ONNX or TorchScript. PyTorch Lightning can easily be applied for any use case. With just a quick refactor you can run your code on any hardware, run distributed training, perform logging, metrics, visualization and so much more!

Downloads: 10 This Week

Last Update: 2026-01-30
See Project