inference free download

Showing 138 open source projects for "inference"

1

Mistral Inference

Official inference library for Mistral models

Open and portable generative AI for devs and businesses. We release open-weight models for everyone to customize and deploy where they want it. Our super-efficient model Mistral Nemo is available under Apache 2.0, while Mistral Large 2 is available through both a free non-commercial license, and a commercial license.

Downloads: 9 This Week

Last Update: 2025-03-20
2

SageMaker Hugging Face Inference Toolkit

Library for serving Transformers models on Amazon SageMaker

SageMaker Hugging Face Inference Toolkit is an open-source library for serving Transformers models on Amazon SageMaker. This library provides default pre-processing, prediction, and post-processing for certain Transformers models and tasks. It uses the SageMaker Inference Toolkit to start the model server, which is responsible for handling inference requests.

Downloads: 5 This Week

Last Update: 2026-03-17
3

TensorRT

C++ library for high performance inference on NVIDIA GPUs

NVIDIA® TensorRT™ is an SDK for high-performance deep learning inference. It includes a deep learning inference optimizer and runtime that delivers low latency and high throughput for deep learning inference applications. TensorRT-based applications perform up to 40X faster than CPU-only platforms during inference. With TensorRT, you can optimize neural network models trained in all major frameworks, calibrate for lower precision with high accuracy, and deploy to hyperscale data centers, embedded, or automotive product platforms. ...

Downloads: 22 This Week

Last Update: 2026-03-25
4

Gen.jl

A general-purpose probabilistic programming system

An open-source stack for generative modeling and probabilistic inference. Gen’s inference library gives users building blocks for writing efficient probabilistic inference algorithms that are tailored to their models, while automating the tricky math and the low-level implementation details. Gen helps users write hybrid algorithms that combine neural networks, variational inference, sequential Monte Carlo samplers, and Markov chain Monte Carlo.

Downloads: 5 This Week

Last Update: 2025-07-11
5

ONNX Runtime

ONNX Runtime: cross-platform, high performance ML inferencing

ONNX Runtime is a cross-platform inference and training machine-learning accelerator. ONNX Runtime inference can enable faster customer experiences and lower costs, supporting models from deep learning frameworks such as PyTorch and TensorFlow/Keras as well as classical machine learning libraries such as scikit-learn, LightGBM, XGBoost, etc. ONNX Runtime is compatible with different hardware, drivers, and operating systems, and provides optimal performance by leveraging hardware accelerators where applicable alongside graph optimizations and transforms. ...

Downloads: 78 This Week

Last Update: 2026-03-17
6

KServe

Standardized Serverless ML Inference Platform on Kubernetes

KServe provides a Kubernetes Custom Resource Definition for serving machine learning (ML) models on arbitrary frameworks. It aims to solve production model serving use cases by providing performant, high-abstraction interfaces for common ML frameworks like TensorFlow, XGBoost, scikit-learn, PyTorch, and ONNX. It encapsulates the complexity of autoscaling, networking, health checking, and server configuration to bring cutting-edge serving features like GPU Autoscaling, Scale to Zero, and...

Downloads: 10 This Week

Last Update: 2026-03-13
7

AIMET

AIMET is a library that provides advanced quantization and compression

Qualcomm Innovation Center (QuIC) is at the forefront of enabling low-power inference at the edge through its pioneering model-efficiency research. QuIC has a mission to help migrate the ecosystem toward fixed-point inference. With this goal, QuIC presents the AI Model Efficiency Toolkit (AIMET) - a library that provides advanced quantization and compression techniques for trained neural network models. AIMET enables neural networks to run more efficiently on fixed-point AI hardware accelerators. ...

Downloads: 23 This Week

Last Update: 2026-04-06
8

bitnet.cpp

Official inference framework for 1-bit LLMs

bitnet.cpp is the official open-source inference framework and ecosystem designed to enable ultra-efficient execution of 1-bit large language models (LLMs), which quantize most model parameters to ternary values (-1, 0, +1) while maintaining competitive performance with full-precision counterparts. At its core is bitnet.cpp, a highly optimized C++ backend that supports fast, low-memory inference on both CPUs and GPUs, enabling models such as BitNet b1.58 to run without requiring enormous compute infrastructure. ...

Downloads: 13 This Week

Last Update: 2026-03-10
9

MegEngine

Easy-to-use deep learning framework with 3 key features

...On Windows 10 you can either install the Linux distribution through Windows Subsystem for Linux (WSL) or install the Windows distribution directly. Many other platforms are supported for inference.

Downloads: 4 This Week

Last Update: 2024-04-30
10

BentoML

Unified Model Serving Framework

...Standard .bento format for packaging code, models and dependencies for easy versioning and deployment. Integrate with any training pipeline or ML experimentation platform. Parallelize compute-intense model inference workloads to scale separately from the serving logic. Adaptive batching dynamically groups inference requests for optimal performance. Orchestrate distributed inference graph with multiple models via Yatai on Kubernetes. Easily configure CUDA dependencies for running inference with GPU. Automatically generate docker images for production deployment.

Downloads: 2 This Week

Last Update: 2026-04-02
11

NNCF

Neural Network Compression Framework for enhanced OpenVINO

NNCF (Neural Network Compression Framework) is an optimization toolkit for deep learning models, designed to apply quantization, pruning, and other techniques to improve inference efficiency.

Downloads: 8 This Week

Last Update: 7 days ago
12

AWS Deep Learning Containers

A set of Docker images for training and serving models in TensorFlow

...Deep Learning Containers provide optimized environments with TensorFlow and MXNet, Nvidia CUDA (for GPU instances), and Intel MKL (for CPU instances) libraries and are available in the Amazon Elastic Container Registry (Amazon ECR). The AWS DLCs are used in Amazon SageMaker as the default vehicles for your SageMaker jobs such as training, inference, and transforms. They have also been tested for machine learning workloads on Amazon EC2, Amazon ECS, and Amazon EKS. This project is licensed under the Apache-2.0 License. Ensure you have access to an AWS account; that is, set up your environment so that awscli can access your account via either an IAM user or an IAM role.

Downloads: 8 This Week

Last Update: 22 hours ago
13

ncnn

High-performance neural network inference framework for mobile

ncnn is a high-performance neural network inference computing framework designed specifically for mobile platforms. It has no third-party dependencies and, according to the project, runs faster than all other known open-source frameworks on mobile phone CPUs. ncnn allows developers to easily deploy deep learning models to mobile platforms and create intelligent apps.

Downloads: 29 This Week

Last Update: 2026-01-13
14

AWS Neuron

Powering Amazon custom machine learning chips

AWS Neuron is a software development kit (SDK) for running machine learning inference using AWS Inferentia chips. It consists of a compiler, runtime, and profiling tools that enable developers to run high-performance, low-latency inference using AWS Inferentia-based Amazon EC2 Inf1 instances. Using Neuron, developers can easily train their machine learning models on any popular framework such as TensorFlow, PyTorch, and MXNet, and run them optimally on Amazon EC2 Inf1 instances. ...

Downloads: 2 This Week

Last Update: 5 days ago
15

OpenShell

OpenShell is the safe, private runtime for autonomous AI agents.

...Each agent runs inside a containerized sandbox governed by declarative YAML security policies that control network access, file permissions, and process behavior. The platform includes a gateway service that manages sandbox lifecycles and routes AI inference requests through controlled providers. OpenShell also features a privacy-aware routing system that prevents sensitive information from leaving the sandbox environment. By combining container isolation, policy enforcement, and agent orchestration, OpenShell offers a secure infrastructure for developing and operating AI agents.

Downloads: 29 This Week

Last Update: 8 hours ago
16

Scala 3

The Scala 3 compiler, also known as Dotty

Scala 3 is the latest major release of the Scala language—featuring a complete compiler rewrite (Dotty), new syntax with optional braces and given/using contextual abstractions, union/intersection types, opaque types, first-class enums, and better type inference. It unifies object-oriented and functional programming paradigms into a safer, more expressive language running on the JVM with full Java interoperability.

Downloads: 46 This Week

Last Update: 2026-03-25
17

MNN

MNN is a blazing fast, lightweight deep learning framework

MNN is a highly efficient and lightweight deep learning framework. It supports inference and training of deep learning models, and has industry-leading performance for on-device inference and training. At present, MNN has been integrated into more than 20 Alibaba apps, such as Taobao, Tmall, Youku, Dingtalk, and Xianyu, covering more than 70 usage scenarios such as live broadcast, short video capture, search recommendation, product search by image, interactive marketing, equity distribution, and security risk control. ...

Downloads: 19 This Week

Last Update: 2026-04-07
18

AudioCraft

AudioCraft is a library for audio processing and generation

AudioCraft is a PyTorch library for text-to-audio and text-to-music generation, packaging research models and tooling for training and inference. It includes MusicGen for music generation conditioned on text (and optionally melody) and AudioGen for text-conditioned sound effects and environmental audio. Both models operate over discrete audio tokens produced by a neural codec (EnCodec), which acts like a tokenizer for waveforms and enables efficient sequence modeling. The repo provides inference scripts, checkpoints, and simple Python APIs so you can generate clips from prompts or incorporate the models into applications. ...

Downloads: 9 This Week

Last Update: 2025-10-13
19

Zod

TypeScript-first schema validation with static type inference

Zod is a TypeScript-first schema declaration and validation library. I'm using the term "schema" to broadly refer to any data type, from a simple string to a complex nested object. Zod is designed to be as developer-friendly as possible. The goal is to eliminate duplicative type declarations. With Zod, you declare a validator once and Zod will automatically infer the static TypeScript type.

Downloads: 10 This Week

Last Update: 2026-01-22
20

PyMC3

Probabilistic programming in Python

...A Gaussian process (GP) can be used as a prior probability distribution whose support is over the space of continuous functions. PyMC3 provides rich support for defining and using GPs. Variational inference saves computational cost by turning a problem of integration into one of optimization. PyMC3's variational API supports a number of cutting edge algorithms, as well as minibatch for scaling to large datasets.

Downloads: 4 This Week

Last Update: 2026-04-07
21

SageMaker TensorFlow Training Toolkit

Toolkit for running TensorFlow training scripts on SageMaker

...To use your TensorFlow Serving model on SageMaker, you first need to create a SageMaker Model. After creating a SageMaker Model, you can use it to create SageMaker Batch Transform Jobs for offline inference, or create SageMaker Endpoints for real-time inference. A SageMaker Model contains references to a model.tar.gz file in S3 containing serialized model data, and a Docker image used to serve predictions with that model. A Batch Transform job runs an offline-inference job using your TensorFlow Serving model. Input data in S3 is converted to HTTP requests, and responses are saved to an output bucket in S3.

Downloads: 1 This Week

Last Update: 2025-06-04
22

ty

An extremely fast Python type checker and language server

...The tool is designed from the ground up to power editor integrations, enabling real-time feedback as developers write code with minimal latency. ty includes advanced type system capabilities such as intersection types, improved type inference, and detailed diagnostics that help identify issues even in partially typed codebases. It supports integration with multiple development environments through its language server implementation, providing features like code navigation, auto-completion, and inline hints.

Downloads: 20 This Week

Last Update: 11 hours ago
23

Gitleaks

Protect and discover secrets using Gitleaks

Gitleaks is a fast, lightweight, portable, and open-source secret scanner for git repositories, files, and directories. With over 6.8 million docker downloads, 11.2k GitHub stars, 1.7 million GitHub Downloads, thousands of weekly clones, and over 400k homebrew installs, gitleaks is the most trusted secret scanner among security professionals, enterprises, and developers. Gitleaks-Action is our official GitHub Action. You can use it to automatically run a gitleaks scan on all your team's pull...

Downloads: 45 This Week

Last Update: 2026-03-21
24

Cthulhu.jl

The slow descent into madness

Cthulhu.jl is a powerful introspection tool for exploring the Julia compiler’s method dispatch and type inference system. It allows users to interactively descend into the type-inferred lowered and LLVM IR of Julia functions from the REPL. This makes it ideal for developers who want to optimize performance, debug type instability, or understand how Julia compiles code. Named after the Lovecraftian idea of descending into madness, Cthulhu reveals the "underworld" of Julia compilation.

Downloads: 2 This Week

Last Update: 2026-03-03
25

Doctrine extensions for PHPStan

Doctrine extensions for PHPStan

...Validates entity fields in repository findBy, findBy*, findOneBy, findOneBy*, count and countBy* method calls. Interprets EntityRepository<MyEntity> correctly in phpDocs for further type inference of methods called on the repository. Provides correct return for Doctrine\ORM\EntityManager::getRepository(). Provides correct return type for Doctrine\ORM\EntityManager::find, getReference and getPartialReference when Foo::class entity class name is provided as the first argument. Queries are analyzed statically and do not require a running database server. ...

Downloads: 7 This Week

Last Update: 2026-03-13