Showing 46 open source projects for "inference"

  • 1
    Mistral Inference

    Official inference library for Mistral models

    Open and portable generative AI for devs and businesses. We release open-weight models for everyone to customize and deploy wherever they want. Our super-efficient model Mistral Nemo is available under Apache 2.0, while Mistral Large 2 is available under both a free non-commercial license and a commercial license. A minimal usage sketch follows this entry.
    Downloads: 5 This Week
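
    The snippet below is a rough sketch of the chat-completion flow from the mistral-inference documentation; the local model directory and tokenizer filename are assumptions and vary by model.

    ```python
    # Sketch of mistral-inference chat completion (paths are assumptions).
    from mistral_common.protocol.instruct.messages import UserMessage
    from mistral_common.protocol.instruct.request import ChatCompletionRequest
    from mistral_common.tokens.tokenizers.mistral import MistralTokenizer
    from mistral_inference.transformer import Transformer
    from mistral_inference.generate import generate

    model_path = "mistral_models/7B-Instruct-v0.3"  # assumed local download
    tokenizer = MistralTokenizer.from_file(f"{model_path}/tokenizer.model.v3")
    model = Transformer.from_folder(model_path)

    request = ChatCompletionRequest(messages=[UserMessage(content="Summarize KV caching.")])
    tokens = tokenizer.encode_chat_completion(request).tokens

    out_tokens, _ = generate(
        [tokens], model, max_tokens=128, temperature=0.0,
        eos_id=tokenizer.instruct_tokenizer.tokenizer.eos_id,
    )
    print(tokenizer.instruct_tokenizer.tokenizer.decode(out_tokens[0]))
    ```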
  • 2
    SageMaker Hugging Face Inference Toolkit

    Library for serving Transformers models on Amazon SageMaker

    SageMaker Hugging Face Inference Toolkit is an open-source library for serving Transformers models on Amazon SageMaker. The library provides default pre-processing, prediction, and post-processing handlers for certain Transformers models and tasks. It utilizes the SageMaker Inference Toolkit to start up the model server, which is responsible for handling inference requests. A deployment sketch follows this entry.
    Downloads: 4 This Week
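
    The toolkit itself runs inside the SageMaker serving container; from the client side, a model is typically deployed with the separate sagemaker Python SDK, as in this hedged sketch (role ARN, framework versions, and instance type are assumptions):

    ```python
    # Deploying a Hugging Face model to a SageMaker endpoint (versions are assumptions).
    from sagemaker.huggingface import HuggingFaceModel

    hub = {
        "HF_MODEL_ID": "distilbert-base-uncased-finetuned-sst-2-english",
        "HF_TASK": "text-classification",  # the toolkit selects a pipeline from HF_TASK
    }
    model = HuggingFaceModel(
        env=hub,
        role="arn:aws:iam::123456789012:role/SageMakerRole",  # placeholder role
        transformers_version="4.26",
        pytorch_version="1.13",
        py_version="py39",
    )
    predictor = model.deploy(initial_instance_count=1, instance_type="ml.m5.xlarge")
    print(predictor.predict({"inputs": "I love using SageMaker."}))
    ```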
  • 3
    TensorRT

    C++ library for high performance inference on NVIDIA GPUs

    NVIDIA® TensorRT™ is an SDK for high-performance deep learning inference. It includes a deep learning inference optimizer and runtime that delivers low latency and high throughput for deep learning inference applications. TensorRT-based applications perform up to 40X faster than CPU-only platforms during inference. With TensorRT, you can optimize neural network models trained in all major frameworks, calibrate for lower precision with high accuracy, and deploy to hyperscale data centers, embedded, or automotive product platforms. ...
    Downloads: 22 This Week
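
    As a concrete illustration, here is a minimal sketch of the TensorRT 8.x-era Python workflow for building an FP16 engine from an ONNX model; the file names are assumptions, and newer TensorRT releases have adjusted parts of this API.

    ```python
    # Build a TensorRT engine from an ONNX model (TensorRT 8.x-style API).
    import tensorrt as trt

    logger = trt.Logger(trt.Logger.WARNING)
    builder = trt.Builder(logger)
    network = builder.create_network(
        1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)
    )
    parser = trt.OnnxParser(network, logger)

    with open("model.onnx", "rb") as f:  # assumed input model
        if not parser.parse(f.read()):
            for i in range(parser.num_errors):
                print(parser.get_error(i))
            raise SystemExit("ONNX parse failed")

    config = builder.create_builder_config()
    config.set_flag(trt.BuilderFlag.FP16)  # optimize for lower precision
    engine_bytes = builder.build_serialized_network(network, config)
    with open("model.engine", "wb") as f:
        f.write(engine_bytes)
    ```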
  • 4
    Gen.jl

    A general-purpose probabilistic programming system

    An open-source stack for generative modeling and probabilistic inference. Gen’s inference library gives users building blocks for writing efficient probabilistic inference algorithms that are tailored to their models, while automating the tricky math and the low-level implementation details. Gen helps users write hybrid algorithms that combine neural networks, variational inference, sequential Monte Carlo samplers, and Markov chain Monte Carlo.
    Downloads: 4 This Week
  • 5
    AIMET

    AIMET is a library that provides advanced quantization and compression

    Qualcomm Innovation Center (QuIC) is at the forefront of enabling low-power inference at the edge through its pioneering model-efficiency research. QuIC has a mission to help migrate the ecosystem toward fixed-point inference. With this goal, QuIC presents the AI Model Efficiency Toolkit (AIMET) - a library that provides advanced quantization and compression techniques for trained neural network models. AIMET enables neural networks to run more efficiently on fixed-point AI hardware accelerators. ...
    Downloads: 36 This Week
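
    A minimal sketch of AIMET's quantization-simulation flow with the aimet_torch package, assuming its 1.x-style API; the toy model and calibration data are placeholders.

    ```python
    # Quantization simulation with aimet_torch (1.x-style API; toy model).
    import torch
    from aimet_torch.quantsim import QuantizationSimModel

    model = torch.nn.Sequential(
        torch.nn.Linear(16, 8), torch.nn.ReLU(), torch.nn.Linear(8, 2)
    ).eval()
    dummy_input = torch.randn(1, 16)

    # Wrap the FP32 model in a simulator that inserts fake-quantization ops.
    sim = QuantizationSimModel(model, dummy_input=dummy_input)

    # Calibration: run representative data through the model to compute encodings.
    def calibrate(model, _):
        with torch.no_grad():
            model(dummy_input)

    sim.compute_encodings(calibrate, forward_pass_callback_args=None)

    # Evaluate the simulated fixed-point model.
    print(sim.model(dummy_input))
    ```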
  • 6
    ncnn

    High-performance neural network inference framework for mobile

    ncnn is a high-performance neural network inference computing framework designed specifically for mobile platforms. It brings artificial intelligence right to your fingertips with no third-party dependencies, and it runs faster than all other known open source frameworks on mobile phone CPUs. ncnn allows developers to easily deploy deep learning algorithm models to mobile platforms and create intelligent apps.
    Downloads: 92 This Week
  • 7
    AudioCraft

    Audiocraft is a library for audio processing and generation

    AudioCraft is a PyTorch library for text-to-audio and text-to-music generation, packaging research models and tooling for training and inference. It includes MusicGen for music generation conditioned on text (and optionally melody) and AudioGen for text-conditioned sound effects and environmental audio. Both models operate over discrete audio tokens produced by a neural codec (EnCodec), which acts like a tokenizer for waveforms and enables efficient sequence modeling. The repo provides inference scripts, checkpoints, and simple Python APIs so you can generate clips from prompts or incorporate the models into applications. ...
    Downloads: 6 This Week
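
    Following the pattern in the AudioCraft README, a short MusicGen generation sketch; the checkpoint name and prompt are just examples.

    ```python
    # Generate short music clips from text prompts with MusicGen.
    from audiocraft.models import MusicGen
    from audiocraft.data.audio import audio_write

    model = MusicGen.get_pretrained("facebook/musicgen-small")  # example checkpoint
    model.set_generation_params(duration=8)  # seconds of audio per clip

    wav = model.generate(["lo-fi beat with warm bass"])  # batch of prompts -> [B, C, T]
    for i, one_wav in enumerate(wav):
        # Writes clip_0.wav with loudness normalization.
        audio_write(f"clip_{i}", one_wav.cpu(), model.sample_rate, strategy="loudness")
    ```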
  • 8
    XNNPACK

    High-efficiency floating-point neural network inference operators

    XNNPACK is a highly optimized, low-level neural network inference library developed by Google for accelerating deep learning workloads across a variety of hardware architectures, including ARM, x86, WebAssembly, and RISC-V. Rather than serving as a standalone ML framework, XNNPACK provides high-performance computational primitives—such as convolutions, pooling, activation functions, and arithmetic operations—that are integrated into higher-level frameworks like TensorFlow Lite, PyTorch Mobile, ONNX Runtime, TensorFlow.js, and MediaPipe. ...
    Downloads: 0 This Week
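
    Since XNNPACK is consumed through higher-level frameworks rather than called directly, one common way to exercise it is via TensorFlow Lite, whose recent builds apply the XNNPACK delegate to floating-point models by default; a hedged sketch (the model file is an assumption):

    ```python
    # CPU inference through TFLite; recent builds route float ops to XNNPACK by default.
    import numpy as np
    import tensorflow as tf

    interpreter = tf.lite.Interpreter(model_path="model.tflite", num_threads=4)
    interpreter.allocate_tensors()

    inp = interpreter.get_input_details()[0]
    out = interpreter.get_output_details()[0]

    interpreter.set_tensor(inp["index"], np.zeros(inp["shape"], dtype=np.float32))
    interpreter.invoke()
    result = interpreter.get_tensor(out["index"])
    ```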
  • 9
    MIVisionX

    Set of comprehensive computer vision & machine intelligence libraries

    ...AMD MIVisionX delivers a highly optimized open-source implementation of the Khronos OpenVX™ and OpenVX™ Extensions, along with a Convolution Neural Net Model Compiler & Optimizer supporting the ONNX and Khronos NNEF™ exchange formats. The toolkit allows for rapid prototyping and deployment of optimized computer vision and machine learning inference workloads on a wide range of computer hardware, including small embedded x86 CPUs, APUs, discrete GPUs, and heterogeneous servers. AMD OpenVX is a highly optimized open-source implementation of the Khronos OpenVX™ 1.3 computer vision specification, allowing rapid prototyping as well as fast execution on hardware ranging from small embedded x86 CPUs to large workstation discrete GPUs.
    Downloads: 0 This Week
  • 10
    Anomalib

    An anomaly detection library comprising state-of-the-art algorithms

    Anomalib is an open-source deep learning library focused on anomaly detection and localization tasks, collecting state-of-the-art algorithms and tools under one modular framework. It provides implementations of leading anomaly detection methods drawn from current research, as well as a full set of utilities for training, evaluating, benchmarking, and deploying these models on both public and private datasets. Anomalib emphasizes flexibility and reproducibility: you can use its simple APIs to...
    Downloads: 3 This Week
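
    A minimal train-and-predict sketch assuming anomalib's 1.x-style API; the dataset, category, and model choice are examples, not the only options.

    ```python
    # Train PatchCore on one MVTec category and run prediction (anomalib 1.x-style API).
    from anomalib.data import MVTec
    from anomalib.models import Patchcore
    from anomalib.engine import Engine

    datamodule = MVTec(category="bottle")  # example category
    model = Patchcore()
    engine = Engine()

    engine.fit(model=model, datamodule=datamodule)
    predictions = engine.predict(model=model, datamodule=datamodule)
    ```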
  • 11
    ProbabilisticCircuits.jl

    Probabilistic Circuits from the Juice library

    This module provides a Julia implementation of Probabilistic Circuits (PCs), tools to learn the structure and parameters of PCs from data, and tools to do tractable exact inference with them. Probabilistic Circuits provide a unifying framework for several families of tractable probabilistic models. PCs are represented as computational graphs that define a joint probability distribution as recursive mixtures (sum units) and factorizations (product units) of simpler distributions (input units). Given certain structural properties, PCs enable a range of tractable exact probabilistic queries, such as computing marginals, conditionals, maximum a posteriori (MAP) states, and more advanced probabilistic queries.
    Downloads: 4 This Week
  • 12
    type-challenges

    Collection of TypeScript type challenges with online judge

    ...Each challenge is a miniature kata where you implement types that transform other types—parsing strings, inferring tuples, mapping unions—without writing any runtime code. Problems are arranged from warm-ups to brain-twisters, letting developers build intuition about distributive conditional types, inference in extends, variance, and other corner cases of the type system. The repository includes tests for each puzzle so you get immediate, compiler-driven feedback when your solution is correct. As a result, it doubles as both training material and a living reference for advanced patterns used in real libraries. Many engineers report that solving a handful of these dramatically improves their ability to write safe, expressive APIs with minimal runtime overhead.
    Downloads: 0 This Week
  • 13
    DeepEP

    DeepEP: an efficient expert-parallel communication library

    ...Because MoE architectures require routing inputs to different experts, communication overhead can become a bottleneck — DeepEP addresses that by providing optimized GPU kernels and efficient dispatch/combining logic. The library also supports low-precision operations (such as FP8) to reduce memory and bandwidth usage during communication. DeepEP is aimed at large-scale model inference or training systems where expert parallelism is used to scale model capacity without replicating entire networks.
    Downloads: 0 This Week
  • 14
    frugally-deep

    A lightweight header-only library for using Keras (TensorFlow) models

    Use Keras models in C++ with ease. A lightweight header-only library for using Keras (TensorFlow) models in C++. Works out of the box even when compiled into a 32-bit executable (64-bit is fine too, of course). Avoids temporarily allocating potentially large chunks of additional RAM during convolutions (by not materializing the im2col input matrix). Utterly ignores even the most powerful GPU in your system and uses only one CPU core per prediction. Quite fast on one CPU core, and you can...
    Downloads: 5 This Week
  • 15
    OCaml

    The core OCaml system: compilers, runtime system, base libraries

    ...OCaml’s powerful type system means more bugs are caught at compile-time, and large, complex codebases are easier to maintain. This makes it a good language for running critical code. At the same time, sophisticated inference makes the type system unobtrusive, creating a smooth developer experience. OCaml has two compilers. One is a bytecode compiler that generates small, portable executables and is very fast. The other is a native code compiler that produces more efficient machine code; its performance matches the highest standards of modern compilers. ...
    Downloads: 2 This Week
  • 16
    ARC-AGI

    The Abstraction and Reasoning Corpus

    ARC-AGI is a benchmark dataset and experimental framework designed to evaluate and advance artificial general intelligence by testing systems on abstract reasoning tasks that require human-like problem-solving abilities. It consists of a curated set of tasks where models must infer patterns from input-output examples and apply those rules to new unseen cases, without relying on memorization or prior training data. The dataset is structured as grid-based puzzles, where each task requires...
    Downloads: 1 This Week
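
    Each ARC task is a small JSON file with "train" and "test" lists of input/output grid pairs, where a grid is a list of rows of integers 0-9; a short loading sketch (the file path is an assumption):

    ```python
    # Load one ARC task and inspect its demonstration pairs.
    import json

    with open("data/training/0a938d79.json") as f:  # assumed task file
        task = json.load(f)

    for pair in task["train"]:
        inp, out = pair["input"], pair["output"]
        print(f"{len(inp)}x{len(inp[0])} grid -> {len(out)}x{len(out[0])} grid")

    # The solver must produce the output grid for each held-out test input.
    test_input = task["test"][0]["input"]
    ```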
  • 17
    Recursive Language Models

    General plug-and-play inference library for Recursive Language Models

    RLM (short for Recursive Language Models) is a plug-and-play inference library built around the idea that a language model need not consume its entire context in a single call: instead, it can inspect, partition, and recursively query pieces of that context, combining the intermediate results into a final answer. The library provides a consistent API that abstracts away the orchestration of these recursive calls, letting developers focus on modeling, experimentation, and fine-tuning rather than infrastructure plumbing.
    Downloads: 1 This Week
  • 18
    ClearScript

    A library for adding scripting to .NET applications

    ...Exposed resources require no modification, decoration, or special coding of any kind. Scripts get simple access to most of the features of exposed objects and types. Full support for generic types and methods, including C#-like type inference and explicit type arguments. Exposed .NET collections support native script iteration mechanisms. Scripts can invoke methods with output parameters, optional parameters, and parameter arrays. Support for exposing all the types defined in one or more assemblies in one step. Optional support for importing types and assemblies from script code.
    Downloads: 1 This Week
  • 19
    Cats

    Lightweight, modular, extensible library for functional programming

    Cats is a library which provides abstractions for functional programming in the Scala programming language. The name is a playful shortening of the word category. Scala supports both object-oriented and functional programming, and this is reflected in the hybrid approach of the standard library. Cats strives to provide functional programming abstractions that are core, binary compatible, modular, approachable and efficient. A broader goal of Cats is to provide a foundation for an ecosystem...
    Downloads: 1 This Week
  • 20
    Pothos GraphQL

    Pothos GraphQL is a library for creating GraphQL schemas in TypeScript

    ...The core of Pothos adds zero overhead at runtime and has graphql as its only dependency. Pothos is the most type-safe way to build GraphQL schemas in TypeScript: by leveraging type inference and TypeScript's powerful type system, Pothos requires very few manual type definitions and no code generation. Pothos has a unique and powerful plugin system that makes every plugin feel like its features are built into the core library. Plugins can extend almost any part of the API by adding new options or methods that can take full advantage of the Pothos type system. ...
    Downloads: 4 This Week
  • 21
    FlashMLA

    FlashMLA: Efficient Multi-head Latent Attention Kernels

    FlashMLA is a high-performance decoding kernel library designed especially for Multi-Head Latent Attention (MLA) workloads, targeting NVIDIA Hopper GPU architectures. It provides optimized kernels for MLA decoding, including support for variable-length sequences, helping reduce latency and increase throughput in model inference systems that use this attention style. The library supports both BF16 and FP16 data types, and includes a paged KV cache implementation with a block size of 64 to manage memory efficiently during decoding. In compute-bound settings it can reach up to ~660 TFLOPS on H800 SXM5 hardware, while in memory-bound configurations it can push memory throughput to ~3000 GB/s. ...
    Downloads: 0 This Week
  • 22
    sqlite-utils

    Python CLI utility and library for manipulating SQLite databases

    sqlite-utils is both a Python library and a command-line tool for creating, inspecting, and transforming SQLite databases with minimal boilerplate. It focuses on making common tasks like importing CSV/JSON, exploring tables, and running ad-hoc queries feel ergonomic and scriptable. As a CLI, it lets you build databases from structured data in one line, run queries against local files or in-memory databases, output results as JSON, CSV, or pretty tables, and configure full-text search. As a...
    Downloads: 0 This Week
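
    A short sketch of the Python API; the table and file names are examples.

    ```python
    # Create a table, insert rows, enable full-text search, and query.
    import sqlite_utils

    db = sqlite_utils.Database("demo.db")
    db["people"].insert_all(
        [
            {"id": 1, "name": "Ada", "score": 99.5},
            {"id": 2, "name": "Grace", "score": 97.0},
        ],
        pk="id",
    )
    db["people"].enable_fts(["name"])  # configure full-text search on a column

    for row in db.query("select name, score from people where score > :min", {"min": 98}):
        print(row)
    ```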
  • 23
    MMDeploy

    OpenMMLab Model Deployment Framework

    ...Models can be exported to and run on several backends, with more becoming compatible over time. All kinds of modules in the SDK can be extended, such as Transform for image processing, Net for neural network inference, Module for post-processing, and so on. Install and build your target backend. ONNX Runtime is a cross-platform inference and training accelerator compatible with many popular ML/DNN frameworks. Please read getting_started for the basic usage of MMDeploy.
    Downloads: 0 This Week
  • 24
    Neural Tangents

    Fast and Easy Infinite Neural Networks in Python

    ...It lets researchers define architectures from familiar building blocks—convolutions, pooling, residual connections, and nonlinearities—and obtain not only the finite network but also the corresponding Gaussian Process (GP) kernel of its infinite-width limit. With a single specification, you can compute NNGP and NTK kernels, perform exact GP inference, and study training dynamics analytically for infinitely wide networks. The library closely mirrors JAX’s stax API while extending it to return a kernel_fn alongside init_fn and apply_fn, enabling drop-in workflows for kernel computation. Kernel evaluation is highly optimized for speed and memory, and computations can be automatically distributed across accelerators with near-linear scaling.
    Downloads: 8 This Week
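
    To make the init_fn/apply_fn/kernel_fn workflow concrete, a small sketch of exact GP inference with the NTK of a two-layer network; the data here is random placeholder input.

    ```python
    # Infinite-width NTK regression with Neural Tangents (placeholder data).
    from jax import random
    import neural_tangents as nt
    from neural_tangents import stax

    init_fn, apply_fn, kernel_fn = stax.serial(
        stax.Dense(512), stax.Relu(), stax.Dense(1)
    )

    key1, key2, key3 = random.split(random.PRNGKey(0), 3)
    x_train = random.normal(key1, (20, 32))
    y_train = random.normal(key2, (20, 1))
    x_test = random.normal(key3, (5, 32))

    # Closed-form predictions of an ensemble of infinitely wide networks
    # trained to convergence with gradient descent on MSE loss.
    predict_fn = nt.predict.gradient_descent_mse_ensemble(kernel_fn, x_train, y_train)
    mean = predict_fn(x_test=x_test, get="ntk")
    ```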
  • 25
    spaGO

    Self-contained Machine Learning and Natural Language Processing lib

    A Machine Learning library written in pure Go, designed to support relevant neural architectures in Natural Language Processing. Spago is self-contained, in that it uses its own lightweight computational graph for both training and inference, and it is easy to understand from start to finish. The core module of Spago relies only on testify for unit testing. In other words, it has "zero dependencies", and we are committed to keeping it that way as much as possible. Spago uses a multi-module workspace to ensure that additional dependencies are downloaded only when specific features (e.g. persistent embeddings) are used. ...
    Downloads: 2 This Week