Showing 520 open source projects for "inference"

View related business solutions
  • Gemini 3 and 200+ AI Models on One Platform Icon
    Gemini 3 and 200+ AI Models on One Platform

    Access Google's best plus Claude, Llama, and Gemma. Fine-tune and deploy from one console.

    Build generative AI apps with Vertex AI. Switch between models without switching platforms.
    Start Free
  • AI-powered service management for IT and enterprise teams Icon
    AI-powered service management for IT and enterprise teams

    Enterprise-grade ITSM, for every business

    Give your IT, operations, and business teams the ability to deliver exceptional services—without the complexity. Maximize operational efficiency with refreshingly simple, AI-powered Freshservice.
    Try it Free
  • 1
    AI Runner

    AI Runner

    Offline inference engine for art, real-time voice conversations

    AI Runner is an offline inference engine designed to run a collection of AI workloads on your own machine, including image generation for art, real-time voice conversations, LLM-powered chatbots and automated workflows. It is implemented as a desktop-oriented Python application and emphasizes privacy and self-hosting, allowing users to work with text-to-speech, speech-to-text, text-to-image and multimodal models without sending data to external services.
    Downloads: 13 This Week
    Last Update:
    See Project
  • 2
    FastSD CPU

    FastSD CPU

    Fast stable diffusion on CPU and AI PC

    FastSD CPU is an optimized fork of Stable Diffusion designed to run efficiently on CPUs and devices without dedicated GPUs by leveraging Latent Consistency Models and Adversarial Diffusion Distillation techniques that accelerate inference. It focuses on bringing fast text-to-image generation to mainstream hardware like desktop CPUs, lower-end laptops, or edge devices without requiring high-end graphics processors. The repository contains multiple interfaces including a desktop GUI for simple generation, an advanced web-based UI with support for extensions like LoRA and ControlNet, and a command-line interface for scripted usage or server deployments. ...
    Downloads: 35 This Week
    Last Update:
    See Project
  • 3
    Open Interpreter

    Open Interpreter

    A natural language interface for computers

    ...It lets large language models (LLMs) run code locally (Python, JavaScript, shell, etc.), enabling you to ask your computer to do tasks like data analysis, file manipulation, browsing, etc. in human terms (“chat with your computer”), with safeguards. Runs locally or via configured remote LLM servers/inference backends, giving flexibility to use models you trust or have locally. It prompts you to approve code before executing, and supports both online LLM models and local inference servers. It seeks to combine convenience (like ChatGPT’s code interpreter) with control and flexibility by running on your own machine.
    Downloads: 24 This Week
    Last Update:
    See Project
  • 4
    LatentSync

    LatentSync

    Taming Stable Diffusion for Lip Sync

    ...The system leverages a U-Net diffusion backbone, with cross-attention of audio embeddings (via an audio encoder) and reference video frames to guide generation, and applies a set of loss functions (temporal, perceptual, sync-net based) to enforce lip-sync accuracy, visual fidelity, and temporal consistency. Over versions, LatentSync has improved temporal stability and lowered resource requirements — making inference more practical (e.g. 8 GB VRAM for earlier versions, somewhat higher for latest models).
    Downloads: 2 This Week
    Last Update:
    See Project
  • Full-stack observability with actually useful AI | Grafana Cloud Icon
    Full-stack observability with actually useful AI | Grafana Cloud

    Our generous forever free tier includes the full platform, including the AI Assistant, for 3 users with 10k metrics, 50GB logs, and 50GB traces.

    Built on open standards like Prometheus and OpenTelemetry, Grafana Cloud includes Kubernetes Monitoring, Application Observability, Incident Response, plus the AI-powered Grafana Assistant. Get started with our generous free tier today.
    Create free account
  • 5
    SAM 2

    SAM 2

    The repository provides code for running inference with SAM 2

    ...It retains the core promptable interface—accepting points, boxes, or masks—but incorporates architectural and training enhancements to produce higher-fidelity masks, better boundary adherence, and robustness to complex scenes. The updated model is optimized for faster inference and lower memory use, enabling real-time interactivity even on larger images or constrained hardware. SAM2 comes with pretrained weights and easy-to-use APIs, enabling developers and researchers to integrate promptable segmentation into annotation tools, vision pipelines, or downstream tasks. The project also includes scripts and notebooks to compare SAM2 against SAM on edge cases, benchmarks showing improvements, and evaluation suites to measure mask quality metrics like IoU and boundary error.
    Downloads: 8 This Week
    Last Update:
    See Project
  • 6
    Norfair

    Norfair

    Lightweight Python library for adding real-time multi-object tracking

    ...This includes detectors performing tasks such as object or keypoint detection. It can easily be inserted into complex video processing pipelines to add tracking to existing projects. At the same time, it is possible to build a video inference loop from scratch using just Norfair and a detector. Supports moving camera, re-identification with appearance embeddings, and n-dimensional object tracking. Norfair provides several predefined distance functions to compare tracked objects and detections. The distance functions can also be defined by the user, enabling the implementation of different tracking strategies.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 7
    R-KV

    R-KV

    Redundancy-aware KV Cache Compression for Reasoning Models

    ...The approach focuses on identifying which attention heads and cache components are most important for maintaining reasoning quality, allowing less critical information to be compressed or discarded. This results in more efficient inference without significantly degrading model performance.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 8
    GLM-4.6

    GLM-4.6

    Agentic, Reasoning, and Coding (ARC) foundation models

    ...The model achieves superior coding performance, excelling in benchmarks and practical coding assistants such as Claude Code, Cline, Roo Code, and Kilo Code. Its reasoning capabilities have been strengthened, including improved tool usage during inference and more effective integration within agent frameworks. GLM-4.6 also enhances writing quality, producing outputs that better align with human preferences and role-playing scenarios. Benchmark evaluations demonstrate that it not only outperforms GLM-4.5 but also rivals leading global models such as DeepSeek-V3.1-Terminus and Claude Sonnet 4.
    Downloads: 102 This Week
    Last Update:
    See Project
  • 9
    ChatGLM-6B

    ChatGLM-6B

    ChatGLM-6B: An Open Bilingual Dialogue Language Model

    ChatGLM-6B is an open bilingual (Chinese + English) conversational language model based on the GLM architecture, with approximately 6.2 billion parameters. The project provides inference code, demos (command line, web, API), quantization support for lower memory deployment, and tools for finetuning (e.g., via P-Tuning v2). It is optimized for dialogue and question answering with a balance between performance and deployability in consumer hardware settings. Support for quantized inference (INT4, INT8) to reduce GPU memory requirements. ...
    Downloads: 3 This Week
    Last Update:
    See Project
  • MongoDB Atlas runs apps anywhere Icon
    MongoDB Atlas runs apps anywhere

    Deploy in 115+ regions with the modern database for every enterprise.

    MongoDB Atlas gives you the freedom to build and run modern applications anywhere—across AWS, Azure, and Google Cloud. With global availability in over 115 regions, Atlas lets you deploy close to your users, meet compliance needs, and scale with confidence across any geography.
    Start Free
  • 10
    TabPFN

    TabPFN

    Foundation Model for Tabular Data

    ...The model is based on transformer architectures and implements a prior-data fitted network that can perform supervised learning tasks such as classification and regression with minimal configuration. Unlike many traditional machine learning workflows that require extensive hyperparameter tuning and training cycles, TabPFN is pre-trained to perform inference directly on tabular datasets. This allows it to generate predictions extremely quickly, often within seconds, while maintaining competitive accuracy on small and medium-sized datasets. The system supports a variety of tabular machine learning tasks and is designed to handle structured datasets commonly found in spreadsheets, databases, and business analytics systems.
    Downloads: 15 This Week
    Last Update:
    See Project
  • 11
    HunyuanImage-3.0

    HunyuanImage-3.0

    A Powerful Native Multimodal Model for Image Generation

    ...It uses a Mixture-of-Experts (MoE) architecture with many expert subnetworks to scale efficiently, deploying only a subset of experts per token, which allows large parameter counts without linear inference cost explosion. The model is intended to be competitive with closed-source image generation systems, aiming for high fidelity, prompt adherence, fine detail, and even “world knowledge” reasoning (i.e. leveraging context, semantics, or common sense in generation). The GitHub repo includes code, scripts, model loading instructions, inference utilities, prompt handling, and integration with standard ML tooling (e.g. ...
    Downloads: 7 This Week
    Last Update:
    See Project
  • 12
    Wan2.2

    Wan2.2

    Wan2.2: Open and Advanced Large-Scale Video Generative Model

    Wan2.2 is a major upgrade to the Wan series of open and advanced large-scale video generative models, incorporating cutting-edge innovations to boost video generation quality and efficiency. It introduces a Mixture-of-Experts (MoE) architecture that splits the denoising process across specialized expert models, increasing total model capacity without raising computational costs. Wan2.2 integrates meticulously curated cinematic aesthetic data, enabling precise control over lighting,...
    Downloads: 166 This Week
    Last Update:
    See Project
  • 13
    PyMC3

    PyMC3

    Probabilistic programming in Python

    ...A Gaussian process (GP) can be used as a prior probability distribution whose support is over the space of continuous functions. PyMC3 provides rich support for defining and using GPs. Variational inference saves computational cost by turning a problem of integration into one of optimization. PyMC3's variational API supports a number of cutting edge algorithms, as well as minibatch for scaling to large datasets.
    Downloads: 4 This Week
    Last Update:
    See Project
  • 14
    LLM Foundry

    LLM Foundry

    LLM training code for MosaicML foundation models

    Introducing MPT-7B, the first entry in our MosaicML Foundation Series. MPT-7B is a transformer trained from scratch on 1T tokens of text and code. It is open source, available for commercial use, and matches the quality of LLaMA-7B. MPT-7B was trained on the MosaicML platform in 9.5 days with zero human intervention at a cost of ~$200k. Large language models (LLMs) are changing the world, but for those outside well-resourced industry labs, it can be extremely difficult to train and deploy...
    Downloads: 6 This Week
    Last Update:
    See Project
  • 15
    MLC LLM

    MLC LLM

    Universal LLM Deployment Engine with ML Compilation

    ...The project focuses on compiling models into optimized runtimes that can run natively on devices such as GPUs, mobile processors, browsers, and edge hardware. By leveraging machine learning compilation techniques, mlc-llm produces high-performance inference engines that maintain consistent APIs across platforms. The system supports deployment on environments including Linux, macOS, Windows, iOS, Android, and web browsers while utilizing different acceleration technologies such as CUDA, Vulkan, Metal, and WebGPU. It also provides OpenAI-compatible APIs that allow developers to integrate locally deployed models into existing AI applications without major code changes.
    Downloads: 36 This Week
    Last Update:
    See Project
  • 16
    HunyuanOCR

    HunyuanOCR

    OCR expert VLM powered by Hunyuan's native multimodal architecture

    ...The project provides code, pretrained weights, and inference instructions, making it feasible to deploy locally or on a server, and to integrate with applications.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 17
    SageMaker TensorFlow Training Toolkit

    SageMaker TensorFlow Training Toolkit

    Toolkit for running TensorFlow training scripts on SageMaker

    ...To use your TensorFlow Serving model on SageMaker, you first need to create a SageMaker Model. After creating a SageMaker Model, you can use it to create SageMaker Batch Transform Jobs for offline inference, or create SageMaker Endpoints for real-time inference. A SageMaker Model contains references to a model.tar.gz file in S3 containing serialized model data, and a Docker image used to serve predictions with that model. A Batch Transform job runs an offline-inference job using your TensorFlow Serving model. Input data in S3 is converted to HTTP requests, and responses are saved to an output bucket in S3.
    Downloads: 1 This Week
    Last Update:
    See Project
  • 18
    Z80-μLM

    Z80-μLM

    Z80-μLM is a 2-bit quantized language model

    ...A key deliverable is producing CP/M-compatible .COM binaries, enabling a genuinely vintage “chat with your computer” experience on real hardware or accurate emulators. The project sits at the intersection of machine learning and systems constraints, showing how model architecture, quantization, and inference code generation can be adapted to extreme memory and compute limits. It also functions as an educational reference for how to reduce inference to operations that fit an old-school instruction set and runtime environment.
    Downloads: 2 This Week
    Last Update:
    See Project
  • 19
    Arize Phoenix

    Arize Phoenix

    Uncover insights, surface problems, monitor, and fine tune your LLM

    Phoenix provides ML insights at lightning speed with zero-config observability for model drift, performance, and data quality. Phoenix is an Open Source ML Observability library designed for the Notebook. The toolset is designed to ingest model inference data for LLMs, CV, NLP and tabular datasets. It allows Data Scientists to quickly visualize their model data, monitor performance, track down issues & insights, and easily export to improve. Deep Learning Models (CV, LLM, and Generative) are an amazing technology that will power many of future ML use cases. A large set of these technologies are being deployed into businesses (the real world) in what we consider a production setting.
    Downloads: 2 This Week
    Last Update:
    See Project
  • 20
    MiniCPM4.1

    MiniCPM4.1

    Achieving 3+ generation speedup on reasoning tasks

    ...The model also supports both dense and sparse attention mechanisms, enabling more efficient computation depending on the selected inference framework. With improved pretraining on longer sequences and enhanced scaling techniques, MiniCPM4.1 delivers better performance in long-context tasks and complex problem solving.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 21
    Oasis

    Oasis

    Inference script for Oasis 500M

    Open-Oasis provides inference code and released weights for Oasis 500M, an interactive world model that generates gameplay frames conditioned on user keyboard input. Instead of rendering a pre-built game world, the system produces the next visual state via a diffusion-transformer approach, effectively “imagining” the world response to your actions in real time.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 22
    LitGPT

    LitGPT

    20+ high-performance LLMs with recipes to pretrain, finetune at scale

    LitGPT is a collection of over 20 high-performance large language models (LLMs) accompanied by recipes to pretrain, finetune, and deploy them at scale. It provides implementations without abstractions, making it beginner-friendly while offering advanced features like flash attention and support for various precision levels. LitGPT is designed to run efficiently across multiple GPUs or TPUs, catering to both small-scale and large-scale deployments.
    Downloads: 8 This Week
    Last Update:
    See Project
  • 23
    TensorFlow Probability

    TensorFlow Probability

    Probabilistic reasoning and statistical analysis in TensorFlow

    ...TFP is open source and available on GitHub. Tools to build deep probabilistic models, including probabilistic layers and a `JointDistribution` abstraction. Variational inference and Markov chain Monte Carlo. A wide selection of probability distributions and bijectors. Optimizers such as Nelder-Mead, BFGS, and SGLD.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 24
    SAM 3D Body

    SAM 3D Body

    Code for running inference with the SAM 3D Body Model 3DB

    ...The model is trained to be robust in diverse, in-the-wild conditions, so it handles varied clothing, viewpoints, and backgrounds while maintaining strong accuracy across multiple human-pose benchmarks. The repository provides Python code to run inference, utilities to download checkpoints from Hugging Face, and demo scripts that turn images into 3D meshes and visualizations. There are Jupyter notebooks that walk you through setting up the model, running it on example images, and visualizing outputs in 3D, making it approachable even if you are not a 3D expert.
    Downloads: 5 This Week
    Last Update:
    See Project
  • 25
    Granite TSFM

    Granite TSFM

    Foundation Models for Time Series

    ...Issues and examples in the tracker illustrate common tasks such as slicing inference windows or using pipeline helpers that return pandas DataFrames, grounding the library in day-to-day time-series operations. The ecosystem around TSFM also includes a community cookbook of “recipes” that showcase capabilities and patterns. Overall, the repo is designed as a hands-on companion for teams adopting time-series foundation models in production-leaning settings.
    Downloads: 3 This Week
    Last Update:
    See Project
MongoDB Logo MongoDB