Showing 453 open source projects for "inference"

View related business solutions
  • Full-stack observability with actually useful AI | Grafana Cloud Icon
    Full-stack observability with actually useful AI | Grafana Cloud

    Our generous forever free tier includes the full platform, including the AI Assistant, for 3 users with 10k metrics, 50GB logs, and 50GB traces.

    Built on open standards like Prometheus and OpenTelemetry, Grafana Cloud includes Kubernetes Monitoring, Application Observability, Incident Response, plus the AI-powered Grafana Assistant. Get started with our generous free tier today.
    Create free account
  • Try Google Cloud Risk-Free With $300 in Credit Icon
    Try Google Cloud Risk-Free With $300 in Credit

    No hidden charges. No surprise bills. Cancel anytime.

    Use your credit across every product. Compute, storage, AI, analytics. When it runs out, 20+ products stay free. You only pay when you choose to.
    Start Free
  • 1
    Oumi

    Oumi

    Everything you need to build state-of-the-art foundation models

    Oumi is an open-source framework that provides everything needed to build state-of-the-art foundation models, end-to-end. It aims to simplify the development of large-scale machine-learning models.
    Downloads: 7 This Week
    Last Update:
    See Project
  • 2
    Scanpy

    Scanpy

    Single-cell analysis in Python

    Scanpy is a scalable toolkit for analyzing single-cell gene expression data built jointly with anndata. It includes preprocessing, visualization, clustering, trajectory inference and differential expression testing. The Python-based implementation efficiently deals with datasets of more than one million cells.
    Downloads: 2 This Week
    Last Update:
    See Project
  • 3
    Stable Diffusion Version 2

    Stable Diffusion Version 2

    High-Resolution Image Synthesis with Latent Diffusion Models

    ...The repository provides code for training and running Stable Diffusion-style models, instructions for installing dependencies (with notes about performance libraries like xformers), and guidance on hardware/driver requirements for efficient GPU inference and training. It’s organized as a practical, developer-focused toolkit: model code, scripts for inference, and examples for using memory-efficient attention and related optimizations are included so researchers and engineers can run or adapt the model for their own projects. The project sits within a larger ecosystem of Stability AI repositories (including inference-only reference implementations like SD3.5 and web UI projects) and the README points users toward compatible components, recommended CUDA/PyTorch versions.
    Downloads: 9 This Week
    Last Update:
    See Project
  • 4
    bitnet.cpp

    bitnet.cpp

    Official inference framework for 1-bit LLMs

    bitnet.cpp is the official open-source inference framework and ecosystem designed to enable ultra-efficient execution of 1-bit large language models (LLMs), which quantize most model parameters to ternary values (-1, 0, +1) while maintaining competitive performance with full-precision counterparts. At its core is bitnet.cpp, a highly optimized C++ backend that supports fast, low-memory inference on both CPUs and GPUs, enabling models such as BitNet b1.58 to run without requiring enormous compute infrastructure. ...
    Downloads: 6 This Week
    Last Update:
    See Project
  • Gemini 3 and 200+ AI Models on One Platform Icon
    Gemini 3 and 200+ AI Models on One Platform

    Access Google's best plus Claude, Llama, and Gemma. Fine-tune and deploy from one console.

    Build generative AI apps with Vertex AI. Switch between models without switching platforms.
    Start Free
  • 5
    Nano-vLLM

    Nano-vLLM

    A lightweight vLLM implementation built from scratch

    Nano-vLLM is a lightweight implementation of the vLLM inference engine designed to run large language models efficiently while maintaining a minimal and readable codebase. The project recreates the core functionality of vLLM in a simplified architecture written in approximately a thousand lines of Python, making it easier for developers and researchers to understand how modern LLM inference systems work.
    Downloads: 2 This Week
    Last Update:
    See Project
  • 6
    huggingface_hub

    huggingface_hub

    The official Python client for the Huggingface Hub

    The huggingface_hub library allows you to interact with the Hugging Face Hub, a platform democratizing open-source Machine Learning for creators and collaborators. Discover pre-trained models and datasets for your projects or play with the thousands of machine-learning apps hosted on the Hub. You can also create and share your own models, datasets, and demos with the community. The huggingface_hub library provides a simple way to do all these things with Python.
    Downloads: 12 This Week
    Last Update:
    See Project
  • 7
    ModelScope

    ModelScope

    Bring the notion of Model-as-a-Service to life

    ...Once integrated, model inference, fine-tuning, and evaluations can be done with only a few lines of code.
    Downloads: 7 This Week
    Last Update:
    See Project
  • 8
    Transformer Engine

    Transformer Engine

    A library for accelerating Transformer models on NVIDIA GPUs

    ...As the number of parameters in Transformer models continues to grow, training and inference for architectures such as BERT, GPT, and T5 become very memory and compute-intensive. Most deep learning frameworks train with FP32 by default. This is not essential, however, to achieve full accuracy for many deep learning models.
    Downloads: 15 This Week
    Last Update:
    See Project
  • 9
    SimpleLLM

    SimpleLLM

    950 line, minimal, extensible LLM inference engine built from scratch

    SimpleLLM is a minimal, extensible large language model inference engine implemented in roughly 950 lines of code, built from scratch to serve both as a learning tool and a research platform for novel inference techniques. It provides the core components of an LLM runtime—such as tokenization, batching, and asynchronous execution—without the abstraction overhead of more complex engines, making it easier for developers and researchers to understand and modify.
    Downloads: 0 This Week
    Last Update:
    See Project
  • MongoDB Atlas runs apps anywhere Icon
    MongoDB Atlas runs apps anywhere

    Deploy in 115+ regions with the modern database for every enterprise.

    MongoDB Atlas gives you the freedom to build and run modern applications anywhere—across AWS, Azure, and Google Cloud. With global availability in over 115 regions, Atlas lets you deploy close to your users, meet compliance needs, and scale with confidence across any geography.
    Start Free
  • 10
    OpenVINO Notebooks

    OpenVINO Notebooks

    Jupyter notebook tutorials for OpenVINO

    ...The project is particularly useful for developers who want to learn how to optimize machine learning inference pipelines for production environments.
    Downloads: 3 This Week
    Last Update:
    See Project
  • 11
    Faster Whisper

    Faster Whisper

    Faster Whisper transcription with CTranslate2

    Faster Whisper is an optimized implementation of the Whisper speech recognition model designed to deliver significantly faster inference while maintaining comparable accuracy. It leverages efficient inference engines and optimized computation strategies to reduce latency and resource consumption. The system is particularly useful for real-time or large-scale transcription tasks where performance is critical. It supports multiple model sizes, allowing users to balance speed and accuracy based on their needs. ...
    Downloads: 19 This Week
    Last Update:
    See Project
  • 12
    Wan2.1

    Wan2.1

    Wan2.1: Open and Advanced Large-Scale Video Generative Model

    ...The model supports text-to-video and image-to-video generation tasks with flexible resolution options suitable for various GPU hardware configurations. Wan2.1’s architecture balances generation quality and inference cost, paving the way for later improvements seen in Wan2.2 such as Mixture-of-Experts and enhanced aesthetics. It was trained on large-scale video and image datasets, providing generalization across diverse scenes and motion patterns.
    Downloads: 92 This Week
    Last Update:
    See Project
  • 13
    AirLLM

    AirLLM

    AirLLM 70B inference with single 4GB GPU

    AirLLM is an open source Python library that enables extremely large language models to run on consumer hardware with very limited GPU memory. The project addresses one of the main barriers to local LLM experimentation by introducing a memory-efficient inference technique that loads model layers sequentially rather than storing the entire model in GPU memory. This layer-wise inference approach allows models with tens of billions of parameters to run on devices with only a few gigabytes of VRAM. AirLLM preprocesses model weights so that each transformer layer can be loaded independently during computation, reducing the memory footprint while still performing full inference.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 14
    BentoML

    BentoML

    Unified Model Serving Framework

    ...Standard .bento format for packaging code, models and dependencies for easy versioning and deployment. Integrate with any training pipeline or ML experimentation platform. Parallelize compute-intense model inference workloads to scale separately from the serving logic. Adaptive batching dynamically groups inference requests for optimal performance. Orchestrate distributed inference graph with multiple models via Yatai on Kubernetes. Easily configure CUDA dependencies for running inference with GPU. Automatically generate docker images for production deployment.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 15
    GPflow

    GPflow

    Gaussian processes in TensorFlow

    GPflow is a package for building Gaussian process models in Python. It implements modern Gaussian process inference for composable kernels and likelihoods. GPflow builds on TensorFlow 2.4+ and TensorFlow Probability for running computations, which allows fast execution on GPUs.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 16
    HunyuanVideo

    HunyuanVideo

    HunyuanVideo: A Systematic Framework For Large Video Generation Model

    HunyuanVideo is a cutting-edge framework designed for large-scale video generation, leveraging advanced AI techniques to synthesize videos from various inputs. It is implemented in PyTorch, providing pre-trained model weights and inference code for efficient deployment. The framework aims to push the boundaries of video generation quality, incorporating multiple innovative approaches to improve the realism and coherence of the generated content. Release of FP8 model weights to reduce GPU memory usage / improve efficiency. Parallel inference code to speed up sampling, utilities and tests included.
    Downloads: 5 This Week
    Last Update:
    See Project
  • 17
    SageMaker Python SDK

    SageMaker Python SDK

    Training and deploying machine learning models on Amazon SageMaker

    SageMaker Python SDK is an open source library for training and deploying machine learning models on Amazon SageMaker. With the SDK, you can train and deploy models using popular deep learning frameworks Apache MXNet and TensorFlow. You can also train and deploy models with Amazon algorithms, which are scalable implementations of core machine learning algorithms that are optimized for SageMaker and GPU training. If you have your own algorithms built into SageMaker-compatible Docker...
    Downloads: 3 This Week
    Last Update:
    See Project
  • 18
    OpenLLM

    OpenLLM

    Operating LLMs in production

    An open platform for operating large language models (LLMs) in production. Fine-tune, serve, deploy, and monitor any LLMs with ease. With OpenLLM, you can run inference with any open-source large-language models, deploy to the cloud or on-premises, and build powerful AI apps. Built-in supports a wide range of open-source LLMs and model runtime, including Llama 2, StableLM, Falcon, Dolly, Flan-T5, ChatGLM, StarCoder, and more. Serve LLMs over RESTful API or gRPC with one command, query via WebUI, CLI, our Python/Javascript client, or any HTTP client.
    Downloads: 11 This Week
    Last Update:
    See Project
  • 19
    FLUX.1

    FLUX.1

    Official inference repo for FLUX.1 models

    FLUX.1 repository contains inference code and tooling for the FLUX.1 text-to-image diffusion models, enabling developers and researchers to generate and edit images from natural-language prompts using open-weight versions of the model on their own hardware or within custom applications. The project is part of a larger family of FLUX models developed by Black Forest Labs, designed to produce high-quality, detailed visuals from text descriptions with competitive prompt adherence and artistic fidelity. ...
    Downloads: 34 This Week
    Last Update:
    See Project
  • 20
    AWS Neuron

    AWS Neuron

    Powering Amazon custom machine learning chips

    AWS Neuron is a software development kit (SDK) for running machine learning inference using AWS Inferentia chips. It consists of a compiler, run-time, and profiling tools that enable developers to run high-performance and low latency inference using AWS Inferentia-based Amazon EC2 Inf1 instances. Using Neuron developers can easily train their machine learning models on any popular framework such as TensorFlow, PyTorch, and MXNet, and run it optimally on Amazon EC2 Inf1 instances. ...
    Downloads: 2 This Week
    Last Update:
    See Project
  • 21
    SAM 3

    SAM 3

    Code for running inference and finetuning with SAM 3 model

    SAM 3 (Segment Anything Model 3) is a unified foundation model for promptable segmentation in both images and videos, capable of detecting, segmenting, and tracking objects. It accepts both text prompts (open-vocabulary concepts like “red car” or “goalkeeper in white”) and visual prompts (points, boxes, masks) and returns high-quality masks, boxes, and scores for the requested concepts. Compared with SAM 2, SAM 3 introduces the ability to exhaustively segment all instances of an...
    Downloads: 48 This Week
    Last Update:
    See Project
  • 22
    AWS Deep Learning Containers

    AWS Deep Learning Containers

    A set of Docker images for training and serving models in TensorFlow

    ...Deep Learning Containers provide optimized environments with TensorFlow and MXNet, Nvidia CUDA (for GPU instances), and Intel MKL (for CPU instances) libraries and are available in the Amazon Elastic Container Registry (Amazon ECR). The AWS DLCs are used in Amazon SageMaker as the default vehicles for your SageMaker jobs such as training, inference, transforms etc. They've been tested for machine learning workloads on Amazon EC2, Amazon ECS and Amazon EKS services as well. This project is licensed under the Apache-2.0 License. Ensure you have access to an AWS account i.e. setup your environment such that awscli can access your account via either an IAM user or an IAM role.
    Downloads: 4 This Week
    Last Update:
    See Project
  • 23
    DeepSparse

    DeepSparse

    Sparsity-aware deep learning inference runtime for CPUs

    A sparsity-aware enterprise inferencing system for AI models on CPUs. Maximize your CPU infrastructure with DeepSparse to run performant computer vision (CV), natural language processing (NLP), and large language models (LLMs).
    Downloads: 0 This Week
    Last Update:
    See Project
  • 24
    GPUStack

    GPUStack

    Performance-optimized AI inference on your GPUs

    ...The system aggregates GPU resources from multiple machines into a unified cluster so developers and administrators can run large language models and other AI workloads efficiently across distributed infrastructure. Instead of requiring complex orchestration systems such as Kubernetes, GPUStack provides a lightweight environment that automatically selects appropriate inference engines, configures deployment parameters, and schedules workloads across available GPUs. The platform supports GPUs from a wide range of vendors and can run on laptops, workstations, and servers across operating systems such as macOS, Windows, and Linux. It also enables developers to deploy models from common repositories like Hugging Face and access them through APIs similar to cloud-based AI services.
    Downloads: 6 This Week
    Last Update:
    See Project
  • 25
    GLM-4.5

    GLM-4.5

    GLM-4.5: Open-source LLM for intelligent agents by Z.ai

    ...GLM-4.5 achieves strong performance on 12 industry-standard benchmarks, ranking 3rd overall, while GLM-4.5-Air balances competitive results with greater efficiency. The models support FP8 and BF16 precision, and can handle very large context windows of up to 128K tokens. Flexible inference is supported through frameworks like vLLM and SGLang with tool-call and reasoning parsers included.
    Downloads: 83 This Week
    Last Update:
    See Project
MongoDB Logo MongoDB