Showing 897 open source projects for "inference"

View related business solutions
  • $300 in Free Credit Towards Top Cloud Services Icon
    $300 in Free Credit Towards Top Cloud Services

    Build VMs, containers, AI, databases, storage—all in one place.

    Start your project in minutes. After credits run out, 20+ products include free monthly usage. Only pay when you're ready to scale.
    Get Started
  • Forever Free Full-Stack Observability | Grafana Cloud Icon
    Forever Free Full-Stack Observability | Grafana Cloud

    Our generous forever free tier includes the full platform, including the AI Assistant, for 3 users with 10k metrics, 50GB logs, and 50GB traces.

    Built on open standards like Prometheus and OpenTelemetry, Grafana Cloud includes Kubernetes Monitoring, Application Observability, Incident Response, plus the AI-powered Grafana Assistant. Get started with our generous free tier today.
    Create free account
  • 1
    FlashAttention

    FlashAttention

    Fast and memory-efficient exact attention

    ...The project provides implementations of FlashAttention, FlashAttention-2, and newer iterations optimized for modern GPU architectures such as NVIDIA Hopper and AMD accelerators. By improving both forward and backward pass efficiency, it enables training and inference of large language models with longer sequence lengths and higher throughput. The library integrates with PyTorch and supports various attention configurations, including causal masking, multi-query attention, and rotary embeddings.
    Downloads: 17 This Week
    Last Update:
    See Project
  • 2
    Frigate NVR

    Frigate NVR

    NVR with realtime local object detection for IP cameras

    ...The system uses OpenCV and TensorFlow to analyze video feeds and detect objects such as people, vehicles, and animals in real time. Frigate is optimized for efficiency and supports hardware acceleration across a wide range of devices, including GPUs and specialized inference hardware. It also provides event recording, snapshot management, and searchable video history to improve home or small-business security workflows. Overall, Frigate functions as a privacy-focused, AI-powered NVR platform for intelligent video monitoring.
    Downloads: 10 This Week
    Last Update:
    See Project
  • 3
    Neural Speed

    Neural Speed

    An innovative library for efficient LLM inference

    neural-speed is an innovative library developed by Intel to enhance the efficiency of Large Language Model (LLM) inference through low-bit quantization techniques. By reducing the precision of model weights and activations, neural-speed aims to accelerate inference while maintaining model accuracy, making it suitable for deployment in resource-constrained environments.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 4
    DeepSeek Coder

    DeepSeek Coder

    DeepSeek Coder: Let the Code Write Itself

    ...This dataset covers project-level code structure (not just line-by-line snippets), using a large context window (e.g. 16K) and a secondary fill-in-the-blank objective to encourage better contextual completions and infilling. Multiple sizes of the model are offered (e.g. 1B, 5.7B, 6.7B, 33B) so users can trade off inference cost vs capability. The repo provides model weights, documentation on training setup, evaluation results on common benchmarks (HumanEval, MultiPL-E, APPS, etc.), and inference tools.
    Downloads: 3 This Week
    Last Update:
    See Project
  • MongoDB Atlas runs apps anywhere Icon
    MongoDB Atlas runs apps anywhere

    Deploy in 115+ regions with the modern database for every enterprise.

    MongoDB Atlas gives you the freedom to build and run modern applications anywhere—across AWS, Azure, and Google Cloud. With global availability in over 115 regions, Atlas lets you deploy close to your users, meet compliance needs, and scale with confidence across any geography.
    Start Free
  • 5
    Pruna AI

    Pruna AI

    Pruna is a model optimization framework built for developers

    Pruna is an open-source, self-hostable AI inference engine designed to help teams deploy and manage large language models (LLMs) efficiently across private or hybrid infrastructures. Built with performance and developer ergonomics in mind, Pruna simplifies inference workflows by enabling multi-model orchestration, autoscaling, GPU resource allocation, and compatibility with popular open-source models.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 6
    SageMaker TensorFlow Training Toolkit

    SageMaker TensorFlow Training Toolkit

    Toolkit for running TensorFlow training scripts on SageMaker

    ...To use your TensorFlow Serving model on SageMaker, you first need to create a SageMaker Model. After creating a SageMaker Model, you can use it to create SageMaker Batch Transform Jobs for offline inference, or create SageMaker Endpoints for real-time inference. A SageMaker Model contains references to a model.tar.gz file in S3 containing serialized model data, and a Docker image used to serve predictions with that model. A Batch Transform job runs an offline-inference job using your TensorFlow Serving model. Input data in S3 is converted to HTTP requests, and responses are saved to an output bucket in S3.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 7
    QSV

    QSV

    Blazing-fast Data-Wrangling toolkit

    qsv is a fast, command-line CSV data toolkit written in Rust that extends the capabilities of xsv. It’s designed to make working with CSV files at scale easy and efficient, offering over 40 powerful subcommands for tasks like querying, sampling, splitting, deduplicating, and more. qsv is ideal for data engineers, analysts, and developers who need high-performance CSV manipulation on the command line.
    Downloads: 99 This Week
    Last Update:
    See Project
  • 8
    Edgee

    Edgee

    AI gateway with token compression for Claude Code, Codex, and more

    ...The platform is built to support event-driven architectures, where actions are triggered by incoming requests, user behavior, or external signals. It integrates AI capabilities into edge environments, making it possible to perform inference, personalization, and decision-making at the point of interaction. Edgee is optimized for performance and scalability, leveraging distributed execution to handle high volumes of requests efficiently. It also emphasizes developer experience, providing tools and abstractions that simplify deployment and orchestration across edge nodes.
    Downloads: 13 This Week
    Last Update:
    See Project
  • 9
    DeepSeek Coder V2

    DeepSeek Coder V2

    DeepSeek-Coder-V2: Breaking the Barrier of Closed-Source Models

    ...While the V1 models already targeted strong code understanding and generation, V2 appears to push further in both multilingual support and reasoning in code, likely via architectural enhancements or additional training objectives. The repository provides updated model weights, evaluation results on benchmarks (e.g. HumanEval, MultiPL-E, APPS), and new inference/serving scripts. Compared to the original, DeepSeek-Coder-V2 likely incorporates improved context management, caching strategies, or enhanced infilling capabilities. The project aims to provide a more performant and reliable open-source alternative to closed-source code models, optimized for practical usage in code completion, infilling, and code understanding across English and Chinese codebases.
    Downloads: 30 This Week
    Last Update:
    See Project
  • Enterprise-grade ITSM, for every business Icon
    Enterprise-grade ITSM, for every business

    Give your IT, operations, and business teams the ability to deliver exceptional services—without the complexity.

    Freshservice is an intuitive, AI-powered platform that helps IT, operations, and business teams deliver exceptional service without the usual complexity. Automate repetitive tasks, resolve issues faster, and provide seamless support across the organization. From managing incidents and assets to driving smarter decisions, Freshservice makes it easy to stay efficient and scale with confidence.
    Try it Free
  • 10
    Gitleaks

    Gitleaks

    Protect and discover secrets using Gitleaks

    Gitleaks is a fast, lightweight, portable, and open-source secret scanner for git repositories, files, and directories. With over 6.8 million docker downloads, 11.2k GitHub stars, 1.7 million GitHub Downloads, thousands of weekly clones, and over 400k homebrew installs, gitleaks is the most trusted secret scanner among security professionals, enterprises, and developers. Gitleaks-Action is our official GitHub Action. You can use it to automatically run a gitleaks scan on all your team's pull...
    Downloads: 28 This Week
    Last Update:
    See Project
  • 11
    mistral.rs

    mistral.rs

    Fast, flexible LLM inference

    mistral.rs is a fast and flexible LLM inference engine implemented in Rust, designed to run and serve modern language models with an emphasis on performance and practical deployment. It provides multiple entry points for developers, including a CLI for running models locally and an HTTP server that exposes an OpenAI-compatible API surface for easy integration with existing clients.
    Downloads: 1 This Week
    Last Update:
    See Project
  • 12
    Open Model Zoo

    Open Model Zoo

    Pre-trained Deep Learning models and demos

    ...It includes hundreds of models covering object detection, classification, segmentation, pose estimation, speech recognition, text-to-speech, and more, many of which are already converted into formats optimized for inference on CPUs, GPUs, VPUs, and other accelerators supported by OpenVINO. In addition to model files, Open Model Zoo provides demo applications that show realistic usage patterns and help developers quickly prototype and understand inference pipelines in C++, Python, or via the OpenCV Graph API. Tools in the repository also help automate model downloads and other tasks, making it easier to incorporate these models into production systems or custom solutions.
    Downloads: 1 This Week
    Last Update:
    See Project
  • 13
    Secret Llama

    Secret Llama

    Fully private LLM chatbot that runs entirely with a browser

    Secret Llama is a privacy-first large-language-model chatbot that runs entirely inside your web browser, meaning no server is required and your conversation data never leaves your device. It focuses on open-source model support, letting you load families like Llama and Mistral directly in the client for fully local inference. Because everything happens in-browser, it can work offline once models are cached, which is helpful for air-gapped environments or travel. The interface mirrors the modern chat UX you’d expect—streaming responses, markdown, and a clean layout—so there’s no usability tradeoff to gain privacy. Under the hood it uses a web-native inference engine to accelerate model execution with GPU/WebGPU when available, keeping responses responsive even without a backend. ...
    Downloads: 2 This Week
    Last Update:
    See Project
  • 14
    Granite TSFM

    Granite TSFM

    Foundation Models for Time Series

    ...Issues and examples in the tracker illustrate common tasks such as slicing inference windows or using pipeline helpers that return pandas DataFrames, grounding the library in day-to-day time-series operations. The ecosystem around TSFM also includes a community cookbook of “recipes” that showcase capabilities and patterns. Overall, the repo is designed as a hands-on companion for teams adopting time-series foundation models in production-leaning settings.
    Downloads: 1 This Week
    Last Update:
    See Project
  • 15
    DreamCraft3D

    DreamCraft3D

    Official implementation of DreamCraft3D

    ...The name suggests a “dream crafting” metaphor—users probably supply textual or image prompts and generate 3D assets (point clouds, meshes, scenes). The repository includes model code, inference scripts, sample prompts, and possibly dataset preparation pipelines. It may integrate rendering or post-processing modules (e.g. mesh smoothing, texturing) to make the outputs more output-ready. Because 3D generation is hardware‐intensive, the repository likely also includes optimizations like quantization, pruning, or inference accelerations (e.g. using FlashMLA or DeepEP) to make the generation pipeline faster or more efficient. ...
    Downloads: 1 This Week
    Last Update:
    See Project
  • 16
    DALI

    DALI

    A GPU-accelerated library containing highly optimized building blocks

    ...Deep learning applications require complex, multi-stage data processing pipelines that include loading, decoding, cropping, resizing, and many other augmentations. These data processing pipelines, which are currently executed on the CPU, have become a bottleneck, limiting the performance and scalability of training and inference. DALI addresses the problem of the CPU bottleneck by offloading data preprocessing to the GPU. Additionally, DALI relies on its own execution engine, built to maximize the throughput of the input pipeline.
    Downloads: 1 This Week
    Last Update:
    See Project
  • 17
    Unredact

    Unredact

    A simple tool for reading in poorly redacted documents

    Unredact is a specialized tool that attempts to reconstruct redacted or obscured text in images, PDFs, or screenshots using a combination of image processing and generative AI inference to suggest plausible completions of blurred, black-boxed, or jumbled content. Unlike traditional optical character recognition (OCR), which only reads visible text, Unredact focuses on inferring missing content where redaction has been applied by analyzing surrounding context, font characteristics, and linguistic patterns to produce candidate reconstructions. ...
    Downloads: 11 This Week
    Last Update:
    See Project
  • 18
    DeepSeek VL

    DeepSeek VL

    Towards Real-World Vision-Language Understanding

    ...The repository includes model weights (or pointers to them), evaluation metrics on standard vision + language benchmarks, and configuration or architecture files. It also supports inference tools for forwarding image + prompt through the model to produce text output. DeepSeek-VL is a predecessor to their newer VL2 model, and presumably shares core design philosophy but with earlier scaling, fewer enhancements, or capability tradeoffs.
    Downloads: 11 This Week
    Last Update:
    See Project
  • 19
    CogVideo

    CogVideo

    Text and image to video generation: CogVideoX and CogVideo

    ...The latest CogVideoX models offer higher resolution outputs, longer video durations, and improved controllability through prompt engineering. The project includes tools for inference, fine-tuning, and optimization, making it suitable for both research and production use. It supports efficient deployment on a range of GPUs, including consumer hardware with quantization techniques. Overall, CogVideo provides a powerful framework for generating high-quality AI videos and experimenting with cutting-edge multimodal AI systems.
    Downloads: 26 This Week
    Last Update:
    See Project
  • 20
    Kitten TTS

    Kitten TTS

    State-of-the-art TTS model under 25MB

    ...It is designed for real-time CPU-based deployment across diverse platforms. Ultra-lightweight, model size less than 25MB. CPU-optimized, runs without GPU on any device. High-quality voices, several premium voice options available. Fast inference, optimized for real-time speech synthesis.
    Downloads: 17 This Week
    Last Update:
    See Project
  • 21
    Book5_Essentials-Probability-Statistics

    Book5_Essentials-Probability-Statistics

    The book 5 of statistics in simplicity

    Book5_Essentials-of-Probability-and-Statistics is a Visualize-ML educational volume that introduces the statistical and probabilistic concepts underpinning modern data analysis and machine learning. The repository explains topics such as distributions, sampling, inference, and uncertainty using visual demonstrations and intuitive narratives. Its teaching philosophy prioritizes conceptual clarity over heavy formalism, making statistical thinking more approachable for beginners. The material connects probability theory directly to real analytical workflows, helping learners understand how statistics supports predictive modeling. ...
    Downloads: 2 This Week
    Last Update:
    See Project
  • 22
    LLaMA 3

    LLaMA 3

    The official Meta Llama 3 GitHub site

    ...Even as a deprecated repo, it documents the transition path and preserves references that clarify how Llama 3 releases map into the current ecosystem. Practically, it functioned as a bridge between Llama 2 and later Llama releases by standardizing distribution and starter code for inference and fine-tuning. Teams still treat it as historical reference material for version lineage and migration notes.
    Downloads: 15 This Week
    Last Update:
    See Project
  • 23
    DragonianVoice

    DragonianVoice

    C++ inference library for multiple SVC/TTS

    ...It uses ONNX Runtime and other backends to accelerate inference, with notes on how different execution providers such as CUDA or DirectML affect operator support and numerical stability. Recent versions integrate with fish-speech via a dedicated fish-speech.cpp subproject using ggml.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 24
    Hunyuan3D-2.1

    Hunyuan3D-2.1

    From Images to High-Fidelity 3D Assets

    Hunyuan3D-2.1 is Tencent Hunyuan’s advanced 3D asset generation system that produces high-fidelity 3D models with Physically Based Rendering (PBR) textures. It is fully open-source with released model weights, training, and inference code. It improves on prior versions by using a PBR texture pipeline (enabling realistic material effects like reflections and subsurface scattering) and allowing community fine-tuning and extension. It supports both shape generation (mesh geometry) and texture generation modules. Physically Based Rendering texture synthesis to model realistic material effects, including reflections, subsurface scattering, etc. ...
    Downloads: 14 This Week
    Last Update:
    See Project
  • 25
    Rapid LaTeX OCR

    Rapid LaTeX OCR

    Formula recognition based on LaTeX-OCR and ONNXRuntime

    Formula recognition based on LaTeX-OCR and ONNXRuntime. rapid_latex_ocr is a tool to convert formula images to latex format. The reasoning code in the repo is modified from LaTeX-OCR, the model has all been converted to ONNX format, and the reasoning code has been simplified, Inference is faster and easier to deploy. The repo only has codes based on ONNXRuntime or OpenVINO inference in onnx format and does not contain training model codes. If you want to train your own model, please move to LaTeX-OCR. When installing the package through pip, the model file will be automatically downloaded and placed under models in the installation directory.
    Downloads: 5 This Week
    Last Update:
    See Project