Showing 88 open source projects for "throughput"

  • 1
    MagicAPI AI Gateway

    Built for demanding AI workflows

    Billed as the world's fastest AI gateway proxy, it is written in Rust and optimized for performance. The gateway routes requests to AI providers such as OpenAI and Groq with streaming support, targeting developers who need reliable, low-latency AI API access.
    Downloads: 0 This Week
  • 2
    EnvPool

    C++-based high-performance parallel environment execution engine

    EnvPool is a fast, asynchronous, and parallel RL environment library designed for scaling reinforcement learning experiments. Developed by Sea AI Lab (SAIL) in Singapore, it pairs a C++ backend with a Python frontend for high-speed environment interaction, supporting thousands of environments running in parallel on a single machine. It is compatible with the Gymnasium API and RLlib, making it suitable for scalable training pipelines.
    Downloads: 0 This Week
  • 3
    Supertonic

    Lightning-fast, on-device TTS, running natively via ONNX

    ...The core model is highly compact at around 66 million parameters, yet benchmarks show it can generate speech up to 167× faster than real time on modern consumer hardware and significantly outpace popular cloud TTS APIs in throughput and real-time factor. Supertonic is designed to handle real-world text gracefully, including numbers, dates, currency symbols, abbreviations, and technical units, without requiring heavy pre-processing or custom text normalization. The repository provides complete reference implementations across many programming ecosystems—Python, Node.js, browser (WebGPU/WASM), Java, C++, C#, Go, Swift, iOS, Rust, and Flutter.
    Downloads: 6 This Week
  • 4
    Modular Platform

    The Modular Platform (includes MAX & Mojo)

    ...It is closely associated with the Mojo programming language and related tooling that aims to combine Python usability with systems-level performance. Modular’s ecosystem is designed to simplify deployment of AI workloads across heterogeneous hardware while maximizing throughput. The repository reflects an effort to modernize the AI development pipeline from compilation to runtime execution. Overall, Modular represents an ambitious attempt to unify performance engineering and developer ergonomics for large-scale AI systems.
    Downloads: 0 This Week
  • 5
    Solon

    Java enterprise application development framework

    Solon is a full-scenario Java enterprise application framework that positions itself as a lean, high-performance alternative to heavy stacks. It advertises large concurrency gains, lower memory use, much faster startup, and dramatically smaller packages while remaining compatible from Java 8 through Java 24. The framework focuses on restrained APIs and an open ecosystem, with modules that cover web, data, cloud, and microservice patterns. Its messaging emphasizes “replaceable Spring”...
    Downloads: 1 This Week
  • 6
    Orpheus TTS

    Towards Human-Sounding Speech

    ...The project ships both pretrained and finetuned English models, as well as a family of multilingual models released as a research preview, and includes data-processing scripts so users can train or finetune their own variants. Inference is provided through a Python package that uses vLLM under the hood for high-throughput, low-latency generation, including streaming examples that show how to generate audio chunks in real time. The maintainers provide Colab notebooks, a standardized prompting format, and one-click deployment via Baseten for production-grade, FP8/FP16 optimized inference with ~200 ms streaming latency.
    Downloads: 5 This Week
  • 7
    NeMo Retriever Library

    Document content and metadata extraction microservice

    ...The system is built on NVIDIA NIM microservices, enabling high-performance parallel processing and efficient handling of large datasets. It supports multiple extraction strategies for different document formats, balancing accuracy and throughput depending on the use case. Additionally, it can generate embeddings for extracted content and integrate with vector databases like Milvus, making it well-suited for retrieval-augmented generation pipelines.
    Downloads: 0 This Week
  • 8
    Qwen-Agent

    Agent framework and applications built upon Qwen>=3.0

    Qwen-Agent is a framework for building applications and agents on Qwen models (version 3.0+). It provides components for instruction following, tool usage (function calling), planning, memory, retrieval-augmented generation (RAG), and a code interpreter. It ships with example applications (Browser Assistant, Code Interpreter, Custom Assistant) and supports GUI front-ends, backends, and server setups. Agent workflows can maintain context and memory to perform multi-turn or more complex logic over time....
    Downloads: 4 This Week
  • 9
    uzu

    A high-performance inference engine for AI models

    ...The engine implements a hybrid architecture in which model layers can be executed either as custom GPU kernels or through Apple’s MPSGraph API, allowing it to balance performance and compatibility depending on the workload. By utilizing Apple’s unified memory architecture, uzu reduces memory copying overhead and improves inference throughput for local AI workloads. The system includes a simple high-level API that enables developers to run models, create inference sessions, and generate outputs with minimal configuration.
    Downloads: 1 This Week
  • 10
    bitnet.cpp

    Official inference framework for 1-bit LLMs

    ...BitNet is built to scale across architectures, with configurable kernels and tiling strategies that adapt to different hardware, and it supports large models with impressive throughput even on modest resources.
    Downloads: 1 This Week
  • 11
    GLM-OCR

    Accurate × Fast × Comprehensive

    GLM-OCR is an open-source multimodal optical character recognition (OCR) model built on a GLM-V encoder–decoder foundation that brings robust, accurate document understanding to complex real-world layouts and modalities. Designed to handle text recognition, table parsing, formula extraction, and general information retrieval from documents containing mixed content, GLM-OCR excels across major benchmarks while remaining highly efficient with a relatively compact parameter size (~0.9B),...
    Downloads: 1 This Week
  • 12
    Edgee

    AI gateway with token compression for Claude Code, Codex, and more

    Edgee is an edge-native execution platform designed to run AI-driven logic and data processing directly at the network edge, reducing latency and improving responsiveness for modern applications. It enables developers to deploy functions and workflows closer to users, allowing real-time processing without relying heavily on centralized cloud infrastructure. The platform is built to support event-driven architectures, where actions are triggered by incoming requests, user behavior, or...
    Downloads: 0 This Week
  • 13
    EFAK-AI

    An AI-driven, distributed, and high-performance monitoring system

    EFAK (Eagle For Apache Kafka) is an open-source monitoring and management platform designed to provide comprehensive visibility and operational control over Apache Kafka clusters through a unified web interface. The project focuses on simplifying Kafka administration by offering real-time insights into cluster health, performance metrics, and consumer activity, allowing engineers to quickly diagnose issues and optimize system behavior. It integrates advanced features such as intelligent...
    Downloads: 0 This Week
  • 14
    TensorRT LLM

    TensorRT LLM provides users with an easy-to-use Python API

    ...It provides a Python-based API built on top of PyTorch that allows developers to define, customize, and deploy LLMs efficiently across a variety of hardware configurations, from single GPUs to large multi-node clusters. The library focuses on maximizing throughput and minimizing latency through advanced techniques such as quantization, custom attention kernels, and optimized memory management strategies. It includes support for cutting-edge inference methods like speculative decoding and inflight batching, enabling real-time and large-scale AI applications. TensorRT-LLM integrates seamlessly with NVIDIA’s broader inference ecosystem, including Triton Inference Server and distributed deployment frameworks, making it suitable for production environments.
    Downloads: 0 This Week
  • 15
    OpenAI Forward

    An efficient forwarding service designed for LLMs

    ...The project can proxy both local and cloud-hosted language model services, which makes it useful for teams that want a single control layer regardless of whether they are using something like LocalAI or a hosted provider compatible with OpenAI-style APIs. A major emphasis of the repository is asynchronous performance, using tools such as uvicorn, aiohttp, and asyncio to support high-throughput forwarding workloads.
    Downloads: 0 This Week
  • 16
    Zvec

    A lightweight, lightning-fast, in-process vector database

    ...Developed by Alibaba’s Tongyi Lab, it positions itself as the “SQLite of vector databases” by being easy to integrate, minimal in dependencies, and capable of handling high throughput with low latency on edge devices or small systems. Zvec excels at approximate nearest neighbor search and retrieval tasks that power features like semantic search, recommendation systems, and retrieval-augmented generation (RAG) setups. Its performance benchmarks show it achieving high queries-per-second and fast index build times compared to similar tools. ...
    Downloads: 0 This Week
  • 17
    Loki Mode

    Multi-agent autonomous startup system for Claude Code

    ...By supporting multiple AI providers (like Claude Code, OpenAI Codex CLI, and Google Gemini CLI), loki-mode dynamically selects and spawns only the needed agents for a given project, optimizing computational resources and task throughput. Its Reason-Act-Reflect-Verify (RARV) cycle with self-verification loops emphasizes quality and resilience, automating end-to-end development lifecycles.
    Downloads: 0 This Week
  • 18
    OpenMLDB

    OpenMLDB is an open-source machine learning database

    ...However, a feature engineering script developed by data scientists (Python scripts in most cases) cannot be directly deployed into production for online inference because it usually cannot meet the engineering requirements, such as low latency, high throughput and high availability.
    Downloads: 0 This Week
  • 19
    Stable Diffusion WebUI Forge

    Stable Diffusion WebUI Forge is a platform on top of Stable Diffusion

    Stable Diffusion WebUI Forge is a performance- and feature-oriented fork of the popular AUTOMATIC1111 interface that experiments with new backends, memory optimizations, and UX improvements. It targets heavy users and researchers who push large models, control nets, and high-resolution pipelines where default settings can become bottlenecks. The fork typically introduces toggles for scheduler behavior, attention implementations, caching, and precision modes to reach better speed or quality...
    Downloads: 1 This Week
  • 20
    trench

    Open-Source Analytics Infrastructure

    Trench is an open-source analytics infrastructure designed for tracking events and performing real-time analysis of application data at scale. The system is built on top of high-performance data technologies including Apache Kafka and ClickHouse, which allows it to ingest and process very large volumes of events while maintaining fast query performance. It was originally developed to solve scaling challenges in product analytics systems where traditional relational databases become...
    Downloads: 0 This Week
  • 21
    MaxText

    A simple, performant, and scalable JAX LLM

    ...It is optimized to run efficiently on Google Cloud TPUs and GPUs, enabling researchers and engineers to train models ranging from small experiments to extremely large distributed workloads. The framework focuses on simplicity while still supporting advanced techniques such as model sharding, distributed computation, and high-throughput training pipelines. MaxText includes ready-to-use configurations and reproducible training examples that help developers understand how to deploy large-scale AI workloads with modern machine learning infrastructure.
    Downloads: 0 This Week
  • 22
    FastDeploy

    High-performance Inference and Deployment Toolkit for LLMs and VLMs

    ...The platform enables developers to deploy trained models quickly using optimized inference pipelines that support GPUs, specialized AI accelerators, and other hardware architectures. FastDeploy includes advanced acceleration technologies such as speculative decoding, multi-token prediction, and efficient KV cache management to improve throughput and latency during inference. It also offers compatibility with OpenAI-style APIs and vLLM-like interfaces, allowing developers to integrate deployed models easily into existing applications and services.
    Downloads: 0 This Week
  • 23
    slime LLM

    slime is an LLM post-training framework for RL Scaling

    slime is an open-source large language model (LLM) post-training framework developed to support reinforcement learning (RL)-based scaling and high-performance training workflows for advanced LLMs, blending training and rollout modules into an extensible system. It offers a flexible architecture that connects high-throughput training (e.g., via Megatron-LM) with a customizable data generation pipeline, enabling researchers and engineers to iterate on new RL training paradigms effectively. The framework is designed to support a wide range of training modes, allowing both synchronous and asynchronous RL workflows and programmable rollout interfaces that simplify experimentation with custom environments and reward signals. ...
    Downloads: 0 This Week
  • 24
    OpenVINO Model Server

    A scalable inference server for models optimized with OpenVINO

    ...It’s implemented in C++ for scalability and efficiency, making it suitable for both edge and cloud deployments where inference workloads must be reliable and high-throughput. The server exposes model inference via standard network protocols like REST and gRPC, so any client that speaks those protocols can request predictions remotely without needing to know where or how the model runs. It supports model deployment in diverse environments including Docker, bare-metal machines, and Kubernetes clusters, and is especially useful in microservices architectures where AI services need to scale independently. ...
    Downloads: 0 This Week
  • 25
    nanochat

    The best ChatGPT that $100 can buy

    nanochat is a from-scratch, end-to-end “mini ChatGPT” that shows the entire path from raw text to a chatty web app in one small, dependency-lean codebase. The repository stitches together every stage of the lifecycle: tokenizer training, pretraining a Transformer on a large web corpus, mid-training on dialogue and multiple-choice tasks, supervised fine-tuning, optional reinforcement learning for alignment, and finally efficient inference with caching. Its north star is approachability and...
    Downloads: 0 This Week