122 projects for "throughput" with 1 filter applied:

  • 1
    Garnet

    Garnet is a remote cache-store from Microsoft Research

    Garnet is a remote cache‑store developed by Microsoft Research. It delivers high throughput and low‑latency performance, supports scalability via clustering (sharding, replication, key migration, checkpointing, failover, transactions), and seamlessly integrates with existing Redis clients. Garnet offers much better throughput and scalability with many client connections and small batches, relative to comparable open-source cache-stores, leading to cost savings for large apps and services. ...
    Downloads: 1 This Week
  • 2
    FlexLLMGen

    Running large language models on a single GPU

    ...This design allows organizations to deploy powerful language models for high-volume tasks without the infrastructure costs typically associated with large-scale AI systems. The project is particularly useful for workloads that prioritize throughput over latency, including benchmarking experiments and large corpus analysis.
    Downloads: 0 This Week
  • 3
    MiMo-V2-Flash

    MiMo-V2-Flash: Efficient Reasoning, Coding, and Agentic Foundation

    ...It uses an MoE setup where a very large total parameter count is available, but only a smaller subset is activated per token, which helps balance capability with runtime efficiency. The project positions the model for workflows that require tool use, multi-step planning, and higher throughput, rather than only single-turn chat. Architecturally, it highlights attention and prediction choices aimed at accelerating generation while preserving instruction-following quality in complex prompts. The repository typically serves as a launch point for running the model, understanding its intended use cases, and reproducing or extending its evaluation on reasoning and agent-style tasks. ...
    Downloads: 7 This Week
  • 4
    Napkin Math

    Techniques and numbers for estimating system performance

    ...It collects practical numbers, benchmark-style measurements, and mental models that help engineers make fast back-of-the-envelope calculations. The project is useful for questions like how much memory throughput matters, how long storage operations may take, what network latency to expect, or how expensive logging could become at high request volume. It treats these values as rounded numbers for reasoning rather than exact performance guarantees. The repository is especially useful for system design interviews, architecture planning, capacity estimation, and infrastructure cost discussions. ...
    Downloads: 1 This Week
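    As an illustration of the kind of back-of-the-envelope estimate described above, here is a minimal sketch of a logging-cost calculation. The function name and every input (request rate, log line size, price per GB) are assumed round numbers chosen for the example, not figures taken from the Napkin Math repository.

```python
# Back-of-the-envelope logging cost.
# All inputs are assumed, rounded values for illustration only.

SECONDS_PER_MONTH = 30 * 24 * 3600  # 2,592,000 seconds in a 30-day month


def monthly_log_cost(requests_per_sec, lines_per_request,
                     bytes_per_line, dollars_per_gb):
    """Return (GB of logs per month, dollars per month)."""
    bytes_per_sec = requests_per_sec * lines_per_request * bytes_per_line
    gb_per_month = bytes_per_sec * SECONDS_PER_MONTH / 1e9
    return gb_per_month, gb_per_month * dollars_per_gb


# Assumed workload: 10k requests/s, 5 log lines each, 200 bytes per line,
# and a hypothetical ingestion price of $0.50 per GB.
gb, cost = monthly_log_cost(10_000, 5, 200, 0.50)
print(f"{gb:,.0f} GB/month, about ${cost:,.0f}/month")
```

    The point of such a sketch is the order of magnitude (roughly 10 MB/s of logs here), not the exact dollar figure.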
  • 5
    Fast JSON

    Fast JSON parser and validator for Go

    ...The project provides a low-level API that allows developers to work directly with JSON structures without converting them into intermediate representations. Its design prioritizes minimal overhead and maximum throughput, making it suitable for performance-critical applications such as APIs, data pipelines, and real-time systems. fastjson also supports both parsing and serialization, offering flexibility in data handling. Overall, it is a specialized tool for developers who need fine-grained control over JSON processing performance.
    Downloads: 3 This Week
  • 6
    SimpleLLM

    950-line, minimal, extensible LLM inference engine built from scratch

    ...Designed to run efficiently on high-end GPUs like NVIDIA H100 with support for models such as OpenAI/gpt-oss-120b, Simple-LLM implements continuous batching and event-driven inference loops to maximize hardware utilization and throughput. Its straightforward code structure allows anyone experimenting with custom kernels, new batching strategies, or inference optimizations to trace execution from input to output with minimal cognitive overhead.
    Downloads: 1 This Week
  • 7
    Gatling

    Modern Load Testing as Code

    ...Gatling supports HTTP out of the box as well as WebSocket, Server-Sent Events, and JMS, so you can exercise modern, real-time systems end to end. Rich HTML reports visualize percentiles, response time distributions, errors, and throughput, making bottlenecks and regressions easy to spot. With injection profiles (ramp, constant, spikes) and pass/fail gates, you can automate performance thresholds in CI and promote builds with confidence.
    Downloads: 11 This Week
  • 8
    Shardeum

    Shardeum is an EVM based autoscaling blockchain

    Shardeum is an EVM‑compatible layer‑1 blockchain platform that uses dynamic state sharding to deliver linear scalability, consistently low transaction fees, strong decentralization, and high throughput for decentralized application developers. Its sharding model is intended to provide faster processing times and lower transaction costs without compromising security or decentralization. ...
    Downloads: 0 This Week
  • 9
    MemOS

    AI memory OS for LLM and Agent systems

    ...By abandoning some of the historical assumptions of Unix-style operating systems, MemOS attempts to unlock new performance and scalability tradeoffs for applications that need high throughput and low latency on memory-intensive workloads.
    Downloads: 2 This Week
  • 10
    Broadway

    Concurrent and multi-stage data ingestion and data processing

    Broadway is a data processing library for Elixir designed to handle high-throughput, concurrent workloads with ease. It provides an abstraction for defining pipelines that consume data from sources like RabbitMQ, Kafka, Amazon SQS, or custom producers. Each pipeline is fault-tolerant and backpressure-aware, ensuring stable throughput even under load. The library integrates seamlessly with GenStage and OTP supervision trees, making it highly resilient in production.
    Downloads: 0 This Week
  • 11
    DeepSpeed

    Deep learning optimization library: makes distributed training easy

    DeepSpeed is an easy-to-use deep learning optimization software suite that enables unprecedented scale and speed for deep learning training and inference. With DeepSpeed you can:
    1. Train or run inference on dense or sparse models with billions or trillions of parameters
    2. Achieve excellent system throughput and efficiently scale to thousands of GPUs
    3. Train or run inference on resource-constrained GPU systems
    4. Achieve unprecedented low latency and high throughput for inference
    5. Achieve extreme compression for unparalleled reductions in inference latency and model size at low cost
    DeepSpeed offers a confluence of system innovations that has made large-scale DL training effective and efficient, greatly improved ease of use, and redefined the DL training landscape in terms of the scale that is possible. ...
    Downloads: 1 This Week
  • 12
    Cloud Storage FUSE

    A user-space file system for interacting with Google Cloud Storage

    ...The tool is particularly valuable in data-intensive workflows such as machine learning, where large datasets can be accessed on demand without requiring full local downloads. It supports performance optimizations like file caching, which stores frequently accessed data on local storage to significantly improve throughput and reduce latency. The system integrates with cloud-native environments such as Kubernetes and can be used in distributed architectures where multiple compute nodes access shared datasets.
    Downloads: 1 This Week
  • 13
    LitServe

    Minimal Python framework for fast, scalable AI inference servers

    ...Unlike traditional serving tools that enforce rigid abstractions, LitServe focuses on flexibility by letting users control request handling, batching strategies, and output processing directly in Python. LitServe is built on top of FastAPI and extends it with AI-specific optimizations such as efficient multi-worker execution, which can significantly improve throughput. It includes built-in capabilities for batching, streaming responses, and automatic scaling across CPUs and GPUs, enabling high-performance deployments.
    Downloads: 0 This Week
  • 14
    Text Embeddings Inference

    High-performance inference server and API layer for text embedding models

    ...It provides an API interface that allows developers to integrate embedding capabilities into applications without managing model internals directly. Text Embeddings Inference is optimized for throughput and low latency, enabling it to handle large volumes of requests reliably. It also emphasizes ease of deployment, often using containerization and configurable runtime options to adapt to different infrastructure setups.
    Downloads: 0 This Week
  • 15
    Parallax

    Parallax is a distributed model serving framework

    ...A two-stage scheduling architecture determines how model layers are allocated to available hardware and how requests are routed across nodes during execution. This scheduling system optimizes latency, throughput, and hardware utilization even when nodes have different computational capabilities. The platform also supports model sharding and pipeline parallelism, allowing very large models to run across distributed resources.
    Downloads: 0 This Week
  • 16
    CoreNet

    CoreNet: A library for training deep neural networks

    ...CoreNet provides abstractions for data, tensor, and pipeline parallelism, allowing models to scale without code duplication or heavy manual configuration. Its distributed runtime manages synchronization, load balancing, and mixed-precision computation to maximize throughput while minimizing communication bottlenecks. CoreNet integrates tightly with Apple’s proprietary ML stack and hardware, serving as the foundation for research in computer vision, language models, and multimodal systems within Apple AI. The framework includes monitoring tools, fault tolerance mechanisms, and efficient checkpointing for massive training runs.
    Downloads: 0 This Week
  • 17
    RTP-LLM

    Alibaba's high-performance LLM inference engine for diverse apps

    RTP-LLM is an open-source large language model inference acceleration engine developed by Alibaba to provide high-performance serving infrastructure for modern LLM deployments. The system focuses on improving throughput, latency, and resource utilization when running large models in production environments. It achieves this by implementing optimized GPU kernels, batching strategies, and memory management techniques tailored for transformer inference workloads. The framework is designed for large-scale AI services and is already used internally across several Alibaba platforms such as Taobao, Amap, and other business systems that rely on conversational or search-related AI services. ...
    Downloads: 0 This Week
  • 18
    DeepEP

    DeepEP: an efficient expert-parallel communication library

    DeepEP is a communication library designed specifically to support Mixture-of-Experts (MoE) and expert parallelism (EP) deployments. Its core role is to implement high-throughput, low-latency all-to-all GPU communication kernels, which handle the dispatching of tokens to different experts (or shards) and then combining expert outputs back into the main data flow. Because MoE architectures require routing inputs to different experts, communication overhead can become a bottleneck — DeepEP addresses that by providing optimized GPU kernels and efficient dispatch/combining logic. ...
    Downloads: 0 This Week
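    The dispatch and combine steps mentioned in the DeepEP description can be pictured with a small single-process sketch. This is not DeepEP's API: the function names, the top-1 routing, and the plain Python lists standing in for GPU buffers are all assumptions made for illustration, and no real all-to-all communication takes place.

```python
# Toy MoE dispatch/combine with top-1 routing (illustrative only).
# Real expert-parallel systems perform these steps as all-to-all GPU
# communication; here plain lists stand in for per-expert buffers.

def dispatch(tokens, expert_ids, num_experts):
    """Group tokens by destination expert, remembering original positions."""
    buckets = [[] for _ in range(num_experts)]
    for idx, (tok, expert) in enumerate(zip(tokens, expert_ids)):
        buckets[expert].append((idx, tok))
    return buckets


def run_experts(buckets, experts):
    """Apply each (hypothetical) expert function to its own bucket."""
    return [[(idx, fn(tok)) for idx, tok in bucket]
            for bucket, fn in zip(buckets, experts)]


def combine(expert_outputs, num_tokens):
    """Scatter expert outputs back into the original token order."""
    out = [None] * num_tokens
    for bucket in expert_outputs:
        for idx, value in bucket:
            out[idx] = value
    return out


tokens = [1.0, 2.0, 3.0, 4.0]
expert_ids = [1, 0, 1, 0]                        # assumed router decisions
experts = [lambda x: x * 10, lambda x: x + 1]    # stand-in expert functions

buckets = dispatch(tokens, expert_ids, num_experts=2)
result = combine(run_experts(buckets, experts), len(tokens))
print(result)
```

    In a real deployment each bucket would live on a different GPU, which is why the cost of moving tokens in both directions can dominate.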
  • 19
    FlashMLA

    FlashMLA: Efficient Multi-head Latent Attention Kernels

    FlashMLA is a high-performance decoding kernel library designed especially for Multi-Head Latent Attention (MLA) workloads, targeting NVIDIA Hopper GPU architectures. It provides optimized kernels for MLA decoding, including support for variable-length sequences, helping reduce latency and increase throughput in model inference systems using that attention style. The library supports both BF16 and FP16 data types, and includes a paged KV cache implementation with a block size of 64 to efficiently manage memory during decoding. In compute-bound settings, it can reach up to ~660 TFLOPS on H800 SXM5 hardware, while in memory-bound configurations it can push memory throughput to ~3000 GB/s. ...
    Downloads: 0 This Week
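    To make the idea of a paged KV cache with a block size of 64 concrete, the following is a minimal bookkeeping sketch. It is not FlashMLA's implementation: the class and method names are hypothetical, and only the block size comes from the description above. It shows how variable-length sequences can share one pool of fixed-size blocks through per-sequence block tables.

```python
BLOCK_SIZE = 64  # tokens per KV-cache block, as stated in the description


class PagedKVAllocator:
    """Hypothetical helper that hands out fixed-size blocks from a pool."""

    def __init__(self, num_blocks):
        self.free = list(range(num_blocks))  # ids of unused physical blocks
        self.block_tables = {}               # sequence id -> physical block ids
        self.lengths = {}                    # sequence id -> tokens stored

    def append_tokens(self, seq_id, n):
        """Reserve room for n more tokens, allocating blocks only as needed."""
        table = self.block_tables.setdefault(seq_id, [])
        new_len = self.lengths.get(seq_id, 0) + n
        needed = -(-new_len // BLOCK_SIZE)   # ceiling division
        while len(table) < needed:
            # Raises IndexError if the pool is exhausted.
            table.append(self.free.pop(0))
        self.lengths[seq_id] = new_len

    def locate(self, seq_id, pos):
        """Map a logical token position to (physical block id, offset)."""
        table = self.block_tables[seq_id]
        return table[pos // BLOCK_SIZE], pos % BLOCK_SIZE


alloc = PagedKVAllocator(num_blocks=8)
alloc.append_tokens("a", 70)   # 70 tokens need 2 blocks
alloc.append_tokens("b", 10)   # a second sequence takes the next block
alloc.append_tokens("a", 60)   # "a" grows to 130 tokens and needs a 3rd block
print(alloc.block_tables["a"], alloc.locate("a", 129))
```

    Because blocks are allocated on demand, a sequence's blocks need not be contiguous, and at most one partially filled block per sequence is wasted.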
  • 20
    Node RDKafka

    Node.js bindings for librdkafka

    A high-performance Node.js client for Apache Kafka, built on top of librdkafka, providing bindings for efficient Kafka message processing.
    Downloads: 0 This Week
  • 21
    Webman

    Probably the fastest PHP web framework in the world

    ...It leverages PHP’s multi-process architecture to handle asynchronous HTTP requests efficiently, making it suitable for real-time applications, APIs, and microservices. Unlike traditional synchronous frameworks, Webman achieves low latency and high throughput by using asynchronous I/O, significantly improving performance in scenarios requiring concurrent connections.
    Downloads: 1 This Week
  • 22
    Tencent-Hunyuan-Large

    Open-source large language model family from Tencent Hunyuan

    Tencent-Hunyuan-Large is the flagship open-source large language model family from Tencent Hunyuan, offering both pre-trained and instruct (fine-tuned) variants. It is designed with long-context capabilities, quantization support, and high performance on benchmarks across general reasoning, mathematics, language understanding, and Chinese / multilingual tasks. It aims to provide competitive capability with efficient deployment and inference. FP8 quantization support to reduce memory usage...
    Downloads: 6 This Week
  • 23
    DFlash

    Block Diffusion for Ultra-Fast Speculative Decoding

    DFlash is an open-source framework for ultra-fast speculative decoding using a lightweight block diffusion model to draft text in parallel with a target large language model, dramatically improving inference speed without sacrificing generation quality. It acts as a “drafter” that proposes likely continuations which the main model then verifies, enabling significant throughput gains compared to traditional autoregressive decoding methods that generate token by token. This approach has been shown to deliver lossless acceleration on models like Qwen3-8B by combining block diffusion techniques with efficient batching, making it ideal for applications where latency and cost matter. The project includes support for multiple draft models, example integration code, and scripts to benchmark performance, and it is structured to work with popular model serving stacks like SGLang and the Hugging Face Transformers ecosystem.
    Downloads: 3 This Week
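    The draft-then-verify loop described for DFlash can be sketched with toy stand-ins. This is a generic greedy speculative-decoding sketch, not DFlash's code: the "target model" and "drafter" below are made-up deterministic functions, the drafter is a simple sequential one rather than a block diffusion model, and verification is simulated token by token where a real system would score the whole draft block in one forward pass.

```python
def target_next(prefix):
    """Toy deterministic 'target model': next token from the prefix."""
    return (sum(prefix) * 31 + len(prefix)) % 50


def draft_block(prefix, k):
    """Toy drafter: usually agrees with the target, sometimes wrong."""
    ctx, out = list(prefix), []
    for _ in range(k):
        tok = target_next(ctx)
        if len(ctx) % 4 == 3:        # deliberately inject a wrong guess
            tok = (tok + 1) % 50
        out.append(tok)
        ctx.append(tok)
    return out


def speculative_decode(prompt, n_new, k=4):
    """Accept the longest draft prefix the target agrees with, then fix up."""
    seq, verify_passes = list(prompt), 0
    while len(seq) - len(prompt) < n_new:
        draft = draft_block(seq, k)
        verify_passes += 1           # one (simulated) target pass per block
        for tok in draft:
            expected = target_next(seq)
            seq.append(expected)     # the target's token is always what is kept
            if tok != expected:
                break                # first mismatch ends this block
        else:
            seq.append(target_next(seq))  # bonus token when all k accepted
    return seq[:len(prompt) + n_new], verify_passes


def greedy_decode(prompt, n_new):
    """Baseline: one target call per generated token."""
    seq = list(prompt)
    for _ in range(n_new):
        seq.append(target_next(seq))
    return seq


out, passes = speculative_decode([1, 2, 3], n_new=20)
print(out == greedy_decode([1, 2, 3], 20), passes)
```

    The output matches plain greedy decoding exactly, which is the sense in which this style of acceleration is lossless; the gain comes from needing fewer target verification passes than generated tokens.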
  • 24
    EnvPool

    C++-based high-performance parallel environment execution engine

    EnvPool is a fast, asynchronous, and parallel RL environment library designed for scaling reinforcement learning experiments. Developed by Sea AI Lab (SAIL) in Singapore, it pairs a C++ backend with a Python frontend for extremely high-speed environment interaction, supporting thousands of environments running in parallel on a single machine. It is compatible with the Gymnasium API and RLlib, making it suitable for scalable training pipelines.
    Downloads: 0 This Week
  • 25
    Solon

    Java enterprise application development framework

    Solon is a full-scenario Java enterprise application framework that positions itself as a lean, high-performance alternative to heavy stacks. It advertises large concurrency gains, lower memory use, much faster startup, and dramatically smaller packages while remaining compatible from Java 8 through Java 24. The framework focuses on restrained APIs and an open ecosystem, with modules that cover web, data, cloud, and microservice patterns. Its messaging emphasizes “replaceable Spring”...
    Downloads: 1 This Week