throughput free download

Showing 432 open source projects for "throughput"

1

vLLM

A high-throughput and memory-efficient inference and serving engine

vLLM is a fast and easy-to-use library for LLM inference and serving. High-throughput serving with various decoding algorithms, including parallel sampling, beam search, and more.

Downloads: 12 This Week

Last Update: 4 days ago
See Project
2

ntopng

Web-based Traffic and Security Network Traffic Monitoring

ntopng® is a web-based network traffic monitoring application released under GPLv3. It is the new incarnation of the original ntop written in 1998, and is now revamped in terms of performance, usability, and features. ntopng is a network traffic probe that provides 360° Network visibility, with its ability to gather traffic information from traffic mirrors, NetFlow exporters, SNMP devices, Firewall logs, and Intrusion Detection systems. ntopng has been written in a portable way in order to...

Downloads: 38 This Week

Last Update: 2025-11-17
See Project
3

Garnet

Garnet is a remote cache-store from Microsoft Research

Garnet is a remote cache‑store developed by Microsoft Research. It delivers high throughput and low‑latency performance, supports scalability via clustering (sharding, replication, key migration, checkpointing, failover, transactions), and seamlessly integrates with existing Redis clients. Garnet offers much better throughput and scalability with many client connections and small batches, relative to comparable open-source cache-stores, leading to cost savings for large apps and services. ...

Downloads: 1 This Week

Last Update: 4 days ago
See Project
4

ScaleLLM

A high-performance inference system for large language models

ScaleLLM is a high-performance inference system tailored for Large Language Models (LLMs), specifically designed for production environments. It focuses on optimizing inference processes to handle large-scale deployments efficiently, ensuring low latency and high throughput. ScaleLLM supports various LLM architectures and integrates with existing infrastructures, providing a scalable solution for deploying LLMs in real-world applications.

Downloads: 0 This Week

Last Update: 2025-09-13
See Project
5

HikariCP

A solid, high-performance, JDBC connection pool at last

HikariCP is a high-performance JDBC connection pooling library designed to offer minimal overhead and maximum throughput. It aims for simplicity and stability, concentrating on fast connection acquisition and minimal latency under load. Its internals manage a pool of connections aggressively but smartly, with options such as leak detection, timeout thresholds, and adaptive connection retirement. Because it avoids heavy internal locking, it scales well in high-concurrency environments, making it a popular choice in microservices and high-throughput web backends. ...

Downloads: 10 This Week

Last Update: 2025-09-12
See Project
6

ThingsBoard Message Queue (TBMQ)

Open-source, scalable, and fault-tolerant MQTT broker

TBMQ is a lightweight message broker built to support ThingsBoard's IoT platform, focusing on telemetry data streaming and device communication. It uses Kafka-compatible APIs and is optimized for high-throughput messaging, device scalability, and low-latency delivery. TBMQ is ideal for IoT backends needing MQTT or Kafka-style pub/sub infrastructure.

Downloads: 1 This Week

Last Update: 2026-05-11
See Project
7

FlexLLMGen

Running large language models on a single GPU

...This design allows organizations to deploy powerful language models for high-volume tasks without the infrastructure costs typically associated with large-scale AI systems. The project is particularly useful for workloads that prioritize throughput over latency, including benchmarking experiments and large corpus analysis.

Downloads: 0 This Week

Last Update: 2026-03-10
See Project
8

MiMo-V2-Flash

MiMo-V2-Flash: Efficient Reasoning, Coding, and Agentic Foundation

...It uses an MoE setup where a very large total parameter count is available, but only a smaller subset is activated per token, which helps balance capability with runtime efficiency. The project positions the model for workflows that require tool use, multi-step planning, and higher throughput, rather than only single-turn chat. Architecturally, it highlights attention and prediction choices aimed at accelerating generation while preserving instruction-following quality in complex prompts. The repository typically serves as a launch point for running the model, understanding its intended use cases, and reproducing or extending its evaluation on reasoning and agent-style tasks. ...

Downloads: 7 This Week

Last Update: 2026-01-08
See Project
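The sparse activation mentioned in the MiMo-V2-Flash entry (many experts available, only a few used per token) can be illustrated with a generic top-k routing sketch. This is not taken from the MiMo-V2-Flash repository: the function names, the toy experts, and the choice of softmax gating with k=2 are all assumptions made for illustration.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def route_token(router_logits, k):
    """Pick the k highest-scoring experts and renormalize their weights to sum to 1."""
    probs = softmax(router_logits)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in top)
    return {i: probs[i] / norm for i in top}

def moe_forward(x, experts, router_logits, k=2):
    """Run only the selected experts and mix their outputs by routing weight."""
    weights = route_token(router_logits, k)
    return sum(w * experts[i](x) for i, w in weights.items())

# Hypothetical toy experts: each one just scales its input.
experts = [lambda x, s=s: s * x for s in (1.0, 2.0, 3.0, 4.0)]
logits = [0.1, 2.0, -1.0, 1.5]  # assumed router scores for one token

chosen = route_token(logits, k=2)
output = moe_forward(1.0, experts, logits, k=2)
```

Here only experts 1 and 3 run for this token, so compute cost scales with k rather than with the total number of experts.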
9

Napkin Math

Techniques and numbers for estimating system performance

...It collects practical numbers, benchmark-style measurements, and mental models that help engineers make fast back-of-the-envelope calculations. The project is useful for questions like how much memory throughput matters, how long storage operations may take, what network latency to expect, or how expensive logging could become at high request volume. It treats these values as rounded numbers for reasoning rather than exact performance guarantees. The repository is especially useful for system design interviews, architecture planning, capacity estimation, and infrastructure cost discussions. ...

Downloads: 1 This Week

Last Update: 4 days ago
See Project
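As a rough illustration of the kind of back-of-the-envelope estimate the Napkin Math entry describes, the sketch below works out how expensive logging could become at high request volume. The request rate, log line size, and price per GB are made-up round numbers, not figures from the project, and the function names are hypothetical.

```python
SECONDS_PER_DAY = 86_400

def daily_log_volume_gb(requests_per_second, bytes_per_log_line, lines_per_request=1.0):
    """Approximate log volume in GB per day (1 GB = 1e9 bytes)."""
    bytes_per_day = (
        requests_per_second * lines_per_request * bytes_per_log_line * SECONDS_PER_DAY
    )
    return bytes_per_day / 1e9

def monthly_cost(gb_per_day, price_per_gb, days=30):
    """Approximate monthly bill for ingesting that volume at a flat per-GB price."""
    return gb_per_day * days * price_per_gb

# Assumed inputs: 2,000 requests/s, one 1,000-byte log line per request,
# and a hypothetical ingestion price of 0.50 per GB.
volume = daily_log_volume_gb(2_000, 1_000)
cost = monthly_cost(volume, 0.50)
print(f"about {volume:.0f} GB/day, about {cost:.0f} per month")
```

With these assumed inputs the estimate comes to roughly 173 GB per day. As with any napkin calculation, the value lies in the order of magnitude rather than the exact figure.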
10

Fast JSON

Fast JSON parser and validator for Go

...The project provides a low-level API that allows developers to work directly with JSON structures without converting them into intermediate representations. Its design prioritizes minimal overhead and maximum throughput, making it suitable for performance-critical applications such as APIs, data pipelines, and real-time systems. fastjson also supports both parsing and serialization, offering flexibility in data handling. Overall, it is a specialized tool for developers who need fine-grained control over JSON processing performance.

Downloads: 3 This Week

Last Update: 2026-05-09
See Project
11

TensorRT

C++ library for high performance inference on NVIDIA GPUs

NVIDIA® TensorRT™ is an SDK for high-performance deep learning inference. It includes a deep learning inference optimizer and runtime that delivers low latency and high throughput for deep learning inference applications. TensorRT-based applications perform up to 40X faster than CPU-only platforms during inference. With TensorRT, you can optimize neural network models trained in all major frameworks, calibrate for lower precision with high accuracy, and deploy to hyperscale data centers, embedded, or automotive product platforms. ...

Downloads: 19 This Week

Last Update: 2026-03-25
See Project
12

SimpleLLM

950-line, minimal, extensible LLM inference engine built from scratch

...Designed to run efficiently on high-end GPUs like NVIDIA H100 with support for models such as OpenAI/gpt-oss-120b, Simple-LLM implements continuous batching and event-driven inference loops to maximize hardware utilization and throughput. Its straightforward code structure allows anyone experimenting with custom kernels, new batching strategies, or inference optimizations to trace execution from input to output with minimal cognitive overhead.

Downloads: 1 This Week

Last Update: 2026-01-28
See Project
13

Gatling

Modern Load Testing as Code

...Gatling supports HTTP out of the box as well as WebSocket, Server-Sent Events, and JMS, so you can exercise modern, real-time systems end to end. Rich HTML reports visualize percentiles, response time distributions, errors, and throughput, making bottlenecks and regressions easy to spot. With injection profiles (ramp, constant, spikes) and pass/fail gates, you can automate performance thresholds in CI and promote builds with confidence.

Downloads: 11 This Week

Last Update: 2026-02-26
See Project
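The report metrics named in the Gatling entry (percentiles and throughput) can be computed from raw response times along the following lines. This is a minimal sketch, not Gatling's own reporting code: it assumes the nearest-rank percentile definition, which is only one of several in common use, and the sample data is invented.

```python
import math

def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample with at least p% of values <= it."""
    if not samples:
        raise ValueError("samples must not be empty")
    ordered = sorted(samples)
    rank = max(1, math.ceil(p / 100 * len(ordered)))
    return ordered[rank - 1]

def throughput(request_count, duration_seconds):
    """Completed requests per second over the measurement window."""
    return request_count / duration_seconds

# Invented response times in milliseconds, with one slow outlier.
response_times_ms = [12, 15, 11, 240, 14, 13, 18, 16, 17, 19]

p50 = percentile(response_times_ms, 50)
p95 = percentile(response_times_ms, 95)
rps = throughput(len(response_times_ms), duration_seconds=2.0)
```

The outlier barely moves the median but dominates the 95th percentile, which is why load-test reports show percentiles rather than a single average.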
14

MemOS

AI memory OS for LLM and Agent systems

...By abandoning some of the historical assumptions of Unix-style operating systems, MemOS attempts to unlock new performance and scalability tradeoffs for applications that need high throughput and low latency on memory-intensive workloads.

Downloads: 2 This Week

Last Update: 16 hours ago
See Project
15

Shardeum

Shardeum is an EVM-based autoscaling blockchain

Shardeum is an EVM‑compatible layer‑1 blockchain platform that leverages dynamic state sharding to deliver linear scalability, consistently low transaction fees, strong decentralization, and high throughput for decentralized application developers. By implementing a sharding model, Shardeum ensures faster processing times and lower transaction costs without compromising security or decentralization. ...

Downloads: 0 This Week

Last Update: 2025-09-12
See Project
16

conflux-rust

The official Rust implementation of Conflux protocol

conflux-rust is the Rust implementation of Conflux, a high-performance Layer 1 blockchain designed to deliver high throughput without compromising decentralization or security. Conflux introduces a Tree-Graph consensus mechanism that allows parallel block processing while maintaining consensus integrity. This implementation focuses on performance and is compatible with the Ethereum Virtual Machine (EVM), enabling developers to deploy smart contracts written for Ethereum.

Downloads: 0 This Week

Last Update: 9 hours ago
See Project
17

Broadway

Concurrent and multi-stage data ingestion and data processing

Broadway is a data processing library for Elixir designed to handle high-throughput, concurrent workloads with ease. It provides an abstraction for defining pipelines that consume data from sources like RabbitMQ, Kafka, Amazon SQS, or custom producers. Each pipeline is fault-tolerant and backpressure-aware, ensuring stable throughput even under load. The library integrates seamlessly with GenStage and OTP supervision trees, making it highly resilient in production.

Downloads: 0 This Week

Last Update: 2026-04-17
See Project
18

DeepSpeed

Deep learning optimization library: makes distributed training easy

DeepSpeed is an easy-to-use deep learning optimization software suite that enables unprecedented scale and speed for deep learning training and inference. With DeepSpeed you can:
1. Train or run inference on dense or sparse models with billions or trillions of parameters
2. Achieve excellent system throughput and efficiently scale to thousands of GPUs
3. Train or run inference on resource-constrained GPU systems
4. Achieve unprecedented low latency and high throughput for inference
5. Achieve extreme compression for unparalleled inference latency and model size reduction at low cost
DeepSpeed offers a confluence of system innovations that have made large-scale DL training effective and efficient, greatly improved ease of use, and redefined the DL training landscape in terms of the scale that is possible. ...

Downloads: 1 This Week

Last Update: 2026-05-06
See Project
19

Canal

MySQL binlog

Canal is an open-source project developed by Alibaba that simulates MySQL slave functionality to parse MySQL binlog files. It enables real-time data synchronization and change data capture (CDC) between MySQL and other systems such as Elasticsearch, Kafka, or HBase. Canal is widely used for data integration, replication, and monitoring across distributed systems, offering high performance and low-latency log parsing.

Downloads: 10 This Week

Last Update: 2025-07-18
See Project
20

FlashAttention

Fast and memory-efficient exact attention

...The project provides implementations of FlashAttention, FlashAttention-2, and newer iterations optimized for modern GPU architectures such as NVIDIA Hopper and AMD accelerators. By improving both forward and backward pass efficiency, it enables training and inference of large language models with longer sequence lengths and higher throughput. The library integrates with PyTorch and supports various attention configurations, including causal masking, multi-query attention, and rotary embeddings.

Downloads: 69 This Week

Last Update: 2026-03-18
See Project
21

Cloud Storage FUSE

A user-space file system for interacting with Google Cloud Storage

...The tool is particularly valuable in data-intensive workflows such as machine learning, where large datasets can be accessed on demand without requiring full local downloads. It supports performance optimizations like file caching, which stores frequently accessed data on local storage to significantly improve throughput and reduce latency. The system integrates with cloud-native environments such as Kubernetes and can be used in distributed architectures where multiple compute nodes access shared datasets.

Downloads: 1 This Week

Last Update: 2026-04-30
See Project
22

LitServe

Minimal Python framework for fast, scalable AI inference servers

...Unlike traditional serving tools that enforce rigid abstractions, LitServe focuses on flexibility by letting users control request handling, batching strategies, and output processing directly in Python. LitServe is built on top of FastAPI and extends it with AI-specific optimizations such as efficient multi-worker execution, which can significantly improve throughput. It includes built-in capabilities for batching, streaming responses, and automatic scaling across CPUs and GPUs, enabling high-performance deployments.

Downloads: 0 This Week

Last Update: 2026-03-18
See Project
23

Text Embeddings Inference

High-performance inference server and API layer for text embedding models

...It provides an API interface that allows developers to integrate embedding capabilities into applications without managing model internals directly. Text Embeddings Inference is optimized for throughput and low latency, enabling it to handle large volumes of requests reliably. It also emphasizes ease of deployment, often using containerization and configurable runtime options to adapt to different infrastructure setups.

Downloads: 0 This Week

Last Update: 2026-03-23
See Project
24

Parallax

Parallax is a distributed model serving framework

...A two-stage scheduling architecture determines how model layers are allocated to available hardware and how requests are routed across nodes during execution. This scheduling system optimizes latency, throughput, and hardware utilization even when nodes have different computational capabilities. The platform also supports model sharding and pipeline parallelism, allowing very large models to run across distributed resources.

Downloads: 0 This Week

Last Update: 2026-03-09
See Project
25

Motor

The async Python driver for MongoDB and Tornado or asyncio

...It provides a familiar API surface similar to the official synchronous PyMongo driver, so you can migrate or write MongoDB code in Python without having to learn a completely new interface. Because it integrates with popular async ecosystems like FastAPI, Sanic, and aiohttp, Motor is a natural fit for modern async Python stacks where throughput and responsiveness matter. It also supports change streams, grid file system (GridFS), and the full range of CRUD and aggregation operations available in MongoDB.

Downloads: 0 This Week

Last Update: 2026-01-22
See Project