Garnet is a remote cache-store from Microsoft Research
Running large language models on a single GPU
MiMo-V2-Flash: Efficient Reasoning, Coding, and Agentic Foundation Model
Techniques and numbers for estimating system performance
Fast JSON parser and validator for Go
950-line, minimal, extensible LLM inference engine built from scratch
Modern Load Testing as Code
AI memory OS for LLM and Agent systems
Shardeum is an EVM-based autoscaling blockchain
Concurrent and multi-stage data ingestion and data processing
Deep learning optimization library: makes distributed training easy
A user-space file system for interacting with Google Cloud Storage
Minimal Python framework for building scalable AI inference servers quickly
High-performance inference server and API layer for text embedding models
Parallax is a distributed model serving framework
CoreNet: A library for training deep neural networks
Alibaba's high-performance LLM inference engine for diverse apps
DeepEP: an efficient expert-parallel communication library
FlashMLA: Efficient Multi-head Latent Attention Kernels
Cross-platform multi-protocol VPN software
Node.js bindings for librdkafka
Probably the fastest PHP web framework in the world
Open-source large language model family from Tencent Hunyuan
Block Diffusion for Ultra-Fast Speculative Decoding
Fast and Lightweight Logs and Metrics processor for Linux, BSD, OSX