Garnet is a remote cache-store from Microsoft Research
Running large language models on a single GPU
MiMo-V2-Flash: Efficient Reasoning, Coding, and Agentic Foundation
Techniques and numbers for estimating a system's performance
Fast JSON parser and validator for Go
950-line, minimal, extensible LLM inference engine built from scratch
Modern Load Testing as Code
Shardeum is an EVM-based autoscaling blockchain
AI memory OS for LLM and Agent systems
Concurrent and multi-stage data ingestion and data processing
Deep learning optimization library: makes distributed training easy
A user-space file system for interacting with Google Cloud Storage
Minimal Python framework for fast, scalable AI inference servers
High-performance inference server and API layer for text embedding models
Parallax is a distributed model serving framework
CoreNet: A library for training deep neural networks
Alibaba's high-performance LLM inference engine for diverse apps
DeepEP: an efficient expert-parallel communication library
FlashMLA: Efficient Multi-head Latent Attention Kernels
Node.js bindings for librdkafka
Probably the fastest PHP web framework in the world
Open-source large language model family from Tencent Hunyuan
Block Diffusion for Ultra-Fast Speculative Decoding
C++-based high-performance parallel environment execution engine
Java enterprise application development framework