Running large language models on a single GPU
MiMo-V2-Flash: Efficient Reasoning, Coding, and Agentic Foundation
950-line, minimal, extensible LLM inference engine built from scratch
AI memory OS for LLM and Agent systems
Deep learning optimization library: makes distributed training easy
Minimal, fast Python framework for scalable AI inference servers
High-performance inference server and API layer for text embedding models
Parallax is a distributed model serving framework
Alibaba's high-performance LLM inference engine for diverse apps
FlashMLA: Efficient Multi-head Latent Attention Kernels
Open-source large language model family from Tencent Hunyuan
Block Diffusion for Ultra-Fast Speculative Decoding
C++-based high-performance parallel environment execution engine
Lightning-fast, on-device TTS, running natively via ONNX
A TTS that fits in your CPU (and pocket)
Java enterprise application development framework
Stable Diffusion WebUI Forge is a platform on top of Stable Diffusion
A high-performance inference engine for AI models
A simple, performant and scalable JAX LLM
Official inference framework for 1-bit LLMs
Accurate × Fast × Comprehensive
Towards Human-Sounding Speech
An AI-driven, distributed, high-performance monitoring system
An efficient forwarding service designed for LLMs
A lightweight, lightning-fast, in-process vector database