The official Python client for the Hugging Face Hub (see the usage sketch after this list)
FlashInfer: Kernel Library for LLM Serving
LMDeploy is a toolkit for compressing, deploying, and serving LLMs
Bring the notion of Model-as-a-Service to life
Optimizing inference proxy for LLMs
The Triton Inference Server provides an optimized cloud and edge inferencing solution
20+ high-performance LLMs with recipes to pretrain, finetune, and deploy at scale
MII makes low-latency and high-throughput inference possible
Deep learning optimization library that makes distributed training easy
LLMFlows - Simple, Explicit and Transparent LLM Apps
Framework for Accelerating LLM Generation with Multiple Decoding Heads
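As a quick illustration of the Hugging Face Hub client listed above, here is a minimal usage sketch; it assumes `huggingface_hub` is installed, and the model ID is only an example:

```python
from huggingface_hub import snapshot_download

# Download a full model repository snapshot into the local cache
# and print the directory it was stored in.
local_dir = snapshot_download(repo_id="gpt2")  # example repo ID, for illustration only
print(local_dir)
```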