FlashInfer: Kernel Library for LLM Serving
MII makes low-latency and high-throughput inference possible (see the generation sketch after this list)
An optimizing inference proxy for LLMs
Deep learning optimization library that makes distributed training easy (see the initialization sketch after this list)
Low-latency REST API for serving text embeddings (see the client sketch after this list)
The easiest and laziest way to build multi-agent LLM applications
Powering Amazon's custom machine learning chips
A toolkit for optimizing Keras & TensorFlow ML models for deployment (see the pruning sketch after this list)
CPU/GPU inference server for Hugging Face transformer models
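
For the MII entry, here is a minimal generation sketch using DeepSpeed-MII's pipeline API; the model name is only an example, and any Hugging Face causal LM that MII supports would do.

```python
# Minimal DeepSpeed-MII sketch: load a model and generate text.
# The model name is an example; substitute any supported HF causal LM.
import mii

pipe = mii.pipeline("mistralai/Mistral-7B-v0.1")
responses = pipe(["DeepSpeed is"], max_new_tokens=64)
print(responses)
```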
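The distributed-training entry matches DeepSpeed's tagline; assuming that is the library meant, the sketch below wraps a PyTorch model with its `deepspeed.initialize` API. The toy model and config values are illustrative only.

```python
# Sketch of wrapping a PyTorch model with DeepSpeed (assumed library).
# Run under the DeepSpeed launcher, e.g.: deepspeed train.py
import torch
import deepspeed

model = torch.nn.Linear(20, 10)
ds_config = {
    "train_batch_size": 8,                              # illustrative values
    "optimizer": {"type": "Adam", "params": {"lr": 1e-3}},
    "zero_optimization": {"stage": 1},                  # ZeRO stage-1 sharding
}

engine, optimizer, _, _ = deepspeed.initialize(
    model=model, model_parameters=model.parameters(), config=ds_config
)
# Typical step: loss = engine(batch); engine.backward(loss); engine.step()
```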
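For the embeddings-serving entry, a client call typically looks like the sketch below. The URL, route, and payload shape are assumptions in the common OpenAI-compatible style, not the documented API of any specific project listed here; check the actual server's docs.

```python
# Hypothetical embeddings client; the endpoint and payload shape are
# assumptions (OpenAI-compatible style), not a specific server's API.
import requests

resp = requests.post(
    "http://127.0.0.1:8080/embeddings",
    json={"model": "BAAI/bge-small-en-v1.5",   # example model name
          "input": ["What is deep learning?"]},
    timeout=10,
)
resp.raise_for_status()
vector = resp.json()["data"][0]["embedding"]   # one vector per input string
print(f"{len(vector)}-dimensional embedding")
```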
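And for the Keras & TensorFlow toolkit entry, here is a small magnitude-pruning sketch with the TensorFlow Model Optimization Toolkit (`tfmot`); the toy model and random data are placeholders.

```python
# Magnitude pruning with the TensorFlow Model Optimization Toolkit.
import numpy as np
import tensorflow as tf
import tensorflow_model_optimization as tfmot

model = tf.keras.Sequential([
    tf.keras.layers.Dense(128, activation="relu", input_shape=(20,)),
    tf.keras.layers.Dense(10),
])

# Wrap layers so low-magnitude weights are zeroed out during training.
pruned = tfmot.sparsity.keras.prune_low_magnitude(model)
pruned.compile(
    optimizer="adam",
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
)

x = np.random.rand(256, 20).astype("float32")  # placeholder data
y = np.random.randint(0, 10, size=(256,))
# UpdatePruningStep advances the pruning schedule on each train step.
pruned.fit(x, y, epochs=1,
           callbacks=[tfmot.sparsity.keras.UpdatePruningStep()])

# Strip the wrappers before export to get a plain, smaller Keras model.
final_model = tfmot.sparsity.keras.strip_pruning(pruned)
```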