vLLM
A high-throughput and memory-efficient inference and serving engine
vLLM is a fast and easy-to-use library for LLM inference and serving. It provides high-throughput serving with various decoding algorithms, including parallel sampling, beam search, and more.
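As a minimal sketch of offline inference with vLLM's Python API (the model name and sampling values here are illustrative, not prescribed), parallel sampling is requested by setting `n > 1` in `SamplingParams`:

```python
from vllm import LLM, SamplingParams

# n > 1 enables parallel sampling: several completions per prompt.
sampling_params = SamplingParams(n=2, temperature=0.8, top_p=0.95, max_tokens=64)

# Any Hugging Face-compatible model ID works; opt-125m is just a small example.
llm = LLM(model="facebook/opt-125m")

outputs = llm.generate(["The future of AI is"], sampling_params)
for output in outputs:
    for completion in output.outputs:  # one entry per sampled completion
        print(completion.text)
```

Other decoding strategies, such as beam search, are exposed through the same library; the exact interface varies by vLLM version, so consult the documentation for the release you are using.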