Audience

AI infrastructure engineers who need to optimize the deployment and serving of large language models in production environments

About vLLM

vLLM is a high-performance library for efficient inference and serving of large language models (LLMs). Originally developed in the Sky Computing Lab at UC Berkeley, vLLM has evolved into a community-driven project with contributions from both academia and industry. It delivers state-of-the-art serving throughput by managing attention key and value memory efficiently through its PagedAttention mechanism. It supports continuous batching of incoming requests and uses optimized CUDA kernels, including integration with FlashAttention and FlashInfer, to speed up model execution. vLLM also provides quantization support for GPTQ, AWQ, INT4, INT8, and FP8, as well as speculative decoding. Users benefit from seamless integration with popular Hugging Face models, support for decoding algorithms such as parallel sampling and beam search, and compatibility with NVIDIA GPUs, AMD CPUs and GPUs, Intel CPUs, and more.
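As a minimal sketch of the offline-inference workflow described above, the snippet below follows vLLM's documented `LLM`/`SamplingParams` entry points. It assumes `vllm` is installed and a supported accelerator is available; the model id is only an example (any compatible Hugging Face model works), so the generation step is guarded for environments without vLLM.

```python
# Example prompts; any list of strings works.
prompts = [
    "The capital of France is",
    "PagedAttention reduces KV-cache waste by",
]
# Sampling settings; parallel sampling (n > 1) and beam search are
# configured through the same SamplingParams object mentioned above.
sampling_kwargs = {"temperature": 0.8, "top_p": 0.95, "max_tokens": 64}

try:
    from vllm import LLM, SamplingParams

    # Loading the model sets up the PagedAttention KV-cache manager.
    llm = LLM(model="facebook/opt-125m")
    outputs = llm.generate(prompts, SamplingParams(**sampling_kwargs))
    for out in outputs:
        print(out.prompt, "->", out.outputs[0].text)
except Exception:
    # vLLM or a supported accelerator is unavailable in this environment;
    # the calls above follow the documented offline-inference API.
    print("vllm not available; skipping generation")
```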

Integrations

API:
Yes, vLLM offers API access, including an OpenAI-compatible HTTP server
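Because vLLM's serving layer speaks the OpenAI-compatible HTTP protocol (the server is started with `vllm serve <model>`), a client only needs to POST a standard completion payload. The sketch below builds such a request with the standard library; the endpoint URL and model id are illustrative assumptions, and actually sending the request requires a running server, so that step is left commented out.

```python
import json
from urllib import request

# Assumed local endpoint exposed by `vllm serve` (default port 8000).
ENDPOINT = "http://localhost:8000/v1/completions"

def build_completion_request(model: str, prompt: str,
                             max_tokens: int = 64,
                             temperature: float = 0.8) -> request.Request:
    """Assemble an OpenAI-style completion request for vLLM's server."""
    body = json.dumps({
        "model": model,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }).encode("utf-8")
    return request.Request(ENDPOINT, data=body,
                           headers={"Content-Type": "application/json"})

req = build_completion_request("facebook/opt-125m",
                               "The capital of France is")
print(json.loads(req.data)["model"])  # payload is standard OpenAI JSON
# with request.urlopen(req) as resp:   # requires a running vLLM server
#     print(json.load(resp)["choices"][0]["text"])
```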

Ratings/Reviews

Overall 0.0 / 5
Ease of Use 0.0 / 5
Features 0.0 / 5
Design 0.0 / 5
Support 0.0 / 5

This software hasn't been reviewed yet.

Company Information

vLLM
United States
docs.vllm.ai/en/latest/


Product Details

Platforms Supported
Cloud
Training
Documentation
Support
24/7 Live Support
Online

vLLM Frequently Asked Questions

Q: What kinds of users and organization types does vLLM work with?
Q: What languages does vLLM support in its product?
Q: What kind of support options does vLLM offer?
Q: What other applications or services does vLLM integrate with?
Q: Does vLLM have an API?
Q: What type of training does vLLM provide?

vLLM Product Features