Name                                                    Modified    Size
vllm-0.20.2.tar.gz                                      2026-05-10  33.5 MB
vllm-0.20.2-cp38-abi3-manylinux_2_35_x86_64.whl         2026-05-10  244.4 MB
vllm-0.20.2+cpu-cp38-abi3-manylinux_2_35_aarch64.whl    2026-05-10  36.0 MB
vllm-0.20.2+cpu-cp38-abi3-manylinux_2_35_x86_64.whl     2026-05-10  75.8 MB
vllm-0.20.2+cu129-cp38-abi3-manylinux_2_31_aarch64.whl  2026-05-10  422.2 MB
vllm-0.20.2+cu129-cp38-abi3-manylinux_2_31_x86_64.whl   2026-05-10  455.1 MB
vllm-0.20.2-cp38-abi3-manylinux_2_35_aarch64.whl        2026-05-10  235.8 MB
README.md                                               2026-05-08  922 Bytes
v0.20.2 source code.tar.gz                              2026-05-08  33.4 MB
v0.20.2 source code.zip                                 2026-05-08  36.5 MB
Totals: 10 items, 1.6 GB, 0 downloads / week
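
The wheel file names in this listing appear to follow the standard wheel naming layout (distribution, version, Python tag, ABI tag, platform tag). As a minimal sketch, the helper below splits such a name into its parts so the right file can be selected programmatically. The names WheelName and parse_wheel_name are hypothetical, the optional build tag from the wheel specification is not handled, and the local version labels (cpu, cu129) are passed through as-is because the listing does not state what they mean.

```python
from typing import NamedTuple


class WheelName(NamedTuple):
    """Components of a wheel file name (hypothetical helper type)."""
    distribution: str
    version: str        # public version, e.g. "0.20.2"
    local: str          # local version label after "+", or "" if absent
    python_tag: str
    abi_tag: str
    platform_tag: str


def parse_wheel_name(filename: str) -> WheelName:
    """Split a wheel file name into its dash-separated components.

    Assumes the five-field form without a build tag, which is the
    form used by every wheel in the listing above.
    """
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel file name: {filename!r}")
    parts = filename[: -len(".whl")].split("-")
    if len(parts) != 5:
        raise ValueError(f"expected 5 dash-separated fields, got {len(parts)}")
    distribution, full_version, python_tag, abi_tag, platform_tag = parts
    # A "+" separates the public version from the local version label.
    version, _, local = full_version.partition("+")
    return WheelName(distribution, version, local, python_tag, abi_tag, platform_tag)


if __name__ == "__main__":
    name = "vllm-0.20.2+cu129-cp38-abi3-manylinux_2_31_aarch64.whl"
    print(parse_wheel_name(name))
```

A caller could, for example, filter the listing on platform_tag.endswith("aarch64") and on the local label to choose one wheel per machine.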

vLLM v0.20.2

Highlights

This release features 6 commits from 6 contributors (0 new)!

This is a small patch release with bug fixes for DeepSeek V4, gpt-oss, and Qwen3-VL.

Bug Fixes

  • DeepSeek V4 sparse attention: Re-enabled the persistent topk path on Hopper and ensured the memset kernel runs at CUDA graph capture time regardless of max_seq_len, fixing the MTP=1 hang on DeepSeek V4 (#41665, revert of #41605).
  • DeepSeek V4 KV cache: Fixed a "failure to allocate KV blocks" error in the V1 engine KV cache manager (#41282).
  • gpt-oss MXFP4 + torch.compile: Plumbed hidden_dim_unpadded through the moe_forward fake op so MXFP4 works under torch.compile on v0.20.x (#42002, backport of #41646).
  • Qwen3-VL: Removed an invalid deepstack boundary check that could fail under heavy load (#40932).
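
To confirm that an environment has picked up the fixes listed above, one option is to compare the installed package version against 0.20.2. The sketch below uses only the standard library; the helper names (release_tuple, has_fixes, installed_vllm_version) are hypothetical, and the comparison looks only at the leading numeric release segment, so pre-release or local suffixes such as "rc1" or "+cu129" are ignored.

```python
import re
from importlib import metadata
from typing import Optional, Tuple

# First release that contains the bug fixes listed above.
PATCHED = (0, 20, 2)


def release_tuple(version: str) -> Tuple[int, ...]:
    """Return the leading numeric release segment of a version string."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return tuple(int(part) for part in match.group().split("."))


def has_fixes(version: str, minimum: Tuple[int, ...] = PATCHED) -> bool:
    """True if the release segment is at least the patched release."""
    return release_tuple(version) >= minimum


def installed_vllm_version() -> Optional[str]:
    """Return the installed vllm version, or None if it is not installed."""
    try:
        return metadata.version("vllm")
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_vllm_version()
    if found is None:
        print("vllm is not installed")
    else:
        status = "includes" if has_fixes(found) else "predates"
        print(f"vllm {found} {status} the v0.20.2 fixes")
```

For stricter comparisons that account for pre-releases, a full version parser would be a better fit than this release-segment check.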

Contributors

@ywang96, @zyongye, @stecasta, @wzhao18, @Isotr0py, @khluu

Source: README.md, updated 2026-05-08