| Name | Modified | Size | Downloads / Week |
|---|---|---|---|
| vllm-0.20.2.tar.gz | 2026-05-10 | 33.5 MB | |
| vllm-0.20.2-cp38-abi3-manylinux_2_35_x86_64.whl | 2026-05-10 | 244.4 MB | |
| vllm-0.20.2+cpu-cp38-abi3-manylinux_2_35_aarch64.whl | 2026-05-10 | 36.0 MB | |
| vllm-0.20.2+cpu-cp38-abi3-manylinux_2_35_x86_64.whl | 2026-05-10 | 75.8 MB | |
| vllm-0.20.2+cu129-cp38-abi3-manylinux_2_31_aarch64.whl | 2026-05-10 | 422.2 MB | |
| vllm-0.20.2+cu129-cp38-abi3-manylinux_2_31_x86_64.whl | 2026-05-10 | 455.1 MB | |
| vllm-0.20.2-cp38-abi3-manylinux_2_35_aarch64.whl | 2026-05-10 | 235.8 MB | |
| README.md | 2026-05-08 | 922 Bytes | |
| v0.20.2 source code.tar.gz | 2026-05-08 | 33.4 MB | |
| v0.20.2 source code.zip | 2026-05-08 | 36.5 MB | |
| Totals: 10 Items | | 1.6 GB | 0 |
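The wheel names in the table follow the standard wheel filename layout (`name-version-python_tag-abi_tag-platform_tag.whl`), with the build variant carried as a local version label after `+`. The following is a minimal sketch of choosing a file by variant and CPU architecture. The helper names `parse_wheel_name` and `pick_wheel` are hypothetical, and the label `"default"` for wheels without a `+` suffix is an assumption made for illustration, since the listing does not say what that build targets.

```python
from typing import NamedTuple, Optional, Sequence


class WheelInfo(NamedTuple):
    version: str       # public version, e.g. "0.20.2"
    variant: str       # local version label ("cpu", "cu129"); "default" if absent
    python_tag: str    # e.g. "cp38"
    abi_tag: str       # e.g. "abi3"
    platform_tag: str  # e.g. "manylinux_2_35_x86_64"


def parse_wheel_name(filename: str) -> WheelInfo:
    """Split a wheel filename into its tags (assumes no optional build tag)."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel: {filename}")
    parts = filename[: -len(".whl")].split("-")
    if len(parts) != 5:
        raise ValueError(f"unexpected wheel name layout: {filename}")
    _name, full_version, python_tag, abi_tag, platform_tag = parts
    # "0.20.2+cu129" -> ("0.20.2", "cu129"); "0.20.2" -> ("0.20.2", "")
    version, _, local = full_version.partition("+")
    return WheelInfo(version, local or "default", python_tag, abi_tag, platform_tag)


def pick_wheel(filenames: Sequence[str], variant: str, arch: str) -> Optional[str]:
    """Return the first wheel matching the variant label and CPU architecture."""
    for name in filenames:
        info = parse_wheel_name(name)
        if info.variant == variant and info.platform_tag.endswith("_" + arch):
            return name
    return None


# Wheel filenames copied from the listing above.
WHEELS = [
    "vllm-0.20.2-cp38-abi3-manylinux_2_35_x86_64.whl",
    "vllm-0.20.2+cpu-cp38-abi3-manylinux_2_35_aarch64.whl",
    "vllm-0.20.2+cpu-cp38-abi3-manylinux_2_35_x86_64.whl",
    "vllm-0.20.2+cu129-cp38-abi3-manylinux_2_31_aarch64.whl",
    "vllm-0.20.2+cu129-cp38-abi3-manylinux_2_31_x86_64.whl",
    "vllm-0.20.2-cp38-abi3-manylinux_2_35_aarch64.whl",
]

if __name__ == "__main__":
    print(pick_wheel(WHEELS, "cu129", "aarch64"))
```

The sketch only matches on filename tags. It does not check the running Python version or the glibc level implied by the `manylinux_2_31` and `manylinux_2_35` tags, which `pip` does on its own when installing.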
vLLM v0.20.2
Highlights
This release features 6 commits from 6 contributors (0 new)!
This is a small patch release with bug fixes for DeepSeek V4, gpt-oss, and Qwen3-VL.
Bug Fixes
- DeepSeek V4 sparse attention: Re-enable the persistent topk path on Hopper and ensure the memset kernel runs at CUDA graph capture time regardless of `max_seq_len`, fixing the MTP=1 hang on DeepSeek V4 (#41665, revert of #41605).
- DeepSeek V4 KV cache: Fixed a "failure to allocate KV blocks" error in the V1 engine KV cache manager (#41282).
- gpt-oss MXFP4 + torch.compile: Plumbed `hidden_dim_unpadded` through the `moe_forward` fake op so MXFP4 works under `torch.compile` on v0.20.x (#42002, backport of #41646).
- Qwen3-VL: Removed an invalid deepstack boundary check that could fail under heavy load (#40932).
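To confirm that an environment is running a build that contains these fixes, one option is to compare the installed version against 0.20.2. The sketch below uses only the standard library. The helper names are hypothetical, and it assumes that releases after 0.20.2 also carry the fixes, which these notes do not state explicitly.

```python
import re
from importlib import metadata
from typing import Optional, Tuple

# First release that contains the fixes listed above.
FIXED_IN = (0, 20, 2)


def release_tuple(version: str) -> Tuple[int, int, int]:
    """Parse "0.20.2" or "0.20.2+cu129" into (0, 20, 2)."""
    public = version.split("+", 1)[0]  # drop a local label such as "+cu129"
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", public)
    if match is None:
        raise ValueError(f"unrecognized version string: {version}")
    major, minor, patch = (int(part) for part in match.groups())
    return (major, minor, patch)


def has_patch_fixes(version: str) -> bool:
    """True if the version is 0.20.2 or newer (assumption: fixes persist later)."""
    return release_tuple(version) >= FIXED_IN


def installed_vllm_has_fixes() -> Optional[bool]:
    """Check the installed vllm distribution; None if it is not installed."""
    try:
        return has_patch_fixes(metadata.version("vllm"))
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    print(installed_vllm_has_fixes())
```

Comparing integer tuples rather than raw strings avoids the usual pitfall where `"0.9.0"` sorts after `"0.20.2"` lexicographically.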
Contributors
@ywang96, @zyongye, @stecasta, @wzhao18, @Isotr0py, @khluu