Name                                                    Modified    Size
vllm-0.18.1.tar.gz                                      2026-03-31  30.8 MB
vllm-0.18.1-cp38-abi3-manylinux_2_31_x86_64.whl         2026-03-31  433.2 MB
vllm-0.18.1+cpu-cp38-abi3-manylinux_2_35_aarch64.whl    2026-03-31  33.2 MB
vllm-0.18.1+cpu-cp38-abi3-manylinux_2_35_x86_64.whl     2026-03-31  71.4 MB
vllm-0.18.1+cu130-cp38-abi3-manylinux_2_35_aarch64.whl  2026-03-31  214.0 MB
vllm-0.18.1+cu130-cp38-abi3-manylinux_2_35_x86_64.whl   2026-03-31  228.3 MB
vllm-0.18.1-cp38-abi3-manylinux_2_31_aarch64.whl        2026-03-31  385.6 MB
README.md                                               2026-03-30  453 Bytes
v0.18.1 source code.tar.gz                              2026-03-30  30.7 MB
v0.18.1 source code.zip                                 2026-03-30  33.5 MB
Totals: 10 items, 1.5 GB, 13 downloads per week
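
The wheel names in the listing follow the standard wheel filename layout (name, version, Python tag, ABI tag, platform tag), and some carry a local version label after a `+` (`cpu`, `cu130`). As a rough illustration of how to tell the builds apart, the sketch below splits a filename into those fields. The `WheelInfo` type and the `parse_wheel_name` helper are hypothetical names introduced here, not part of vLLM, and the sketch assumes no optional build tag is present, which holds for every wheel listed above.

```python
from typing import NamedTuple, Optional


class WheelInfo(NamedTuple):
    version: str            # public version, e.g. "0.18.1"
    variant: Optional[str]  # local version label such as "cpu" or "cu130"; None if absent
    python_tag: str         # e.g. "cp38"
    abi_tag: str            # e.g. "abi3"
    platform_tag: str       # e.g. "manylinux_2_35_x86_64"


def parse_wheel_name(filename: str) -> WheelInfo:
    """Split a wheel filename of the form name-version-python-abi-platform.whl."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel: {filename}")
    parts = filename[: -len(".whl")].split("-")
    if len(parts) != 5:
        # Wheels with an optional build tag have six fields; none appear in this listing.
        raise ValueError(f"unexpected number of fields in {filename}")
    _name, full_version, python_tag, abi_tag, platform_tag = parts
    # A "+" separates the public version from the local version label.
    version, _, local = full_version.partition("+")
    return WheelInfo(version, local or None, python_tag, abi_tag, platform_tag)


info = parse_wheel_name("vllm-0.18.1+cu130-cp38-abi3-manylinux_2_35_x86_64.whl")
print(info.variant, info.platform_tag)
```

For real tooling, the `packaging` library offers a maintained parser for wheel filenames; the hand-rolled split here only serves to show the fields.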

This is a patch release on top of v0.18.0 to address a few issues:

  • Change default SM100 MLA prefill backend back to TRT-LLM (#38562)
  • Fix mock.patch resolution failure for standalone_compile.FakeTensorMode on Python <= 3.10 (#37158)
  • Disable monolithic TRT-LLM MoE for Renormalize routing (#37605)
  • Pre-download missing FlashInfer headers in Docker build (#38391)
  • Fix DeepGemm E8M0 accuracy degradation for Qwen3.5 FP8 on Blackwell (#38083)
Source: README.md, updated 2026-03-30