| Name | Modified | Size | Downloads / Week |
|---|---|---|---|
| vllm-0.18.1.tar.gz | 2026-03-31 | 30.8 MB | |
| vllm-0.18.1-cp38-abi3-manylinux_2_31_x86_64.whl | 2026-03-31 | 433.2 MB | |
| vllm-0.18.1+cpu-cp38-abi3-manylinux_2_35_aarch64.whl | 2026-03-31 | 33.2 MB | |
| vllm-0.18.1+cpu-cp38-abi3-manylinux_2_35_x86_64.whl | 2026-03-31 | 71.4 MB | |
| vllm-0.18.1+cu130-cp38-abi3-manylinux_2_35_aarch64.whl | 2026-03-31 | 214.0 MB | |
| vllm-0.18.1+cu130-cp38-abi3-manylinux_2_35_x86_64.whl | 2026-03-31 | 228.3 MB | |
| vllm-0.18.1-cp38-abi3-manylinux_2_31_aarch64.whl | 2026-03-31 | 385.6 MB | |
| README.md | 2026-03-30 | 453 Bytes | |
| v0.18.1 source code.tar.gz | 2026-03-30 | 30.7 MB | |
| v0.18.1 source code.zip | 2026-03-30 | 33.5 MB | |
| Totals: 10 Items | | 1.5 GB | 13 |

This is a patch release on top of v0.18.0 to address a few issues:
- Change default SM100 MLA prefill backend back to TRT-LLM (#38562)
- Fix mock.patch resolution failure for standalone_compile.FakeTensorMode on Python <= 3.10 (#37158)
- Disable monolithic TRTLLM MoE for Renormalize routing (#37605)
- Pre-download missing FlashInfer headers in Docker build (#38391)
- Fix DeepGemm E8M0 accuracy degradation for Qwen3.5 FP8 on Blackwell (#38083)
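
The wheel filenames in the file table follow the standard wheel naming convention (`name-version[-build]-python-abi-platform.whl`), so the variant each file targets can be read off the name. The sketch below is a minimal stdlib-only parser; `WheelInfo` and `parse_wheel_filename` are hypothetical helper names for illustration, not part of vLLM. Reading the local version label (`cpu`, `cu130`) as a build-variant marker is an assumption based on common packaging practice, not something this page states.

```python
from typing import NamedTuple, Optional


class WheelInfo(NamedTuple):
    """Fields recovered from a wheel filename (hypothetical helper type)."""
    name: str
    version: str              # public version, e.g. "0.18.1"
    local: Optional[str]      # local version label after "+", e.g. "cu130"
    python_tag: str           # e.g. "cp38"
    abi_tag: str              # e.g. "abi3" (CPython stable ABI)
    platform_tag: str         # e.g. "manylinux_2_35_x86_64"


def parse_wheel_filename(filename: str) -> WheelInfo:
    """Split a wheel filename into its components."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel file: {filename!r}")
    parts = filename[: -len(".whl")].split("-")
    if len(parts) == 5:
        name, version, python_tag, abi_tag, platform_tag = parts
    elif len(parts) == 6:
        # The optional build tag sits between version and python tag.
        name, version, _build, python_tag, abi_tag, platform_tag = parts
    else:
        raise ValueError(f"unexpected wheel filename layout: {filename!r}")
    # A "+" separates the public version from the local version label.
    public, _, local = version.partition("+")
    return WheelInfo(name, public, local or None,
                     python_tag, abi_tag, platform_tag)


if __name__ == "__main__":
    for fname in (
        "vllm-0.18.1-cp38-abi3-manylinux_2_31_x86_64.whl",
        "vllm-0.18.1+cu130-cp38-abi3-manylinux_2_35_aarch64.whl",
    ):
        print(parse_wheel_filename(fname))
```

For the files listed here, `cp38-abi3` marks a CPython stable-ABI build usable on CPython 3.8 and later, and the `manylinux_2_31` / `manylinux_2_35` platform tags give the minimum glibc version the wheel expects.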