vLLM Files

A high-throughput and memory-efficient inference and serving engine

This is an exact mirror of the vLLM project, hosted at https://github.com/vllm-project/vllm. SourceForge is not affiliated with vLLM.


Name	Modified	Size	Downloads / Week
vllm-0.18.1.tar.gz	2026-03-31	30.8 MB	1
vllm-0.18.1-cp38-abi3-manylinux_2_31_x86_64.whl	2026-03-31	433.2 MB	1
vllm-0.18.1+cpu-cp38-abi3-manylinux_2_35_aarch64.whl	2026-03-31	33.2 MB	0
vllm-0.18.1+cpu-cp38-abi3-manylinux_2_35_x86_64.whl	2026-03-31	71.4 MB	0
vllm-0.18.1+cu130-cp38-abi3-manylinux_2_35_aarch64.whl	2026-03-31	214.0 MB	0
vllm-0.18.1+cu130-cp38-abi3-manylinux_2_35_x86_64.whl	2026-03-31	228.3 MB	0
vllm-0.18.1-cp38-abi3-manylinux_2_31_aarch64.whl	2026-03-31	385.6 MB	0
README.md	2026-03-30	453 Bytes	0
v0.18.1 source code.tar.gz	2026-03-30	30.7 MB	1
v0.18.1 source code.zip	2026-03-30	33.5 MB	10
Totals: 10 Items		1.5 GB	13
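
The wheel names in the listing follow the standard wheel filename convention (distribution, version, optional build tag, Python tag, ABI tag, platform tag). The sketch below splits a name into those fields so the variants are easier to compare. The helper name `parse_wheel_filename` is hypothetical, and reading the `+cpu` and `+cu130` local version labels as build variants is an assumption; this page does not define them.

```python
def parse_wheel_filename(filename: str) -> dict:
    """Split a wheel filename into its standard components.

    Hypothetical helper for illustration only; for real use, prefer
    packaging.utils.parse_wheel_filename from the `packaging` library.
    """
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel file: {filename!r}")

    parts = filename[: -len(".whl")].split("-")
    if len(parts) == 5:
        name, version, py_tag, abi_tag, platform_tag = parts
        build = None
    elif len(parts) == 6:
        # The optional build tag sits between version and Python tag.
        name, version, build, py_tag, abi_tag, platform_tag = parts
    else:
        raise ValueError(f"unexpected wheel filename layout: {filename!r}")

    # A local version label follows '+', e.g. '0.18.1+cu130'.
    public_version, _, local_label = version.partition("+")

    return {
        "name": name,
        "version": public_version,
        "local": local_label or None,  # None when there is no '+label'
        "build": build,
        "python": py_tag,
        "abi": abi_tag,
        "platform": platform_tag,
    }


if __name__ == "__main__":
    # Filenames copied from the listing above.
    for wheel in (
        "vllm-0.18.1-cp38-abi3-manylinux_2_31_x86_64.whl",
        "vllm-0.18.1+cu130-cp38-abi3-manylinux_2_35_aarch64.whl",
    ):
        print(parse_wheel_filename(wheel))
```

The `cp38-abi3` pair indicates a wheel built against CPython's stable ABI, so a single file can serve CPython 3.8 and later rather than requiring one wheel per interpreter version.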

This is a patch release on top of v0.18.0 to address a few issues:

Change default SM100 MLA prefill backend back to TRT-LLM (#38562)
Fix mock.patch resolution failure for standalone_compile.FakeTensorMode on Python <= 3.10 (#37158)
Disable monolithic TRTLLM MoE for Renormalize routing (#37605)
Pre-download missing FlashInfer headers in Docker build (#38391)
Fix DeepGemm E8M0 accuracy degradation for Qwen3.5 FP8 on Blackwell (#38083)

Source: README.md, updated 2026-03-30
