SGLang v0.5.8

Highlights

New Model Support

  • Day-0 support for GLM 4.7 Flash: [#17247]
  • LFM2 model support: [#16890]
  • Qwen3-VL-Embedding & Qwen3-VL-Reranker model support: [#16635], [#16403]
  • DeepSeek V3.2 NVFP4: https://huggingface.co/nvidia/DeepSeek-V3.2-NVFP4
  • [Diffusion] black-forest-labs/FLUX.2-klein-9B
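To try one of the newly supported checkpoints, a server launch could be sketched as below. This is a minimal sketch, not a tested recipe: the model ID is taken from the Hugging Face URL above, while the tensor-parallel size and port are illustrative assumptions that depend on your hardware, and `build_launch_command` is a hypothetical helper, not part of SGLang.

```python
import shlex
import sys


def build_launch_command(model_path, tp_size=8, port=30000):
    """Build an argv list for SGLang's server entry point.

    tp_size and port are illustrative defaults (assumptions), not values
    recommended by the release notes.
    """
    return [
        sys.executable, "-m", "sglang.launch_server",
        "--model-path", model_path,
        "--tp", str(tp_size),
        "--port", str(port),
    ]


# Model ID from the DeepSeek V3.2 NVFP4 link in the list above.
cmd = build_launch_command("nvidia/DeepSeek-V3.2-NVFP4")
print(shlex.join(cmd))
```

The helper only assembles the command; passing `cmd` to `subprocess.run` would actually start the server, which requires SGLang v0.5.8 and suitable GPUs.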

DeepSeek V3.2 Optimization

  • Context Parallelism Optimization with support for fused MoE, multi-batch, and FP8 KV cache: [#13959]

Flash Attention 4

  • Support for Flash Attention 4 decoding kernels: [#16034]

SGLang-Diffusion

  • Run sglang-diffusion with the diffusers backend
  • Features: Multi-LoRA inference, SLA attention backends, warmup switch in CLI, ComfyUI Plugin
  • Performance improvements for all models

Dependencies

  • sgl-kernel updated to 0.3.21: [#17075]
  • CuTe DSL updated to 4.3.4: [#17075]
  • Added dependencies for tvm-ffi and quack-kernels: [#17075]
  • FlashInfer updated to 0.6.1: [#15551]
  • Mooncake transfer engine updated to 0.3.8.post1: [#16792]
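To confirm that an environment matches the versions listed above, a check along these lines may help. It is a sketch under stated assumptions: the version numbers come from this list, but the distribution names used as keys (for example `flashinfer-python` and `nvidia-cutlass-dsl`) are assumptions about how these packages are published and may need adjusting for your install.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions are from the release notes; the distribution names are assumptions.
EXPECTED = {
    "sgl-kernel": "0.3.21",
    "nvidia-cutlass-dsl": "4.3.4",
    "flashinfer-python": "0.6.1",
    "mooncake-transfer-engine": "0.3.8.post1",
}


def check_versions(expected, lookup=version):
    """Return {name: (wanted, installed_or_None, matches)} for each package."""
    report = {}
    for name, wanted in expected.items():
        try:
            installed = lookup(name)
        except PackageNotFoundError:
            installed = None  # package is not installed under this name
        report[name] = (wanted, installed, installed == wanted)
    return report


if __name__ == "__main__":
    for name, (wanted, installed, ok) in check_versions(EXPECTED).items():
        status = "ok" if ok else "MISMATCH"
        print(f"{name}: wanted {wanted}, found {installed} [{status}]")
```

The comparison is an exact string match, so a newer installed version is also reported as a mismatch; that is a deliberate simplification.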

Security

  • Fixed urllib and gpgv vulnerabilities: [#17439]

Source: README.md, updated 2026-01-23