
Highlights

  • FlashAttention3 is now the default attention backend for mainstream models (DeepSeek, Qwen, Llama, etc.). Details: https://github.com/sgl-project/sglang/issues/4709#issuecomment-2817728855
  • Prefill-decode (PD) disaggregation with Mooncake and NIXL transfer backends [#4880] [#5477] [#4655]
  • DeepSeek performance improvements: DeepGEMM is enabled by default, plus several kernel fusions. [#5580] [#5628]
  • Upgraded torch to 2.6.0 and fixed the torch.compile cache. [#5417] [#5213]
  • Preliminary support for NVIDIA Blackwell [#5303]
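With FlashAttention3 as the new default, some users may still want to pin the attention backend explicitly, for example to compare against a previous setup. The snippet below is a minimal sketch that only builds the server launch command; it assumes the `python -m sglang.launch_server` entry point with its `--model-path` and `--attention-backend` options and the `fa3` backend name. Treat these as assumptions and confirm them with `python -m sglang.launch_server --help` for your installed version. The helper `build_launch_cmd` and the model path are hypothetical, for illustration only.

```python
import shlex


def build_launch_cmd(model_path: str, attention_backend: str = "fa3") -> list[str]:
    """Build (but do not run) an SGLang server launch command.

    Hypothetical helper: the entry point and flag names are assumed from the
    SGLang CLI; verify them with `--help` before relying on this.
    """
    return [
        "python", "-m", "sglang.launch_server",
        "--model-path", model_path,
        # Pin the attention backend explicitly instead of relying on the default.
        "--attention-backend", attention_backend,
    ]


if __name__ == "__main__":
    # Example model path (illustrative); print a shell-ready command line.
    cmd = build_launch_cmd("meta-llama/Llama-3.1-8B-Instruct")
    print(shlex.join(cmd))
```

Building the argument list separately from executing it keeps the choice of backend easy to inspect or pass to `subprocess.run` once the flags are confirmed.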

Many thanks to the LinkedIn team, Alibaba Cloud, Mooncake team, NVIDIA team, AMD team, PyTorch team, Ant Group, Baseten team, Oracle team, Meituan team, iFlytek MaaS team, and the open-source community for their contributions!

We’re thrilled about these advancements and eager to hear your feedback! Join us on our Slack channel at slack.sglang.ai to connect and share your thoughts. Cheers!

Coming Soon

  • Large-scale expert parallelism + PD disaggregation [#4734] [#5524]
  • Pipeline Parallelism [#5724]
  • MLA Cutlass Backend [#5390]

What's Changed

New Contributors

Full Changelog: https://github.com/sgl-project/sglang/compare/v0.4.5...v0.4.6

Source: README.md, updated 2025-04-27