Download Latest Version Release v0.4.8 source code.tar.gz (4.2 MB)
Email in envelope

Get an email when there's a new version of SGLang

Home / v0.4.8
Name Modified Size InfoDownloads / Week
Parent folder
README.md 2025-06-24 24.6 kB
Release v0.4.8 source code.tar.gz 2025-06-24 4.2 MB
Release v0.4.8 source code.zip 2025-06-24 5.2 MB
Totals: 3 Items   9.4 MB 1

Highlights

OpenAI-Compatible Server Refactor

Re-structured the OpenAI-compatible server to support production and enterprise environments. Key improvements include:

  • Consistent metrics and logging for better observability and debugging.

  • Unified error handling, request validation, and processing logic for improved reliability and maintainability.

  • Improved request tracking across sessions and components.

  • Fixed bugs in embedding requests and reasoning parsers.

This work was a collaborative effort involving engineers from academic and industry institutions. Special thanks to the Oracle Cloud team and the SGLang team and community — including @slin1237, @CatherineSue, @key4ng, @JustinTong0323, @jhinpan, @yhyang201 and @whybeyoung — for their invaluable contributions.

DeepSeek R1 FP4 on Blackwell GPU

Added support for DeepSeek R1 with FP4 and MTP on NVIDIA Blackwell GPU.

  • Integrated FlashInfer NVFP4 MoE, supporting TP, EP, and DP.

  • Supported 2-stream shared expert execution.

  • Achieved up to 90 TPS per user at isl/osl/bs = 1k/1k/16 on B200.

Further optimization in progress. Special thanks to the FlashInfer, NVIDIA Enterprise Products, Novita AI, DataCrunch, Google Cloud, and SGLang teams — especially @Alcanderian and @pyc96 — for their critical contributions.

What's Changed

New Contributors

Full Changelog: https://github.com/sgl-project/sglang/compare/v0.4.7...v0.4.8

Source: README.md, updated 2025-06-24