SGLang - Browse /v0.4.7 at SourceForge.net

Name	Modified	Size	Downloads / Week
README.md	2025-06-10	72.0 kB	0
Release v0.4.7 source code.tar.gz	2025-06-10	4.1 MB	0
Release v0.4.7 source code.zip	2025-06-10	5.1 MB	0
Totals: 3 Items		9.2 MB	0

Highlights

The PD disaggregation and large-scale EP functionality previously described in the blog post has now been fully merged into the latest release.
The results in the blog post have been reproduced by more than six industry teams, including the TensorRT LLM team.
SGLang’s large-scale EP is now actively used by leading organizations such as Cursor, Qwen, Alimama, Alibaba Cloud, iFlytek, and more. It has been deployed and validated at large scale, running on GPU clusters with thousands of devices.
In addition to DeepSeek V3/R1, PD disaggregation and large-scale EP now also support Qwen 3.
Full Blackwell support for DeepSeek V3/R1, Llama 4, and Qwen 3. Further optimizations are underway.
SGLang's DeepSeek V3/R1 now achieves 190 TPS on a single H200, outperforming other frameworks by over 50%.

We extend our sincere thanks to the following contributors, listed in alphabetical order: Alibaba Cloud, AMD Team, Ant Group, Baseten Team, Cursor Team, Dynamo Team, EAGLE Team, FlashInfer Team, Google Vertex AI Team, iFlytek MaaS Team, Intel Team, LinkedIn Team, Meituan Team, Microsoft Copilot Team, Mooncake Team, NVIDIA Team, Oracle Team, Qwen Team, Voltage Park Team and open source community users. Your support and collaboration are deeply appreciated!

What's Changed

Update nightly-test.yml by @merrymercy in https://github.com/sgl-project/sglang/pull/5797
[CI] Improve github summary & enable fa3 for more models by @merrymercy in https://github.com/sgl-project/sglang/pull/5796
[Docs] update grafana setup guide in production metrics by @PopSoda2002 in https://github.com/sgl-project/sglang/pull/5643
[Misc] add structure logging, write to file and log tracing for SGL R… by @slin1237 in https://github.com/sgl-project/sglang/pull/5741
Improve overlap scheduling by @hnyls2002 in https://github.com/sgl-project/sglang/pull/5788
Add Cutlass MLA attention backend by @trevor-m in https://github.com/sgl-project/sglang/pull/5390
chore: upgrade sgl-kernel 0.1.0 by @zhyncs in https://github.com/sgl-project/sglang/pull/5690
Dockerfile.dev pip scikit_build_core by @BBuf in https://github.com/sgl-project/sglang/pull/5807
Add a doc to fix sgl-kernel build link error in py39 with ccache by @BBuf in https://github.com/sgl-project/sglang/pull/5809
Turn on overlap scheduler for multimodal models by @merrymercy in https://github.com/sgl-project/sglang/pull/5771
Tiny refactor DefaultModelLoader.Source by @fzyzcjy in https://github.com/sgl-project/sglang/pull/5482
[Docs] Replace lists with tables for cleanup and readability in server_arguments by @windsonsea in https://github.com/sgl-project/sglang/pull/5276
Revert "Tiny refactor DefaultModelLoader.Source" by @merrymercy in https://github.com/sgl-project/sglang/pull/5825
Feat: add support for thinking mode via chat_template_kwargs.enable_t… by @minleminzui in https://github.com/sgl-project/sglang/pull/5551
fix: fix the error where the content is None when reasoning and tool … by @minleminzui in https://github.com/sgl-project/sglang/pull/5838
feat: Add fused moe triton config for qwen3 moe on h100 by @JustinTong0323 in https://github.com/sgl-project/sglang/pull/5833
fused moe triton tuning script support qwen3 by @BBuf in https://github.com/sgl-project/sglang/pull/5842
feat: Add fused moe triton config for qwen3bf16 moe on h20 by @yhyang201 in https://github.com/sgl-project/sglang/pull/5839
[PD] support pd fake transfer for warmup by @whybeyoung in https://github.com/sgl-project/sglang/pull/5726
[qwen3] qwen3moe_tune_h20 fp8 tp4 by @whybeyoung in https://github.com/sgl-project/sglang/pull/5846
[Doc] Recover history of server_arguments.md by @Fridge003 in https://github.com/sgl-project/sglang/pull/5851
feat: Add fused moe triton config for qwen3-30b-fp8 moe on h20 by @GeLee-Q in https://github.com/sgl-project/sglang/pull/5850
[CI] test chunked prefill more by @merrymercy in https://github.com/sgl-project/sglang/pull/5798
ROCm: update AITER by @HaiShaw in https://github.com/sgl-project/sglang/pull/5816
[Feat] QWen-1M context support[1/2]: Update block sparse attention backend utils kernel by @yinfan98 in https://github.com/sgl-project/sglang/pull/5847
[Fix] Missing bootstrap_port field by @xutianyi1999 in https://github.com/sgl-project/sglang/pull/5823
feat: update is_fa3_default_architecture by @zhyncs in https://github.com/sgl-project/sglang/pull/5854
add fused moe config for qwen3moe fp8/bf16 by @yizhang2077 in https://github.com/sgl-project/sglang/pull/5849
chore: bump v0.4.6.post1 by @zhyncs in https://github.com/sgl-project/sglang/pull/5845
Support max_completion_tokens for OpenAIChatCompletions by @CatherineSue in https://github.com/sgl-project/sglang/pull/5857
simplify fused_moe config logging by @BBuf in https://github.com/sgl-project/sglang/pull/5801
[CI] tune the test order to warmup the server by @merrymercy in https://github.com/sgl-project/sglang/pull/5860
Cutlass MLA decode - fix dtype error by @trevor-m in https://github.com/sgl-project/sglang/pull/5868
cutlass 3.9 supported to improve fp8_blockwise_gemm by @BBuf in https://github.com/sgl-project/sglang/pull/5820
[Feature] support auto chat template by @woodx9 in https://github.com/sgl-project/sglang/pull/4949
Feat: support cuda graph for LoRA by @Qiaolin-Yu in https://github.com/sgl-project/sglang/pull/4115
Add qwen3 30b fused moe config by @JustinTong0323 in https://github.com/sgl-project/sglang/pull/5859
[Fix] Fix a bug for flashmla to run R1 model by @pengcuo in https://github.com/sgl-project/sglang/pull/5875
Add A800 fused moe config for qwen3 30b by @lambert0312 in https://github.com/sgl-project/sglang/pull/5880
[Misc] add service discovery for sgl router by @slin1237 in https://github.com/sgl-project/sglang/pull/5865
[fix]: PyO3 macOS linking and consolidate on tracing for logging by @slin1237 in https://github.com/sgl-project/sglang/pull/5856
chore: update Dockerfile by @zhyncs in https://github.com/sgl-project/sglang/pull/5894
[Docs] Update docs for Qwen3 and Qwen3MoE by @adarshxs in https://github.com/sgl-project/sglang/pull/5836
Tables instead of bulletpoints for sampling doc by @simveit in https://github.com/sgl-project/sglang/pull/5841
chore: update CODEOWNERS by @zhyncs in https://github.com/sgl-project/sglang/pull/5895
[FEATURE] Enhance platform compatibility for ARM by @johnnynunez in https://github.com/sgl-project/sglang/pull/5746
[CI] Add test_function_calling.py to run_suite.py by @CatherineSue in https://github.com/sgl-project/sglang/pull/5896
Auto set draft model path for MTP by @ispobock in https://github.com/sgl-project/sglang/pull/5793
[fix] relax mem_fraction_static for h200 by @Alcanderian in https://github.com/sgl-project/sglang/pull/5893
feat: support pythonic tool call and index in tool call streaming by @CatherineSue in https://github.com/sgl-project/sglang/pull/5725
[Bugfix]: fix missing queue_time_start for requests from grammar_queue by @CatherineSue in https://github.com/sgl-project/sglang/pull/5696
Add AMD MI300x Nightly Testing. by @saienduri in https://github.com/sgl-project/sglang/pull/5861
chore: use torch 2.6 for sgl-kernel build by @zhyncs in https://github.com/sgl-project/sglang/pull/5898
Fix check_env script by @lambert0312 in https://github.com/sgl-project/sglang/pull/5901
[PD] Fix Assertion failed: /DeepEP/csrc/kernels/internode.cu:483, condition: ibgda_get_state()->num_rc_per_pe >= num_channels [#134] by @whybeyoung in https://github.com/sgl-project/sglang/pull/5830
Bump Flashinfer to 0.2.5 by @Fridge003 in https://github.com/sgl-project/sglang/pull/5870
[Fix] Unload lora in HF_Runner if needed by @Qiaolin-Yu in https://github.com/sgl-project/sglang/pull/5899
Add A800 fused moe config for qwen3 235b by @lambert0312 in https://github.com/sgl-project/sglang/pull/5900
Add sm_120 for blackwell by @zhjunqin in https://github.com/sgl-project/sglang/pull/5903
[Feature] add support kimi vl model by @liwenju0 in https://github.com/sgl-project/sglang/pull/5383
support vlm benchmark profile by @yizhang2077 in https://github.com/sgl-project/sglang/pull/5905
[fix] kimi-vl test in test_vision_openai_server.py by @Alcanderian in https://github.com/sgl-project/sglang/pull/5910
[Misc] use parallel build for cmake in sgl-kernel by @yinfan98 in https://github.com/sgl-project/sglang/pull/5919
[qwen3] support qwen3 ep moe by @laixinn in https://github.com/sgl-project/sglang/pull/5917
Add TP2 MOE benchmarks for AMD. by @saienduri in https://github.com/sgl-project/sglang/pull/5909
[Feat] Scale up fa3 kernel to sm8x arch by @yinfan98 in https://github.com/sgl-project/sglang/pull/5912
chore: bump sgl-kernel 0.1.1 by @zhyncs in https://github.com/sgl-project/sglang/pull/5932
chore: upgrade sgl-kernel 0.1.1 by @zhyncs in https://github.com/sgl-project/sglang/pull/5933
Remove unused method calculate_num_image_tokens from qwen2_vl.py by @JustinTong0323 in https://github.com/sgl-project/sglang/pull/5783
[PP] Add pipeline parallelism by @Ying1123 in https://github.com/sgl-project/sglang/pull/5724
Fix lora batch processing when input lora_path contains None by @Qiaolin-Yu in https://github.com/sgl-project/sglang/pull/5930
add Thor & Spark by @johnnynunez in https://github.com/sgl-project/sglang/pull/5915
fix: correct stream response when enable_thinking is set to false by @minleminzui in https://github.com/sgl-project/sglang/pull/5881
fix: update model runner by @zhyncs in https://github.com/sgl-project/sglang/pull/5934
chore: bump v0.4.6.post2 by @zhyncs in https://github.com/sgl-project/sglang/pull/5939
Support XiaomiMiMo/MiMo model inference by @ryang-max in https://github.com/sgl-project/sglang/pull/5921
[PD] Vectorise group_concurrent_contiguous in NumPy by @yuan-luo in https://github.com/sgl-project/sglang/pull/5834
Remove extra contiguous by @ispobock in https://github.com/sgl-project/sglang/pull/5953
Update ci test and doc for MTP api change by @ispobock in https://github.com/sgl-project/sglang/pull/5952
docs: Fix Qwen model typo by @JiangJiaWei1103 in https://github.com/sgl-project/sglang/pull/5944
Optimize a pad operation to accelerate 25us by @hebiao064 in https://github.com/sgl-project/sglang/pull/5945
Properly return error response in vertex_generate HTTP endpoint by @KCFindstr in https://github.com/sgl-project/sglang/pull/5956
feat: add concurrency evaluation logic in mmmu benchmark by @JustinTong0323 in https://github.com/sgl-project/sglang/pull/5782
Add 1 gpu perf and 2 gpu accuracy tests for AMD MI300x CI. by @saienduri in https://github.com/sgl-project/sglang/pull/5960
feat: Refactor DeepSeekV3 function call by @CatherineSue in https://github.com/sgl-project/sglang/pull/5908
Remove token in token out in Native API by @zhaochenyang20 in https://github.com/sgl-project/sglang/pull/5967
Support InternVL3 by @xiaomin-D in https://github.com/sgl-project/sglang/pull/5350
Support MMMU benchmark for InternVL by @JustinTong0323 in https://github.com/sgl-project/sglang/pull/5968
FA3 speed up: skip len operation and get batch size directly from forward batch by @lifuhuang in https://github.com/sgl-project/sglang/pull/5969
[PD] NIXL backend Prefill TP & Decode TP+DP by @jokerwyt in https://github.com/sgl-project/sglang/pull/5681
Fix set kv cache multi-stream by @ispobock in https://github.com/sgl-project/sglang/pull/5975
Overlap qk norm with two streams by @ispobock in https://github.com/sgl-project/sglang/pull/5977
fix: only upgrade nccl for cu128 by @zhyncs in https://github.com/sgl-project/sglang/pull/5986
Fix Phi3 serving which was broke by earlier change by @hebiao064 in https://github.com/sgl-project/sglang/pull/5991
[perf] H100 DeepSeek-V3 fused moe tuned config by @Alcanderian in https://github.com/sgl-project/sglang/pull/5998
[Fix] Suppress dynamo logging when using flashinfer backend with torch compile by @Fridge003 in https://github.com/sgl-project/sglang/pull/5992
[Minor] Fix duplicate method definitions in conversation.py by @lifuhuang in https://github.com/sgl-project/sglang/pull/6012
Fix flaky issues of lora and add multi batch tests by @Qiaolin-Yu in https://github.com/sgl-project/sglang/pull/5957
Add chat_template_kwargs documentation by @vincentzed in https://github.com/sgl-project/sglang/pull/5679
fix: fix broadcast_pyobj breaking VerlEngine by @ocss884 in https://github.com/sgl-project/sglang/pull/5997
[PD] Allow customizing reserved tokens to avoid KV cache waste by @fzyzcjy in https://github.com/sgl-project/sglang/pull/6002
Update dev container config to support live code sync and improve docker setup guide by @lifuhuang in https://github.com/sgl-project/sglang/pull/6018
[PD] Optimize disaggregation ib device help info by @ShangmingCai in https://github.com/sgl-project/sglang/pull/5781
[Test] Add flashmla attention backend test by @PopSoda2002 in https://github.com/sgl-project/sglang/pull/5587
Fix "Avoid computing lse in Ragged Prefill when there's no prefix match" by @Edenzzzz in https://github.com/sgl-project/sglang/pull/5555
feat: Add a unified merge_state API by @DefTruth in https://github.com/sgl-project/sglang/pull/5428
feat: append more comprehensive fields in messages instead of merely role and content by @minleminzui in https://github.com/sgl-project/sglang/pull/5996
[Security][Bug] Prevent binding to all TCP interfaces by @adarshxs in https://github.com/sgl-project/sglang/pull/5752
Fix prefill OOM error in the case of large page size by @xiezhq-hermann in https://github.com/sgl-project/sglang/pull/5081
Fix problem of large page size with chunked prefill by @xiezhq-hermann in https://github.com/sgl-project/sglang/pull/6046
docs: add Google Cloud Vertex AI in Adoption and Sponsorship by @zhyncs in https://github.com/sgl-project/sglang/pull/6047
docs: add new blog by @zhyncs in https://github.com/sgl-project/sglang/pull/6048
Fix not "import os" by @hnyls2002 in https://github.com/sgl-project/sglang/pull/6057
Better PD initialization by @hnyls2002 in https://github.com/sgl-project/sglang/pull/5751
fix: deepep dockerfile, use pip install deepep. by @HanHan009527 in https://github.com/sgl-project/sglang/pull/5885
[Fix] Fix and rename flashmla CI test by @Fridge003 in https://github.com/sgl-project/sglang/pull/6045
chore: upgrade cutlass 3.9.2 by @zhyncs in https://github.com/sgl-project/sglang/pull/6004
Fix sgl-kernel build on aarch64 platforms by @Qiaolin-Yu in https://github.com/sgl-project/sglang/pull/6062
Add DeepEP to CI PR Test by @liz-badada in https://github.com/sgl-project/sglang/pull/5655
fix custom_allreduce namespace by @BBuf in https://github.com/sgl-project/sglang/pull/6039
feat: add release workflow for SGLang kernels on aarch64 by @johnnynunez in https://github.com/sgl-project/sglang/pull/6010
[Feature] Support for Ascend NPU backend by @botieking98 in https://github.com/sgl-project/sglang/pull/3853
Fix the timeout for 8 gpu tests by @merrymercy in https://github.com/sgl-project/sglang/pull/6084
Hint users DeepEP normal mode is incompatible with CUDA Graph by @fzyzcjy in https://github.com/sgl-project/sglang/pull/5014
Super tiny fix doc by @fzyzcjy in https://github.com/sgl-project/sglang/pull/5233
[Doc]Fix description for dp_size argument by @Fridge003 in https://github.com/sgl-project/sglang/pull/6063
feat(engine): add bootstrap parameters to generate methods (dynamo) by @ishandhanani in https://github.com/sgl-project/sglang/pull/6075
[refactor] slightly tidy fp8 module by @Alcanderian in https://github.com/sgl-project/sglang/pull/5993
Clean up fa3 test from 8 gpus by @hebiao064 in https://github.com/sgl-project/sglang/pull/6105
Deferring 8 GPU test by @ch-wan in https://github.com/sgl-project/sglang/pull/6102
Update doc for MLA attention backends by @Fridge003 in https://github.com/sgl-project/sglang/pull/6034
Clean logs for DeepSeek-V3 launching by @Fridge003 in https://github.com/sgl-project/sglang/pull/6079
[CI]Add performance CI for VLM by @JustinTong0323 in https://github.com/sgl-project/sglang/pull/6038
adding Triton configs for DeepSeekV3 FusedMoE kernel on Blackwell by @Fridge003 in https://github.com/sgl-project/sglang/pull/6111
optimize pad operations in fa3 to accelerate 100+us by @zminglei in https://github.com/sgl-project/sglang/pull/6077
Overlap shared expert and routed expert computations by @fzyzcjy in https://github.com/sgl-project/sglang/pull/5121
Tiny refactor ModelConfig.from_server_args by @fzyzcjy in https://github.com/sgl-project/sglang/pull/5219
Tiny refactor weight loading logic by @fzyzcjy in https://github.com/sgl-project/sglang/pull/5232
[PD] Add control to slow down a server by @fzyzcjy in https://github.com/sgl-project/sglang/pull/5572
Change AMD test threshold by @fzyzcjy in https://github.com/sgl-project/sglang/pull/6091
DeepEP normal support deepgemm-contiguous by @sleepcoo in https://github.com/sgl-project/sglang/pull/5626
[fix] fix pyproject.toml dependencies by @Alcanderian in https://github.com/sgl-project/sglang/pull/6119
[Feature] Add FlashAttention3 as a backend for VisionAttention by @Othame in https://github.com/sgl-project/sglang/pull/5764
[perf] dsv3 bmm fallback to bf16 by @Alcanderian in https://github.com/sgl-project/sglang/pull/5662
[AMD] switch to custom allreduce regardless of MSCCL setting on ROCm by @hubertlu-tw in https://github.com/sgl-project/sglang/pull/6097
[sgl-kernel] fix: fix cu118 compile error by @yinfan98 in https://github.com/sgl-project/sglang/pull/6123
upgrade xgrammar to 0.1.19 by @Ubospica in https://github.com/sgl-project/sglang/pull/6129
Remove unnecessary is_fa3_supported check by @hebiao064 in https://github.com/sgl-project/sglang/pull/6112
chore: bump sgl-kernel 0.1.2 by @zhyncs in https://github.com/sgl-project/sglang/pull/6131
docs: update README by @zhyncs in https://github.com/sgl-project/sglang/pull/6132
[Fix] Incorrect Memory Allocation on CUDA:0 by Non-Zero CUDA Processes in TP/DP by @yhyang201 in https://github.com/sgl-project/sglang/pull/5745
Cutlass MLA: Disable split kv due to https://github.com/NVIDIA/cutlass/issues/2274 by @trevor-m in https://github.com/sgl-project/sglang/pull/6101
opt flashinfer mla cat by @xu-yfei in https://github.com/sgl-project/sglang/pull/5822
Update amd nightly concurrency. by @saienduri in https://github.com/sgl-project/sglang/pull/6141
sampling_params: add thinking_budget by @thyecust in https://github.com/sgl-project/sglang/pull/6089
[Bugfix] Fix Llama4 gibberish output with long context and CUDA graph by @CatherineSue in https://github.com/sgl-project/sglang/pull/6162
fix bug that gpu0 occupies more memory when hicache is turned on by @huangtingwei9988 in https://github.com/sgl-project/sglang/pull/5778
chore: bump v0.4.6.post3 by @zhyncs in https://github.com/sgl-project/sglang/pull/6165
KV‑Cache (MHA, MLA): add missing start_layer / end_layer fields to MHATokenToKVPoolHost and MLATokenToKVPoolHost by @Simon-Li in https://github.com/sgl-project/sglang/pull/6016
[fix] fix determine_n_share_experts_fusion by @Alcanderian in https://github.com/sgl-project/sglang/pull/6118
Fix and Clean up chat-template requirement for VLM by @JustinTong0323 in https://github.com/sgl-project/sglang/pull/6114
[Docs]Delete duplicate content by @Ximingwang-09 in https://github.com/sgl-project/sglang/pull/6146
Revert "feat: add thinking_budget (#6089)" by @zhyncs in https://github.com/sgl-project/sglang/pull/6181
Added async_encode method to Engine by @shimizust in https://github.com/sgl-project/sglang/pull/4701
Fix data parallel perf regression by @merrymercy in https://github.com/sgl-project/sglang/pull/6183
Fix request abortion by @merrymercy in https://github.com/sgl-project/sglang/pull/6184
Add typo checker in pre-commit by @applesaucethebun in https://github.com/sgl-project/sglang/pull/6179
Remove duplicate IO Struct test by @emmanuel-ferdman in https://github.com/sgl-project/sglang/pull/6180
[PD] Add simple unit test for disaggregation feature by @ShangmingCai in https://github.com/sgl-project/sglang/pull/5654
[CI] Disabled deepep tests temporarily because it takes too much time. by @merrymercy in https://github.com/sgl-project/sglang/pull/6186
feat: support loogle eval by @zhyncs in https://github.com/sgl-project/sglang/pull/6190
[fix] remove mixtral from is_fa3_default_architecture by @Alcanderian in https://github.com/sgl-project/sglang/pull/6191
fix: handle None multimodal_inputs during merging and filtering batches in disaggregation decode mode by @GaoYusong in https://github.com/sgl-project/sglang/pull/6169
chore: upgrade deepgemm by @zhyncs in https://github.com/sgl-project/sglang/pull/6073
chore: bump sgl-kernel v0.1.2.post1 by @zhyncs in https://github.com/sgl-project/sglang/pull/6195
chore: upgrade sgl-kernel v0.1.2.post1 by @zhyncs in https://github.com/sgl-project/sglang/pull/6196
Handle empty input string for embedding models by @ravi03071991 in https://github.com/sgl-project/sglang/pull/5621
doc: fix the erroneous documents and example codes about Alibaba-NLP/gme-Qwen2-VL-2B-Instruct by @minleminzui in https://github.com/sgl-project/sglang/pull/6199
[Docs] minor Qwen3 and reasoning parser docs fix by @adarshxs in https://github.com/sgl-project/sglang/pull/6032
Improve structured outputs: fix race condition, server crash, metrics and style by @merrymercy in https://github.com/sgl-project/sglang/pull/6188
[CI] Reorganize the 8 gpu tests by @merrymercy in https://github.com/sgl-project/sglang/pull/6192
Add dev-deepep docker image by @fzyzcjy in https://github.com/sgl-project/sglang/pull/6198
Replace time.time() to time.perf_counter() for benchmarking. by @lifuhuang in https://github.com/sgl-project/sglang/pull/6178
Update README.md by @merrymercy in https://github.com/sgl-project/sglang/pull/6202
Fix release-docs.yml to not use python 3.9 by @merrymercy in https://github.com/sgl-project/sglang/pull/6204
Fix start_profile does not support with_stack and record_shapes by @fzyzcjy in https://github.com/sgl-project/sglang/pull/6043
[doc] add a note for --n-share-experts-fusion args by @BBuf in https://github.com/sgl-project/sglang/pull/6154
Performing Vocabulary Parallelism for LM Head across Attention TP Groups by @ch-wan in https://github.com/sgl-project/sglang/pull/5558
Update AMD CI docker to v0.4.6.post3-rocm630. by @saienduri in https://github.com/sgl-project/sglang/pull/6213
Log if cuda graph is used & extend cuda graph capture to cuda-graph-max-bs by @merrymercy in https://github.com/sgl-project/sglang/pull/6201
[CI] Fix PD mooncake dependency error by @ShangmingCai in https://github.com/sgl-project/sglang/pull/6212
[CI] Re-enable pd disaggregation test by @ShangmingCai in https://github.com/sgl-project/sglang/pull/6231
fix some typos by @applesaucethebun in https://github.com/sgl-project/sglang/pull/6209
[Docs] Add docs for SGLANG_ and SGL_ environment variables by @b8zhong in https://github.com/sgl-project/sglang/pull/6206
[PP] Fix init_memory_pool desync & add PP for mixtral by @Ying1123 in https://github.com/sgl-project/sglang/pull/6223
Revert "fix some typos" by @merrymercy in https://github.com/sgl-project/sglang/pull/6244
chore: add hf_xet dep by @zhyncs in https://github.com/sgl-project/sglang/pull/6243
Update AMD nightly deps. by @saienduri in https://github.com/sgl-project/sglang/pull/6241
[PD] Add support for different TP sizes per DP rank by @ShangmingCai in https://github.com/sgl-project/sglang/pull/5922
Support incremental streaming of logprob/token_ids between scheduler and detokenizer by @merrymercy in https://github.com/sgl-project/sglang/pull/6225
fix typo by @zhyncs in https://github.com/sgl-project/sglang/pull/6248
Support tuning moe for llama 4 model by @fzyzcjy in https://github.com/sgl-project/sglang/pull/6042
Skip the flaky test_stateful_custom_logit_processor by @merrymercy in https://github.com/sgl-project/sglang/pull/6251
[Llama4] Add docs note about enable multimodal by @b8zhong in https://github.com/sgl-project/sglang/pull/6235
[VERL Use Case] Add torch_memory_saver into deps by @hebiao064 in https://github.com/sgl-project/sglang/pull/6247
Fix two issues related to --moe-dense-tp-size=1 by @ch-wan in https://github.com/sgl-project/sglang/pull/5657
model(vlm): pixtral by @KivenChen in https://github.com/sgl-project/sglang/pull/5084
[misc] deep_gemm fallback to NVRTC when NVCC not found by @Alcanderian in https://github.com/sgl-project/sglang/pull/6252
Enable MI325X AMD CI. by @saienduri in https://github.com/sgl-project/sglang/pull/6259
chore: bump v0.4.6.post4 by @zhyncs in https://github.com/sgl-project/sglang/pull/6245
[CPU] Add CMakeLists.txt for sgl-kernel by @blzheng in https://github.com/sgl-project/sglang/pull/6115
perf: optimize local_block_table memory allocation by @CatherineSue in https://github.com/sgl-project/sglang/pull/6273
Fix a bug in schedule_policy by @Ying1123 in https://github.com/sgl-project/sglang/pull/6276
[Bug] Fix accidental logger override caused by internVL. by @lifuhuang in https://github.com/sgl-project/sglang/pull/6282
doc: update developer guide regarding mllms by @mickqian in https://github.com/sgl-project/sglang/pull/6138
docs: fix a bad redirect by @b8zhong in https://github.com/sgl-project/sglang/pull/6300
Enable unit tests for AMD CI. by @saienduri in https://github.com/sgl-project/sglang/pull/6283
[AMD] Fix Llama 4 Scout and Maverick accuracy issues on MI300X by @hubertlu-tw in https://github.com/sgl-project/sglang/pull/6274
feat: add flush cache to EngineBase and HttpServerEngineAdapter by @ocss884 in https://github.com/sgl-project/sglang/pull/6009
[fix][RL] Remove the incorrect barrier in init_weights_update_group by @zhuzilin in https://github.com/sgl-project/sglang/pull/5914
[Feat] Support FlashMLA backend with MTP and FP8 KV cache by @quinnrong94 in https://github.com/sgl-project/sglang/pull/6109
[misc] remove redundant platform codes by @Alcanderian in https://github.com/sgl-project/sglang/pull/6298
Add fp8 gemm kernel for CPU in sgl-kernel and add gemm UT by @chunyuan-w in https://github.com/sgl-project/sglang/pull/6216
Fix gpu mem check on CPU by @yiliu30 in https://github.com/sgl-project/sglang/pull/6317
Reduce MoE memory usage by @fzyzcjy in https://github.com/sgl-project/sglang/pull/6147
Fix lora bench by @Qiaolin-Yu in https://github.com/sgl-project/sglang/pull/6302
Minor improvements of TokenizerManager / health check by @merrymercy in https://github.com/sgl-project/sglang/pull/6327
Upgrade CUTLASS 4.0 by @elfiegg in https://github.com/sgl-project/sglang/pull/6336
Support precomputed multimodal features for Qwen-VL and Gemma3 models. by @ysulsky in https://github.com/sgl-project/sglang/pull/6136
[Fix] Improve dependencies for Blackwell image by @Fridge003 in https://github.com/sgl-project/sglang/pull/6334
[2/2] Add python wrapper for CUTLASS FP8 Blockscale MoE Kernel. by @elfiegg in https://github.com/sgl-project/sglang/pull/5694
feat: add dp attention support for Qwen 2/3 MoE models, fixes [#6088] by @Fr4nk1inCs in https://github.com/sgl-project/sglang/pull/6121
Update CODEOWNERS by @merrymercy in https://github.com/sgl-project/sglang/pull/6359
[Minor] cleanup unused imports by @merrymercy in https://github.com/sgl-project/sglang/pull/6358
Fix amd ci by @merrymercy in https://github.com/sgl-project/sglang/pull/6360
docs: update readme by @zhyncs in https://github.com/sgl-project/sglang/pull/6361
model(vlm): mistral 3.1 by @KivenChen in https://github.com/sgl-project/sglang/pull/5099
Fix one wasted kernel in DeepSeek and minor refactor by @fzyzcjy in https://github.com/sgl-project/sglang/pull/6316
Minor code cleanup refactor for DeepSeek models by @fzyzcjy in https://github.com/sgl-project/sglang/pull/6324
chore: bump sgl-kernel v0.1.3 by @zhyncs in https://github.com/sgl-project/sglang/pull/6368
perf: Optimize local attention memory allocation in FlashAttentionBackend by @CatherineSue in https://github.com/sgl-project/sglang/pull/6356
docs: Update the MD files by @vincentzed in https://github.com/sgl-project/sglang/pull/6373
[router] Add /list_workers endpoint to router by @zhuzilin in https://github.com/sgl-project/sglang/pull/6366
Speed up when having padding tokens in DeepEP by @fzyzcjy in https://github.com/sgl-project/sglang/pull/6175
Use monotonic clock for interval measurement by @lifuhuang in https://github.com/sgl-project/sglang/pull/6211
[fix] illegal memory in _fwd_kernel_ep_scatter_2 and _fwd_kernel_ep_gather by @xutizhou in https://github.com/sgl-project/sglang/pull/6348
Fix stop_profile does not wait for finishing by @fzyzcjy in https://github.com/sgl-project/sglang/pull/4741
Support outputting details for bench_serving by @fzyzcjy in https://github.com/sgl-project/sglang/pull/6107
Tiny refactor bench_serving to improve extensibility by @fzyzcjy in https://github.com/sgl-project/sglang/pull/6134
Tiny refactor bench_serving to extract RequestFuncOutput.init_new by @fzyzcjy in https://github.com/sgl-project/sglang/pull/6108
Support custom DeepEP tuning config by @fzyzcjy in https://github.com/sgl-project/sglang/pull/6257
Fix expert distribution recorder and profiler command stuck forever by @fzyzcjy in https://github.com/sgl-project/sglang/pull/6284
Reland tiny refactor DefaultModelLoader.Source by @fzyzcjy in https://github.com/sgl-project/sglang/pull/6041
Add expert distribution APIs for engine by @fzyzcjy in https://github.com/sgl-project/sglang/pull/6290
fix: allow launch_dummy_health_check_server to start inside of running asyncio loop by @ishandhanani in https://github.com/sgl-project/sglang/pull/6330
[Fix Chat API] add request id for chat/completion for tracing by @whybeyoung in https://github.com/sgl-project/sglang/pull/6364
Fix CI tests by @merrymercy in https://github.com/sgl-project/sglang/pull/6362
chore: upgrade sgl-kernel v0.1.3 by @zhyncs in https://github.com/sgl-project/sglang/pull/6377
Do not use FA3 for mistral by @merrymercy in https://github.com/sgl-project/sglang/pull/6379
refactor: minor refactors regarding multimodal processing by @mickqian in https://github.com/sgl-project/sglang/pull/6187
Add pipeline parallelism for Qwen2 and Qwen3 Model by @libratiger in https://github.com/sgl-project/sglang/pull/6250
Clean up AMD CI. by @saienduri in https://github.com/sgl-project/sglang/pull/6365
feat: add long context example by @zhyncs in https://github.com/sgl-project/sglang/pull/6391
The Gemma template is missing a newline after the user role. by @ysulsky in https://github.com/sgl-project/sglang/pull/6331
chore: tiny remove duplicated code by @doujiang24 in https://github.com/sgl-project/sglang/pull/6392
Add 4-GPU runner tests and split existing tests by @fzyzcjy in https://github.com/sgl-project/sglang/pull/6383
Add fp8 shared_expert kernel for CPU in sgl-kernel and add UT by @chunyuan-w in https://github.com/sgl-project/sglang/pull/6339
[fix] fix fa3 forward_decode with spec_decode by @Alcanderian in https://github.com/sgl-project/sglang/pull/6395
Add missing model to doc by @applesaucethebun in https://github.com/sgl-project/sglang/pull/6396
[OAI] Add rid tracing for v1/embeddings and fix rid type in Chat by @CatherineSue in https://github.com/sgl-project/sglang/pull/6397
[Misc] Implement RankZeroFilter for rank-specific logging in model_runner.py by @CatherineSue in https://github.com/sgl-project/sglang/pull/6333
refactor: Extract repeated member variables in KVCache subclasses to base class. by @wangxiyu191 in https://github.com/sgl-project/sglang/pull/6323
Refactor DeepSeek MoE layer to unify the two forward branches by @fzyzcjy in https://github.com/sgl-project/sglang/pull/6325
vlm: tensor hash kernel by @mickqian in https://github.com/sgl-project/sglang/pull/5974
[Bugfix] Fix field error in v1_embedding_request by @CatherineSue in https://github.com/sgl-project/sglang/pull/6400
Fix request id error by @fzyzcjy in https://github.com/sgl-project/sglang/pull/6401
Implement return_hidden_states for the OpenAI API by @kyle-pena-kuzco in https://github.com/sgl-project/sglang/pull/6137
Fix nodeepgemm init by @sleepcoo in https://github.com/sgl-project/sglang/pull/6417
Improve supported models doc by @simveit in https://github.com/sgl-project/sglang/pull/6430
Fix throughput threshold for amd ci test by @Fridge003 in https://github.com/sgl-project/sglang/pull/6414
[Metrics] Add KV events publishing by @trevor-m in https://github.com/sgl-project/sglang/pull/6098
[BUG] fix stop_profile crash by @yizhang2077 in https://github.com/sgl-project/sglang/pull/6431
Revert "Implement return_hidden_states for the OpenAI API (#6137)" by @zhyncs in https://github.com/sgl-project/sglang/pull/6440
Expert distribution recording without overhead for EPLB by @fzyzcjy in https://github.com/sgl-project/sglang/pull/4957
Refactor communication logic of DeepSeek for extensibility and understandability by @fzyzcjy in https://github.com/sgl-project/sglang/pull/6321
Remove Cargo.lock, add it into .gitignore by @hnyls2002 in https://github.com/sgl-project/sglang/pull/6438
Refactor DeepSeek logic into atomic operations by @fzyzcjy in https://github.com/sgl-project/sglang/pull/6326
Support loading weights when physical experts are different from logical experts by @fzyzcjy in https://github.com/sgl-project/sglang/pull/6386
Support DeepSeek EPLB algorithm with static distributions by @fzyzcjy in https://github.com/sgl-project/sglang/pull/6387
Address performance regression: disable multiple streams on ROCm by @HaiShaw in https://github.com/sgl-project/sglang/pull/6412
[QuickFix] fix gptq model initialize by @yinfan98 in https://github.com/sgl-project/sglang/pull/6429
Update extend/decode attention kernel for CPU in sgl-kernel and add UTs by @yanbing-j in https://github.com/sgl-project/sglang/pull/6405
[doc] add note for get_num_kv_splits in triton_backend by @Alcanderian in https://github.com/sgl-project/sglang/pull/6444
Support dispatching logical to physical experts by @fzyzcjy in https://github.com/sgl-project/sglang/pull/6385
Fix master CI for DeepSeek by @fzyzcjy in https://github.com/sgl-project/sglang/pull/6447
[docs] Fix torch version by @Edenzzzz in https://github.com/sgl-project/sglang/pull/6472
Disable all two stream overlap on amd by @merrymercy in https://github.com/sgl-project/sglang/pull/6475
Refactor group_concurrent_contiguous in NIXL by @yuan-luo in https://github.com/sgl-project/sglang/pull/6214
aiter attention-backend (default enabled on AMD/ROCm) by @HaiShaw in https://github.com/sgl-project/sglang/pull/6381
Implement Siglip Vision model, and support BNB quantization for gemma3-mm by @guapisolo in https://github.com/sgl-project/sglang/pull/5339
[router] support http2 in router by @zhuzilin in https://github.com/sgl-project/sglang/pull/6487
[RL] allow weight update with dp attention enabled by @zhuzilin in https://github.com/sgl-project/sglang/pull/6311
Refactor DeepSeek attention dispatching by @fzyzcjy in https://github.com/sgl-project/sglang/pull/6476
Fix num_qps_per_rank computation when providing custom DeepEP configuration by @fzyzcjy in https://github.com/sgl-project/sglang/pull/6468
Tiny add stage assertions to DeepEPDispatcher to avoid misuse by @fzyzcjy in https://github.com/sgl-project/sglang/pull/6467
Support redundant experts in expert parallel by @fzyzcjy in https://github.com/sgl-project/sglang/pull/6461
Tiny make Lint CI show diff by @fzyzcjy in https://github.com/sgl-project/sglang/pull/6445
Let bench_one_batch_server use sharegpt data to make expert distribution more natural by @fzyzcjy in https://github.com/sgl-project/sglang/pull/5573
Fix bench_one_batch_server by @fzyzcjy in https://github.com/sgl-project/sglang/pull/6503
[Fix]Fix capture fail bug for DeepSeek by @Fridge003 in https://github.com/sgl-project/sglang/pull/6275
[CPU] Fix build issue by @blzheng in https://github.com/sgl-project/sglang/pull/6419
fix: EXAONE when using tie_word_embeddings by @lkm2835 in https://github.com/sgl-project/sglang/pull/5759
doc: Update README.md with adding deepwiki badge to enable weekly auto-refresh by @JustinTong0323 in https://github.com/sgl-project/sglang/pull/6508
Recover from corrupted cache file in bench serving by @fzyzcjy in https://github.com/sgl-project/sglang/pull/6510
Apply constraint grammar to EAGLE by @ispobock in https://github.com/sgl-project/sglang/pull/6499
[1/2] Support Qserve by @HandH1998 in https://github.com/sgl-project/sglang/pull/6457
[PD] Add doc and simplify sender.send by @ByronHsu in https://github.com/sgl-project/sglang/pull/6019
[PD] Abort request if transfer fails by @ByronHsu in https://github.com/sgl-project/sglang/pull/6504
Add main for merge state tests by @yuan-luo in https://github.com/sgl-project/sglang/pull/6492
Support updating expert locations dynamically by @fzyzcjy in https://github.com/sgl-project/sglang/pull/6388
[RL] Remove the w13 weight_scale and input_scale for UnquantizedEPMoE… by @zhuzilin in https://github.com/sgl-project/sglang/pull/6308
Support logging expert balancedness metrics by @fzyzcjy in https://github.com/sgl-project/sglang/pull/6482
Support dynamically rebalancing experts using EPLB by @fzyzcjy in https://github.com/sgl-project/sglang/pull/6469
Fix missing http status import for PD failure handler by @ShangmingCai in https://github.com/sgl-project/sglang/pull/6520
chore: bump sgl-kernel v0.1.4 by @zhyncs in https://github.com/sgl-project/sglang/pull/6522
Support qwen3 deepep by @sleepcoo in https://github.com/sgl-project/sglang/pull/6120
chore: upgrade sgl-kernel v0.1.4 by @zhyncs in https://github.com/sgl-project/sglang/pull/6532
Support XiaomiMiMo inference with mtp by @ryang-max in https://github.com/sgl-project/sglang/pull/6059
misc: fix accept_length by @zhyncs in https://github.com/sgl-project/sglang/pull/6536
[PD] Fix failure abort by @ByronHsu in https://github.com/sgl-project/sglang/pull/6535
[VLM] Support chunk prefill for VLM by @CatherineSue in https://github.com/sgl-project/sglang/pull/6355
Add fp8 qkv_proj_with_rope kernel for CPU in sgl-kernel and add UT by @blzheng in https://github.com/sgl-project/sglang/pull/6493
Add fp8 fused_experts kernel for CPU in sgl-kernel and add UT by @chunyuan-w in https://github.com/sgl-project/sglang/pull/6404
Update sgl-kernel UTs for activation/topk/norm/rope kernels by @yanbing-j in https://github.com/sgl-project/sglang/pull/6452
Fix topk inference performance reduction by @lambert0312 in https://github.com/sgl-project/sglang/pull/6474
[PD] support spec decode by @ByronHsu in https://github.com/sgl-project/sglang/pull/6507
[2/2] Support Qserve by @HandH1998 in https://github.com/sgl-project/sglang/pull/6521
[PD] Support logprob & Add failure test by @ByronHsu in https://github.com/sgl-project/sglang/pull/6558
fix: remove content=none test when tool called by @shuaills in https://github.com/sgl-project/sglang/pull/6347
Update cmdline --enable-dp-attention help string for Qwen 2/3 Moe models. by @MiterV1 in https://github.com/sgl-project/sglang/pull/6524
Bugfix: handle flatten_batch constraint for multiple images by @CatherineSue in https://github.com/sgl-project/sglang/pull/6562
support eplb for qwen3 by @yizhang2077 in https://github.com/sgl-project/sglang/pull/6533
feat(Tool Calling): Support required and specific function mode by @CatherineSue in https://github.com/sgl-project/sglang/pull/6550
[PD] Support structured output by @ByronHsu in https://github.com/sgl-project/sglang/pull/6560
[FIX]remove ServerArgs duplicate code by @pc-neo in https://github.com/sgl-project/sglang/pull/6485
Fix accuracy is zero when enabling moe-dense-tp-size as in large scale EP by @fzyzcjy in https://github.com/sgl-project/sglang/pull/6567
chore: bump v0.4.6.post5 by @zhyncs in https://github.com/sgl-project/sglang/pull/6566
Temporarily disable MI325x 8 gpu testing. by @saienduri in https://github.com/sgl-project/sglang/pull/6576
Fix GPU OOM by @kkHuang-amd in https://github.com/sgl-project/sglang/pull/6564
Refactor attention into multiple stages by @fzyzcjy in https://github.com/sgl-project/sglang/pull/6477
Add back DeepSeek non-TBO branches by @fzyzcjy in https://github.com/sgl-project/sglang/pull/6578
Utilize static dispatching for communicator by @fzyzcjy in https://github.com/sgl-project/sglang/pull/6577
Support overlapping two batches by @fzyzcjy in https://github.com/sgl-project/sglang/pull/4068
Refactor vlm embedding routine to use precomputed feature by @JustinTong0323 in https://github.com/sgl-project/sglang/pull/6543
[OAI] Support non-normalized logprobs in OpenAI server by @CatherineSue in https://github.com/sgl-project/sglang/pull/5961
Support Phi-4 Multi-Modal (text + vision only) by @lifuhuang in https://github.com/sgl-project/sglang/pull/6494
Sgl-router Prometheus metrics endpoint and usage track metrics by @upfixer in https://github.com/sgl-project/sglang/pull/6537
added support for tied weights in qwen pipeline parallelism by @FrankLeeeee in https://github.com/sgl-project/sglang/pull/6546
Hint users when weight update times out by @fzyzcjy in https://github.com/sgl-project/sglang/pull/6570
Fix some issues with current docs. by @simveit in https://github.com/sgl-project/sglang/pull/6588
[PD] Fix prefill_servers in mini_lb by @wangxiyu191 in https://github.com/sgl-project/sglang/pull/6527
Fix bench_serving does not support changing warmup requests by @fzyzcjy in https://github.com/sgl-project/sglang/pull/6439
Support fake perfectly balanced EP dispatch algorithm by @fzyzcjy in https://github.com/sgl-project/sglang/pull/6571
Fix profiling will crash the server when using num_steps by @fzyzcjy in https://github.com/sgl-project/sglang/pull/6586
Improve performance of two batch overlap in some imbalanced cases by @fzyzcjy in https://github.com/sgl-project/sglang/pull/6593
Logging and minor fixes to two batch overlap and EPLB by @fzyzcjy in https://github.com/sgl-project/sglang/pull/6595
Tiny change killall_sglang.sh by @fzyzcjy in https://github.com/sgl-project/sglang/pull/6596
Auto handle PD disaggregation in bench_serving by @fzyzcjy in https://github.com/sgl-project/sglang/pull/6587
Support accurate length control for bench serving by @fzyzcjy in https://github.com/sgl-project/sglang/pull/6594
Tiny fix lint CI does not trigger on master by @fzyzcjy in https://github.com/sgl-project/sglang/pull/6609
chore: upgrade transformers 4.52.3 by @zhyncs in https://github.com/sgl-project/sglang/pull/6575
Revert "Tiny fix lint CI does not trigger on master (#6609)" by @zhyncs in https://github.com/sgl-project/sglang/pull/6610
refactor qwen moe code, use communicator to support tp+dp by @yizhang2077 in https://github.com/sgl-project/sglang/pull/6581
feat: Improve Mistral and Qwen25 function call parsing by @CatherineSue in https://github.com/sgl-project/sglang/pull/6597
qwen3moe support two batch overlap by @yizhang2077 in https://github.com/sgl-project/sglang/pull/6598
Tiny fix CI by @fzyzcjy in https://github.com/sgl-project/sglang/pull/6611
Supported precomputed feature for Kimi VL by @lifuhuang in https://github.com/sgl-project/sglang/pull/6599
[FA][Test] Fix Sparse FA test by @b8zhong in https://github.com/sgl-project/sglang/pull/6306
fix qwen3moe eplb prefill bug by @yizhang2077 in https://github.com/sgl-project/sglang/pull/6617
Automatically configure for EPLB-related args by @fzyzcjy in https://github.com/sgl-project/sglang/pull/6628
Fix EPLB algorithm fail to run when using 3 nodes for prefill by @fzyzcjy in https://github.com/sgl-project/sglang/pull/6629
Tiny fix missing expert location dispatch info by @fzyzcjy in https://github.com/sgl-project/sglang/pull/6620
Update nightly thresholds and dependencies. by @saienduri in https://github.com/sgl-project/sglang/pull/6635
Tiny fix sampler error when prob is not contiguous by @fzyzcjy in https://github.com/sgl-project/sglang/pull/6639
[PD] Handle P/D failure and reconnect without affecting other instances by @ShangmingCai in https://github.com/sgl-project/sglang/pull/6263
follow-up: move Idefics2 to a shared location to eliminate unexpected dependency. by @lifuhuang in https://github.com/sgl-project/sglang/pull/6603
fix: added "\n" to qwen25 tool parser structural tags by @shuaills in https://github.com/sgl-project/sglang/pull/6631
[New Model] Devstral support by @JustinTong0323 in https://github.com/sgl-project/sglang/pull/6547
chore: upgrade mooncake-transfer-engine by @zhyncs in https://github.com/sgl-project/sglang/pull/6643
Tiny refactor communicator by @fzyzcjy in https://github.com/sgl-project/sglang/pull/6646
Support TP in attention for two batch overlap by @fzyzcjy in https://github.com/sgl-project/sglang/pull/6634
Super tiny rename environment variable by @fzyzcjy in https://github.com/sgl-project/sglang/pull/6648
Refactor LoRA handling to support adapter tensors in fused format by @lifuhuang in https://github.com/sgl-project/sglang/pull/6585
[Bugfix]: Fix call for function_call_parser.multi_format_detector in adapter.py by @CatherineSue in https://github.com/sgl-project/sglang/pull/6650
update toc for doc and dockerfile code style format by @habaohaba in https://github.com/sgl-project/sglang/pull/6450
Add note to add supported model to documentation by @b8zhong in https://github.com/sgl-project/sglang/pull/6640
docs: Update documentation to reflect xgrammar as default grammar backend by @vincentzed in https://github.com/sgl-project/sglang/pull/6601
Add environment flag for disabling message queue broadcaster by @Fridge003 in https://github.com/sgl-project/sglang/pull/6403
fix: fix nightly test from updating transformers by @mickqian in https://github.com/sgl-project/sglang/pull/6658
Fix qwen3 tbo/dp-lm-head by @yizhang2077 in https://github.com/sgl-project/sglang/pull/6652
fix communicator for non-dp lm head by @ch-wan in https://github.com/sgl-project/sglang/pull/6662
Support EAGLE draft extend CUDA graph by @ispobock in https://github.com/sgl-project/sglang/pull/6606
DeepSeek: enable non-block-quant FP8 quantizations by @HaiShaw in https://github.com/sgl-project/sglang/pull/6638
Fix OOM when updating expert locations by @fzyzcjy in https://github.com/sgl-project/sglang/pull/6660
Speed up expert location update by @fzyzcjy in https://github.com/sgl-project/sglang/pull/6661
Revert "fix communicator for non-dp lm head (#6662)" by @zhyncs in https://github.com/sgl-project/sglang/pull/6677
[PD] Make bootstrap code common between NIXL and Mooncake by @trevor-m in https://github.com/sgl-project/sglang/pull/6473
[CI] update verlengine ci to 4-gpu test by @ocss884 in https://github.com/sgl-project/sglang/pull/6007
Fix DeepEP error in Qwen 3 MoE models by @fzyzcjy in https://github.com/sgl-project/sglang/pull/6673
Support gathering expert distribution details by @fzyzcjy in https://github.com/sgl-project/sglang/pull/6665
Disable compiling arch below sm_90 in aarch64 by default by @Qiaolin-Yu in https://github.com/sgl-project/sglang/pull/6380
fix(tool call): Fix tool_index in PythonicDetector and issues with mixed output in non-streaming by @CatherineSue in https://github.com/sgl-project/sglang/pull/6678
Add batch test for draft extend by @ispobock in https://github.com/sgl-project/sglang/pull/6672
feat: Add warnings for invalid tool_choice and UTs by @CatherineSue in https://github.com/sgl-project/sglang/pull/6582
Update amd docker and nightly models. by @saienduri in https://github.com/sgl-project/sglang/pull/6687
Refine pre_reorder_triton_kernel slightly to improve performance by @yuan-luo in https://github.com/sgl-project/sglang/pull/6627
fix log_info_on_rank0 error when run benchmark by @BBuf in https://github.com/sgl-project/sglang/pull/6260
fix(deepseekv3): Fix DeepSeekV3Detector tool_index assignment and multi-tool call streaming support by @CatherineSue in https://github.com/sgl-project/sglang/pull/6655
[Bugfix] Fix missing abort finish reason for PD with ChatCompletion by @ShangmingCai in https://github.com/sgl-project/sglang/pull/6693
[CI] Fix flaky pp single node test by @ShangmingCai in https://github.com/sgl-project/sglang/pull/6689
[PD] Abort unbootstrapped prefill requests through timeout by @ShangmingCai in https://github.com/sgl-project/sglang/pull/6685
[PD Perf] replace Queue to FastQueue by @whybeyoung in https://github.com/sgl-project/sglang/pull/6649
[Bugfix] Fix slice operation when chunk size mismatch by @ShangmingCai in https://github.com/sgl-project/sglang/pull/6697
[Bugfix] Fix ChatCompletion endpoint of mini_lb when stream is set by @ShangmingCai in https://github.com/sgl-project/sglang/pull/6703
[CI] Fix setup of disaggregation with different tp by @ShangmingCai in https://github.com/sgl-project/sglang/pull/6706
[PD] Remove Unnecessary Exception Handling for FastQueue.get() by @Hongbosherlock in https://github.com/sgl-project/sglang/pull/6712
Fuse routed_scaling_factor in DeepSeek by @fzyzcjy in https://github.com/sgl-project/sglang/pull/6710
Overlap two kernels in DeepSeek with communication by @fzyzcjy in https://github.com/sgl-project/sglang/pull/6711
Minor refactor two-batch overlap by @fzyzcjy in https://github.com/sgl-project/sglang/pull/6682
Speed up when having padding tokens two-batch overlap by @fzyzcjy in https://github.com/sgl-project/sglang/pull/6668
[Feature] Support Flashinfer fp8 blockwise GEMM kernel on Blackwell by @Fridge003 in https://github.com/sgl-project/sglang/pull/6479
Fix LoRA bench by @Edenzzzz in https://github.com/sgl-project/sglang/pull/6719
Fix PP for Qwen3 MoE by @jinyouzhi in https://github.com/sgl-project/sglang/pull/6709
[feat] triton kernel for get_last_loc by @Alcanderian in https://github.com/sgl-project/sglang/pull/6676
[fix] more mem for draft_extend cuda_graph by @Alcanderian in https://github.com/sgl-project/sglang/pull/6726
[PD] bug fix: Update status if nixl receiver sends a dummy req. by @thesues in https://github.com/sgl-project/sglang/pull/6720
Tune memory arguments on B200 by @Fridge003 in https://github.com/sgl-project/sglang/pull/6718
Add DeepSeek-R1-0528 function call chat template by @Xu-Wenqing in https://github.com/sgl-project/sglang/pull/6725
refactor(tool call): Fix BaseFormatDetector tool_index issue and refactor parse_streaming_increment by @CatherineSue in https://github.com/sgl-project/sglang/pull/6715
Add draft extend CUDA graph for Triton backend by @ispobock in https://github.com/sgl-project/sglang/pull/6705
refactor apply_w8a8_block_fp8_linear in fp by @ChangyiYang in https://github.com/sgl-project/sglang/pull/6545
[PD] Support completion endpoint by @ShangmingCai in https://github.com/sgl-project/sglang/pull/6729
Init PD Rust LB (PO2) by @hnyls2002 in https://github.com/sgl-project/sglang/pull/6437
Super tiny enable sole usage of expert distribution metrics and update doc by @fzyzcjy in https://github.com/sgl-project/sglang/pull/6680
Support picking variants of EPLB algorithms by @fzyzcjy in https://github.com/sgl-project/sglang/pull/6728
Support tuning DeepEP configs by @fzyzcjy in https://github.com/sgl-project/sglang/pull/6742
[test] add ut and bm for get_last_loc by @Alcanderian in https://github.com/sgl-project/sglang/pull/6746
Fix mem_fraction_static for AMD CI by @Fridge003 in https://github.com/sgl-project/sglang/pull/6748
[fix][RL] Fix DeepSeekV3ForCausalLM.post_load_weights for multiple update weight by @zhuzilin in https://github.com/sgl-project/sglang/pull/6265
Improve EPLB logical to physical dispatch map by @fzyzcjy in https://github.com/sgl-project/sglang/pull/6727
Update DeepSeek-R1-0528 function call chat template by @Xu-Wenqing in https://github.com/sgl-project/sglang/pull/6765
[PD] Optimize time out logic and add env var doc for mooncake by @ShangmingCai in https://github.com/sgl-project/sglang/pull/6761
Fix aiohttp 'Chunk too big' in bench_serving by @guoyuhong in https://github.com/sgl-project/sglang/pull/6737
Support sliding window in triton backend by @NorthmanPKU in https://github.com/sgl-project/sglang/pull/6509
Fix shared experts fusion error by @lambert0312 in https://github.com/sgl-project/sglang/pull/6289
Fix one bug in the grouped-gemm triton kernel by @ch-wan in https://github.com/sgl-project/sglang/pull/6772
update llama4 chat template and pythonic parser by @upfixer in https://github.com/sgl-project/sglang/pull/6679
feat(tool call): Enhance Llama32Detector for improved JSON parsing in non-stream by @CatherineSue in https://github.com/sgl-project/sglang/pull/6784
Support token-level quantization for EP MoE by @ch-wan in https://github.com/sgl-project/sglang/pull/6782
Temporarily lower mmlu threshold for triton sliding window backend by @NorthmanPKU in https://github.com/sgl-project/sglang/pull/6785
ci: relax test_function_call_required by @CatherineSue in https://github.com/sgl-project/sglang/pull/6786
Add intel_amx backend for Radix Attention for CPU by @yanbing-j in https://github.com/sgl-project/sglang/pull/6408
Fix incorrect LoRA weight loading for fused gate_up_proj by @lifuhuang in https://github.com/sgl-project/sglang/pull/6734
fix(PD-disaggregation): Can not get local ip by @storyicon in https://github.com/sgl-project/sglang/pull/6792
[FIX] mmmu bench serving result display error (#6525) by @Arist12 in https://github.com/sgl-project/sglang/pull/6791
Bump torch to 2.7.0 by @Qiaolin-Yu in https://github.com/sgl-project/sglang/pull/6788
chore: bump sgl-kernel v0.1.5 by @zhyncs in https://github.com/sgl-project/sglang/pull/6794
Improve profiler and integrate profiler in bench_one_batch_server by @merrymercy in https://github.com/sgl-project/sglang/pull/6787
chore: upgrade sgl-kernel v0.1.5 by @zhyncs in https://github.com/sgl-project/sglang/pull/6795
[Minor] Always append newline after image token when parsing chat message by @lifuhuang in https://github.com/sgl-project/sglang/pull/6797
Update CI tests for Llama4 models by @ravi03071991 in https://github.com/sgl-project/sglang/pull/6421
[Feat] Enable PDL automatically on Hopper architecture by @PopSoda2002 in https://github.com/sgl-project/sglang/pull/5981
chore: update blackwell docker by @zhyncs in https://github.com/sgl-project/sglang/pull/6800
misc: cache is_hopper_arch by @Edenzzzz in https://github.com/sgl-project/sglang/pull/6799
Remove contiguous before Flashinfer groupwise fp8 gemm by @Fridge003 in https://github.com/sgl-project/sglang/pull/6804
Correctly abort the failed grammar requests & Improve the handling of abort by @merrymercy in https://github.com/sgl-project/sglang/pull/6803
[EP] Add cuda kernel for moe_ep_pre_reorder by @yuan-luo in https://github.com/sgl-project/sglang/pull/6699
Add draft extend CUDA graph for flashinfer backend by @ispobock in https://github.com/sgl-project/sglang/pull/6805
Refactor CustomOp to avoid confusing bugs by @fzyzcjy in https://github.com/sgl-project/sglang/pull/5382
Tiny log prefill time by @fzyzcjy in https://github.com/sgl-project/sglang/pull/6780
Tiny fix EPLB assertion about rebalancing period and recorder window size by @fzyzcjy in https://github.com/sgl-project/sglang/pull/6813
Add simple utility to dump tensors for debugging by @fzyzcjy in https://github.com/sgl-project/sglang/pull/6815
Fix profiles do not have consistent names by @fzyzcjy in https://github.com/sgl-project/sglang/pull/6811
Speed up rebalancing when using non-static dispatch algorithms by @fzyzcjy in https://github.com/sgl-project/sglang/pull/6812
[1/2] Add Kernel support for Cutlass based Fused FP4 MoE by @pavanimajety in https://github.com/sgl-project/sglang/pull/6093
[Router] Fix k8s Service Discovery by @YouNeedCryDear in https://github.com/sgl-project/sglang/pull/6766
Add CPU optimized kernels for topk and rope fusions by @jianan-gu in https://github.com/sgl-project/sglang/pull/6456
fix new_page_count_next_decode by @pansicheng in https://github.com/sgl-project/sglang/pull/6671
Fix wrong weight reference in dynamic EPLB by @fzyzcjy in https://github.com/sgl-project/sglang/pull/6818
Minor add metrics to expert location updater by @fzyzcjy in https://github.com/sgl-project/sglang/pull/6816
[Refactor] Rename n_share_experts_fusion as num_fused_shared_experts by @ch-wan in https://github.com/sgl-project/sglang/pull/6735
[FEAT] Add transformers backend support by @SunMarc in https://github.com/sgl-project/sglang/pull/5929
[fix] recover auto-dispatch for rmsnorm and rope by @Alcanderian in https://github.com/sgl-project/sglang/pull/6745
fix ep_moe_reorder kernel bugs by @BBuf in https://github.com/sgl-project/sglang/pull/6858
[Refactor] Multimodal data processing for VLM by @JustinTong0323 in https://github.com/sgl-project/sglang/pull/6659
Decoder-only Scoring API by @chanh in https://github.com/sgl-project/sglang/pull/6460
feat: add dp-rank to KV events by @ishandhanani in https://github.com/sgl-project/sglang/pull/6852
Set num_fused_shared_experts as num_shared_experts when shared_experts fusion is not disabled by @ch-wan in https://github.com/sgl-project/sglang/pull/6736
Fix one missing arg in DeepEP by @ch-wan in https://github.com/sgl-project/sglang/pull/6878
Support LoRA in TestOpenAIVisionServer and fix fused kv_proj loading bug. by @lifuhuang in https://github.com/sgl-project/sglang/pull/6861
support 1 shot allreduce in 1-node and 2-node using mscclpp by @zyksir in https://github.com/sgl-project/sglang/pull/6277
Fix Qwen3MoE missing token padding optimization by @fzyzcjy in https://github.com/sgl-project/sglang/pull/6820
Tiny update error hints by @fzyzcjy in https://github.com/sgl-project/sglang/pull/6846
Support layerwise rebalancing experts by @fzyzcjy in https://github.com/sgl-project/sglang/pull/6851
Tiny allow profiler API to auto create directory by @fzyzcjy in https://github.com/sgl-project/sglang/pull/6865
Support Blackwell DeepEP docker images by @fzyzcjy in https://github.com/sgl-project/sglang/pull/6868
[EP] Add cuda kernel for moe_ep_post_reorder by @yuan-luo in https://github.com/sgl-project/sglang/pull/6837
Fix OpenAI Client error with single request via batch api by @ravi03071991 in https://github.com/sgl-project/sglang/pull/6170
[PD] Fix potential perf spike caused by tracker gc and optimize doc by @ShangmingCai in https://github.com/sgl-project/sglang/pull/6764
Use deepgemm instead of triton for fused_qkv_a_proj_with_mqa by @fzyzcjy in https://github.com/sgl-project/sglang/pull/6890
[CUTLASS-FP4-MOE] Introduce CutlassMoEParams class for easy initialization of Cutlass Grouped GEMM Metadata by @pavanimajety in https://github.com/sgl-project/sglang/pull/6887
bugfix(OAI): Fix image_data processing for jinja chat templates by @CatherineSue in https://github.com/sgl-project/sglang/pull/6877
[CPU] enable CI for PRs, add Dockerfile and auto build task by @ZailiWang in https://github.com/sgl-project/sglang/pull/6458
AITER backend extension and workload optimizations by @HaiShaw in https://github.com/sgl-project/sglang/pull/6838
[Feature] Support Flashinfer fmha on Blackwell by @NorthmanPKU in https://github.com/sgl-project/sglang/pull/6930
Fix a bug in abort & Improve docstrings for abort by @merrymercy in https://github.com/sgl-project/sglang/pull/6931
Tiny support customize DeepEP max dispatch tokens per rank by @fzyzcjy in https://github.com/sgl-project/sglang/pull/6934
Sync the changes on cuda graph runners by @merrymercy in https://github.com/sgl-project/sglang/pull/6932
[PD] Optimize transfer queue forward logic for dummy rank by @ShangmingCai in https://github.com/sgl-project/sglang/pull/6922
[Refactor] image data process in bench_serving by @JustinTong0323 in https://github.com/sgl-project/sglang/pull/6879
[fix] logical_to_all_physical_map index 256 is out of bounds in EP parallel. by @MiterV1 in https://github.com/sgl-project/sglang/pull/6767
Add triton fused moe kernel config for E=257 on B200 by @Fridge003 in https://github.com/sgl-project/sglang/pull/6939
[sgl-kernel] update deepgemm by @Alcanderian in https://github.com/sgl-project/sglang/pull/6942
chore: bump sgl-kernel v0.1.6 by @zhyncs in https://github.com/sgl-project/sglang/pull/6943
Minor compile fused topk by @fzyzcjy in https://github.com/sgl-project/sglang/pull/6944
[Bugfix] pipeline parallelism and Eagle Qwen2 by @Swipe4057 in https://github.com/sgl-project/sglang/pull/6910
Tiny re-introduce profile id logging by @fzyzcjy in https://github.com/sgl-project/sglang/pull/6912
Add triton version as a fused_moe_triton config search key to avoid performance decrease across different Triton versions by @BBuf in https://github.com/sgl-project/sglang/pull/5955
reduce torch.zeros overhead in moe align block size kernel by @BBuf in https://github.com/sgl-project/sglang/pull/6369
chore: upgrade sgl-kernel v0.1.6 by @Alcanderian in https://github.com/sgl-project/sglang/pull/6945
add fbgemm moe grouped gemm kernel benchmark by @BBuf in https://github.com/sgl-project/sglang/pull/6924
[Docker] Add docker file for SGL Router by @YouNeedCryDear in https://github.com/sgl-project/sglang/pull/6915
Disabling mixed chunked prefill when eagle is enabled by @Swipe4057 in https://github.com/sgl-project/sglang/pull/6874
Add canary for EPLB rebalancing by @fzyzcjy in https://github.com/sgl-project/sglang/pull/6895
Refactor global_server_args_dict by @fzyzcjy in https://github.com/sgl-project/sglang/pull/6866
Fuse routed scaling factor in topk_reduce kernel by @BBuf in https://github.com/sgl-project/sglang/pull/6220
Update server timeout time in AMD CI. by @saienduri in https://github.com/sgl-project/sglang/pull/6953
[misc] add is_cpu() by @Alcanderian in https://github.com/sgl-project/sglang/pull/6950
Add H20 fused MoE kernel tuning configs for DeepSeek-R1/V3 by @Xu-Wenqing in https://github.com/sgl-project/sglang/pull/6885
Add a CUDA kernel for fusing mapping and weighted sum for MoE. by @elfiegg in https://github.com/sgl-project/sglang/pull/6916
chore: bump sgl-kernel v0.1.6.post1 by @zhyncs in https://github.com/sgl-project/sglang/pull/6955
chore: upgrade sgl-kernel v0.1.6.post1 by @zhyncs in https://github.com/sgl-project/sglang/pull/6957
[DeepseekR1-FP4] Add Support for nvidia/DeepSeekR1-FP4 model by @pavanimajety in https://github.com/sgl-project/sglang/pull/6853
Revert "Fuse routed scaling factor in topk_reduce kernel (#6220)" by @zhyncs in https://github.com/sgl-project/sglang/pull/6968
[AMD] Add more tests to per-commit-amd by @hubertlu-tw in https://github.com/sgl-project/sglang/pull/6926
chore: bump sgl-kernel v0.1.7 by @zhyncs in https://github.com/sgl-project/sglang/pull/6963
Slightly improve the sampler to skip unnecessary steps by @merrymercy in https://github.com/sgl-project/sglang/pull/6956
rebase h20 fused_moe config by @BBuf in https://github.com/sgl-project/sglang/pull/6966
Fix CI and triton moe Configs by @merrymercy in https://github.com/sgl-project/sglang/pull/6974
Remove unnecessary kernels of num_token_non_padded by @fzyzcjy in https://github.com/sgl-project/sglang/pull/6965
Extend cuda graph capture bs for B200 by @Fridge003 in https://github.com/sgl-project/sglang/pull/6937
Fuse routed scaling factor in deepseek by @BBuf in https://github.com/sgl-project/sglang/pull/6970
Sync cuda graph runners by @merrymercy in https://github.com/sgl-project/sglang/pull/6976
Fix draft extend ut stability with flush cache by @ispobock in https://github.com/sgl-project/sglang/pull/6979
Fix triton sliding window test case by @merrymercy in https://github.com/sgl-project/sglang/pull/6981
Fix expert distribution dumping causes OOM by @fzyzcjy in https://github.com/sgl-project/sglang/pull/6967
Minor remove one kernel for DeepSeek by @fzyzcjy in https://github.com/sgl-project/sglang/pull/6977
[perf][sgl-kernel] extend cutlass_mla_decode to support num_head < 128 by @Alcanderian in https://github.com/sgl-project/sglang/pull/6929
Enable more unit tests for AMD CI. by @saienduri in https://github.com/sgl-project/sglang/pull/6983
Use torch.compile to fuse flash attention decode metadata preparation by @merrymercy in https://github.com/sgl-project/sglang/pull/6973
Speed up set_lora_info by eliminating unnecessary H2D transfers by @lifuhuang in https://github.com/sgl-project/sglang/pull/6960
support qwen3 embedding by @Titan-p in https://github.com/sgl-project/sglang/pull/6990
Fix torch profiler bugs for bench_offline_throughput.py by @PanJason in https://github.com/sgl-project/sglang/pull/6557
chore: upgrade flashinfer v0.2.6.post1 jit by @zhyncs in https://github.com/sgl-project/sglang/pull/6958
cleanup tmp dir by @zhyncs in https://github.com/sgl-project/sglang/pull/7007
chore: update pr test xeon by @zhyncs in https://github.com/sgl-project/sglang/pull/7008
Fix cutlass MLA gets almost zero accuracy by @fzyzcjy in https://github.com/sgl-project/sglang/pull/6998
Update amd nightly models CI. by @saienduri in https://github.com/sgl-project/sglang/pull/6992
feat: add direct routing strategy to DP worker by @ishandhanani in https://github.com/sgl-project/sglang/pull/6884
Fallback to lower triton version for unfound fused moe configs by @Fridge003 in https://github.com/sgl-project/sglang/pull/7013
Fix torchvision version for Blackwell by @Edenzzzz in https://github.com/sgl-project/sglang/pull/7015
Simplify prepare_extend_after_decode by @merrymercy in https://github.com/sgl-project/sglang/pull/6987
Migrate to assertEqual by @emmanuel-ferdman in https://github.com/sgl-project/sglang/pull/6741
Fix torch version in blackwell dockerfile by @Qiaolin-Yu in https://github.com/sgl-project/sglang/pull/7017
chore: update pr test xeon by @zhyncs in https://github.com/sgl-project/sglang/pull/7018
Update default settings for blackwell by @Fridge003 in https://github.com/sgl-project/sglang/pull/7023
Support both approximate and exact expert distribution collection by @fzyzcjy in https://github.com/sgl-project/sglang/pull/6964
Add decode req pool by @ByronHsu in https://github.com/sgl-project/sglang/pull/6980
[CI] Add CI workflow for sgl-router docker build by @YouNeedCryDear in https://github.com/sgl-project/sglang/pull/7027
Fix fused_moe triton configs by @yudian0504 in https://github.com/sgl-project/sglang/pull/7029
CPU: map changes from developing branch in sgl-kernel by @yanbing-j in https://github.com/sgl-project/sglang/pull/6833
chore: bump v0.4.7 by @zhyncs in https://github.com/sgl-project/sglang/pull/7038

New Contributors

@slin1237 made their first contribution in https://github.com/sgl-project/sglang/pull/5741
@GeLee-Q made their first contribution in https://github.com/sgl-project/sglang/pull/5850
@xutianyi1999 made their first contribution in https://github.com/sgl-project/sglang/pull/5823
@pengcuo made their first contribution in https://github.com/sgl-project/sglang/pull/5875
@johnnynunez made their first contribution in https://github.com/sgl-project/sglang/pull/5746
@zhjunqin made their first contribution in https://github.com/sgl-project/sglang/pull/5903
@xiaomin-D made their first contribution in https://github.com/sgl-project/sglang/pull/5350
@lifuhuang made their first contribution in https://github.com/sgl-project/sglang/pull/5969
@botieking98 made their first contribution in https://github.com/sgl-project/sglang/pull/3853
@ishandhanani made their first contribution in https://github.com/sgl-project/sglang/pull/6075
@zminglei made their first contribution in https://github.com/sgl-project/sglang/pull/6077
@Othame made their first contribution in https://github.com/sgl-project/sglang/pull/5764
@xu-yfei made their first contribution in https://github.com/sgl-project/sglang/pull/5822
@Simon-Li made their first contribution in https://github.com/sgl-project/sglang/pull/6016
@shimizust made their first contribution in https://github.com/sgl-project/sglang/pull/4701
@applesaucethebun made their first contribution in https://github.com/sgl-project/sglang/pull/6179
@emmanuel-ferdman made their first contribution in https://github.com/sgl-project/sglang/pull/6180
@KivenChen made their first contribution in https://github.com/sgl-project/sglang/pull/5084
@blzheng made their first contribution in https://github.com/sgl-project/sglang/pull/6115
@zhuzilin made their first contribution in https://github.com/sgl-project/sglang/pull/5914
@quinnrong94 made their first contribution in https://github.com/sgl-project/sglang/pull/6109
@yiliu30 made their first contribution in https://github.com/sgl-project/sglang/pull/6317
@ysulsky made their first contribution in https://github.com/sgl-project/sglang/pull/6136
@doujiang24 made their first contribution in https://github.com/sgl-project/sglang/pull/6392
@wangxiyu191 made their first contribution in https://github.com/sgl-project/sglang/pull/6323
@yanbing-j made their first contribution in https://github.com/sgl-project/sglang/pull/6405
@guapisolo made their first contribution in https://github.com/sgl-project/sglang/pull/5339
@MiterV1 made their first contribution in https://github.com/sgl-project/sglang/pull/6524
@upfixer made their first contribution in https://github.com/sgl-project/sglang/pull/6537
@habaohaba made their first contribution in https://github.com/sgl-project/sglang/pull/6450
@jinyouzhi made their first contribution in https://github.com/sgl-project/sglang/pull/6709
@thesues made their first contribution in https://github.com/sgl-project/sglang/pull/6720
@ChangyiYang made their first contribution in https://github.com/sgl-project/sglang/pull/6545
@NorthmanPKU made their first contribution in https://github.com/sgl-project/sglang/pull/6509
@storyicon made their first contribution in https://github.com/sgl-project/sglang/pull/6792
@Arist12 made their first contribution in https://github.com/sgl-project/sglang/pull/6791
@pavanimajety made their first contribution in https://github.com/sgl-project/sglang/pull/6093
@YouNeedCryDear made their first contribution in https://github.com/sgl-project/sglang/pull/6766
@jianan-gu made their first contribution in https://github.com/sgl-project/sglang/pull/6456
@pansicheng made their first contribution in https://github.com/sgl-project/sglang/pull/6671
@SunMarc made their first contribution in https://github.com/sgl-project/sglang/pull/5929
@chanh made their first contribution in https://github.com/sgl-project/sglang/pull/6460
@zyksir made their first contribution in https://github.com/sgl-project/sglang/pull/6277
@ZailiWang made their first contribution in https://github.com/sgl-project/sglang/pull/6458

Full Changelog: https://github.com/sgl-project/sglang/compare/v0.4.6...v0.4.7

Source: README.md, updated 2025-06-10

SGLang Files

SGLang is a fast serving framework for large language models