SGLang v0.5.9

Highlights

  • LoRA Weight Loading Overlap with Computation: Overlap LoRA weight loading with computation during inference, reducing TTFT by ~78% and TPOT by ~34.88% with large adapters: [#15512]

  • TRT-LLM NSA Kernel Integration for DeepSeek V3.2: Integrate TRT-LLM DSA kernels as the Native Sparse Attention (NSA) backend, boosting DeepSeek V3.2 performance by 3x-5x on Blackwell platforms when trtllm is set for both --nsa-prefill-backend and --nsa-decode-backend, with a minor accuracy drop (launch sketch after this list): [#16758], [#17662], [#18389]

  • Flashinfer All-to-All MoE Dispatcher: Add the Flashinfer all-to-all MoE dispatcher for efficient expert parallelism communication, enabling optimized routing in MoE models: [#14668]

  • FA4 (FP4 Attention) Support for Multimodal Encoder: Introduce FP4 attention backend and variable-length attention function for multimodal encoders, enabling lower-precision inference for vision-language models: [#13539]

  • Anthropic-Compatible API Endpoint: Add native Anthropic API compatibility to SGLang, allowing direct integration with tools and clients built for the Anthropic API format (client sketch after this list): [#18630]

  • SGLang-Diffusion Advanced Optimizations: Production-ready improvements including token-level sequence sharding, parallel VAE decoding, fused kernels, Nunchaku and FP8 support, and multiple new models in the ComfyUI plugin: blog

  • Spec V2 Critical Bug Fix: Fix an out-of-bounds index bug caused by torch garbage collection in speculative decoding v2, improving the reliability of speculative verification: [#18958]

  • Deploying DeepSeek on GB300 NVL72: Optimization work for long-context inference using prefill-decode disaggregation and other SGLang features on NVIDIA's latest GB300 platform: blog

  • Bump AITER Version to 0.1.10.post3: Adds FP8 prefill, FP8 decode, and FP8 KV cache support

  • Commit-to-Version Lookup in docs.sglang.io: Easily find the earliest official version that includes a given PR or commit, streamlining release tracking for users and developers: [#18450]
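
For the TRT-LLM NSA integration above, a minimal launch sketch via SGLang's offline Engine API. This is a sketch under stated assumptions: the kwarg spellings are assumed to mirror the --nsa-prefill-backend and --nsa-decode-backend flags named above, and the model id and tp_size are only illustrative.

```python
# Sketch: enable the TRT-LLM NSA kernels through SGLang's offline Engine.
# Assumptions: the kwargs below mirror the --nsa-prefill-backend and
# --nsa-decode-backend CLI flags; model id and tp_size are illustrative
# for a Blackwell deployment, not a verified recipe.
import sglang as sgl

llm = sgl.Engine(
    model_path="deepseek-ai/DeepSeek-V3.2-Exp",  # illustrative model id
    tp_size=8,                                   # illustrative parallelism
    nsa_prefill_backend="trtllm",
    nsa_decode_backend="trtllm",
)

out = llm.generate(
    "Explain sparse attention in one sentence.",
    {"max_new_tokens": 64, "temperature": 0.0},
)
print(out["text"])
llm.shutdown()
```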
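
For the Anthropic-compatible endpoint, a minimal client sketch using the official anthropic Python SDK pointed at a local SGLang server. The port, served model name, and placeholder API key are assumptions for illustration.

```python
# Sketch: exercise SGLang's Anthropic-compatible endpoint with the official
# `anthropic` SDK by overriding the base URL. The port, model name, and
# placeholder API key are assumptions; use whatever model the server serves.
import anthropic

client = anthropic.Anthropic(
    base_url="http://localhost:30000",  # local SGLang server, not api.anthropic.com
    api_key="EMPTY",                    # placeholder; assumes no auth is configured
)

message = client.messages.create(
    model="meta-llama/Llama-3.1-8B-Instruct",  # illustrative served model
    max_tokens=256,
    messages=[{"role": "user", "content": "What's new in SGLang v0.5.9?"}],
)
print(message.content[0].text)
```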

New Model Support

  • Kimi-K2.5: [#17789], cookbook
  • GLM-5: cookbook (still requires a custom Docker image for the transformers upgrade; an RC release will follow, since upgrading transformers is risky)
  • Qwen 3.5: [#18489], [#18926], [#18937], cookbook
  • MiniMax 2.5: cookbook
  • Ernie4.5-VL: [#15679]
  • Step3-VL: [#17513]
  • Step-3.5-Flash: [#18084], cookbook
  • LLaDA 2.1: cookbook
  • Ring 2.5 1T / Ling 2.5 1T: [#18598], cookbook, cookbook
  • MOVA (Diffusion): [#17704]
  • GLM-OCR: [#17582], cookbook
  • DeepSeek-OCR-2: [#17897]

SGLang-Diffusion

  • Support for multiple new models in the ComfyUI plugin
  • Parallel Folding and Parallel VAE Decoding for faster image/video generation
  • Nunchaku and FP8 support for diffusion models
  • Sequence Sharding (token-level) replacing Frame Sharding for improved efficiency
  • LTX-2 support: [#17495], [#17496]
  • MOVA model support: [#17704]
  • Cache-DiT optimizations and fused kernel improvements
  • Numerous bug fixes and refactors across the diffusion pipeline

Performance

  • Integrate TRT-LLM NSA kernels for a 3x-5x speedup on Blackwell: [#16758], [#17662], [#18389]
  • LoRA weight loading overlap, reducing TTFT by ~78% (see the sketch after this list): [#15512]
  • Flashinfer all-to-all MoE dispatcher: [#14668]
  • FA4 for multimodal encoder: [#13539]
  • Optimize GDN decode for Qwen3 Next: [#17094]
  • Tune fused MoE kernels for Llama-4-Scout, MiniMax M2: [#17891], [#18851], [#18833]
  • Symmetric memory pre-allocation to avoid fragmentation: [#17089]
  • Optimize fused_moe triton kernel TMA: [#18782]
  • Fused triton kernel for Ernie4.5-VL rotary embedding: [#18856]
  • Support MxINT4 Flashinfer TRT-LLM MoE GEMM: [#16892]
  • AITER bias MoE support for GPT-OSS MxFP4: [#17735]
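
A minimal sketch of the LoRA serving path that the weight-loading/compute overlap accelerates. The "name=path" spelling of lora_paths, the per-request lora_path argument, and the adapter and model ids are all illustrative assumptions; consult the LoRA docs for the authoritative interface.

```python
# Sketch: serve a base model with one LoRA adapter and select it per request,
# the path whose weight loading now overlaps with computation. Assumptions:
# the lora_paths spelling, the per-request lora_path argument, and the
# adapter/model ids are illustrative only.
import sglang as sgl

llm = sgl.Engine(
    model_path="meta-llama/Llama-3.1-8B-Instruct",       # illustrative base model
    lora_paths=["my_adapter=/models/loras/my_adapter"],  # hypothetical adapter
)

out = llm.generate(
    "Rewrite politely: send the report now.",
    {"max_new_tokens": 64},
    lora_path="my_adapter",  # route this request through the adapter
)
print(out["text"])
llm.shutdown()
```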

Prefill-Decode Disaggregation

  • Support KV transfer with MORI-IO: [#14626]
  • Mooncake intra-node NVLink KV transfer: [#17866]
  • Improve KV offset calculation for MHA models with differing TP sizes: [#18163]
  • Document SGLANG_MOONCAKE_CUSTOM_MEM_POOL (usage sketch after this list): [#18259]
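
A minimal sketch of opting into the Mooncake memory pool variable documented above, set before engine startup; the accepted value ("true") is an assumption, and [#18259] is the authoritative reference.

```python
# Sketch: opt into Mooncake's custom memory pool for KV transfer by setting
# the documented variable before the engine process starts. The accepted
# value ("true") is an assumption; see #18259 for the authoritative spelling.
import os

os.environ["SGLANG_MOONCAKE_CUSTOM_MEM_POOL"] = "true"
# ...then launch the prefill and decode workers as usual.
```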

Diffusion LLM (dLLM)

  • Remove cuda graph batch size limitation: [#17458]
  • JointThreshold algorithm for joint M2T and T2T decoding: [#18171]
  • Basic dLLM scheduling strategy and implementation: [#17484]

Speculative Decoding

  • Fix an out-of-bounds index bug caused by torch garbage collection in Spec V2: [#18958]
  • Move forward timeout before verify to fix Eagle v1 filter mismatch: [#18760]

Dependencies

  • Flashinfer updated to 0.6.3: [#17700]
  • AITER updated to 0.1.10.post3: [#18741]
  • Mooncake transfer engine updated to 0.3.9: [#18316]

AMD Hardware

  • AITER updated to v0.1.10.post3 with FP8 Prefill, FP8 Decode, FP8 KV Cache support
  • ROCm 7 standardization and ROCm 6.3 deprecation: [#17785]
  • Kimi K2.5 Day 0 ROCm support: [#17863]
  • FP8 prefill attention kernel integration: [#18528]
  • Two-batch overlapping for MORI EP: [#17953]
  • DeepSeek V3.2 and Kimi-K2 nightly CI tests: [#17523]

NPU/Ascend

  • Support for MiniCPM3-4B: [#16866]
  • Qwen 3.5 support on Ascend: [#18544]
  • Accuracy improvements for StableLM-2: [#17470]
  • Bug fixes for DeepSeek V3.2 and DeepSeek-VL2: [#17007]

CPU Backend

  • Optimize Qwen3-Next model on CPU: [#12525]
  • Optimize flash_attn_varlen_func: [#15708]
  • Add INT4 kernels for CPU: [#8226]

Kernel Slimming

  • Migrate GPTQ-Marlin repack kernel to JIT: [#18543]
  • Migrate AWQ Marlin repack kernel to JIT: [#18949]

Documentation

  • Add RL documentation: [#17663]
  • Update torch compile description: [#17819]
  • Refine spec decode docs for SpecV2/STANDALONE/NGRAM: [#18321]
  • Consolidate diffusion documentation: [#18095]

Full Changelog: https://github.com/sgl-project/sglang/compare/v0.5.8...v0.5.9
