SGLang - Browse /v0.5.2 at SourceForge.net


Name	Modified	Size	Downloads / Week
README.md	2025-09-11	49.4 kB	0
Release v0.5.2 source code.tar.gz	2025-09-11	5.4 MB	0
Release v0.5.2 source code.zip	2025-09-11	6.7 MB	0
Totals: 3 Items		12.2 MB	0

What's Changed

feat: allow using local branch to build image by @gongwei-130 in https://github.com/sgl-project/sglang/pull/9546
[readme] Include additional resources for the SGLang x AMD SF Meetup event by @wisclmy0611 in https://github.com/sgl-project/sglang/pull/9547
[doc] deepseekv31 support by @XiaotongJiang in https://github.com/sgl-project/sglang/pull/9544
fix(grok): remove duplicate replicate_lm_head configuration by @vincentzed in https://github.com/sgl-project/sglang/pull/9549
chore: update configurer by @zhyncs in https://github.com/sgl-project/sglang/pull/9557
chore: bump v0.5.1.post1 by @zhyncs in https://github.com/sgl-project/sglang/pull/9558
[router] add right rustls dependency in sgl-router cargo.toml by @Bruce-x-1997 in https://github.com/sgl-project/sglang/pull/9498
fix: use sgl-kernel 0.3.5 by @zhyncs in https://github.com/sgl-project/sglang/pull/9565
Add target module validation for init adapters by @Beichen-Ma in https://github.com/sgl-project/sglang/pull/9429
fix: Update OpenAI client base URL in documentation by @JustinTong0323 in https://github.com/sgl-project/sglang/pull/9576
[PD] Improve disaggregation metrics output: update the metrics to keep reflecting real stats by @SCDESPERTATE in https://github.com/sgl-project/sglang/pull/7317
remove redundant rank0_log function. by @miter6 in https://github.com/sgl-project/sglang/pull/9560
Update CUTLASS 4.2 & Enable K-Major Scale Factor for SM90 FP8 Blockwise Group GEMM by @HydraQYH in https://github.com/sgl-project/sglang/pull/9559
Reintroduce memory usage fix by @fzyzcjy in https://github.com/sgl-project/sglang/pull/9535
Offload tensors by sharding on GPU by @fzyzcjy in https://github.com/sgl-project/sglang/pull/9536
bugfix for undefined logging functions in HarmonyBrowserTool & HarmonyPythonTool by @CiaranZhou in https://github.com/sgl-project/sglang/pull/9229
chore: upgrade flashinfer 0.2.14.post1 by @zhyncs in https://github.com/sgl-project/sglang/pull/9578
fix: revert [#8593] by @zhyncs in https://github.com/sgl-project/sglang/pull/9581
fix: resolve tuning fused moe issue by @zhyncs in https://github.com/sgl-project/sglang/pull/9587
Tiny fix wrong comments by @fzyzcjy in https://github.com/sgl-project/sglang/pull/9589
chore: update config by @zhyncs in https://github.com/sgl-project/sglang/pull/9591
chore: bump v0.5.1.post2 by @zhyncs in https://github.com/sgl-project/sglang/pull/9592
[Doc] add LWS(LeaderWorkerSet) use case in sgl-router README by @Bruce-x-1997 in https://github.com/sgl-project/sglang/pull/9568
[Performance] Batch Send from Tokenizer Manager. by @sundar24295s in https://github.com/sgl-project/sglang/pull/9436
Fix GLM45 tool call multi-turn bug by @byjiang1996 in https://github.com/sgl-project/sglang/pull/9500
Fix GLM45v launch server cuda torch compile bug by @byjiang1996 in https://github.com/sgl-project/sglang/pull/9554
Fix Harmony reasoning parser and auto-separation for gpt-oss models by @jonaslsaa in https://github.com/sgl-project/sglang/pull/9190
[docs] Refactor, remove compiled results and add gpt-oss by @zhaochenyang20 in https://github.com/sgl-project/sglang/pull/9613
[Fix] HiCache Bugfix & Mooncake Error Handling Enhance by @ykwd in https://github.com/sgl-project/sglang/pull/8901
Improve bench_one_batch_server script by @hnyls2002 in https://github.com/sgl-project/sglang/pull/9608
[router] add mistral tool parser by @slin1237 in https://github.com/sgl-project/sglang/pull/9622
[router] add qwen tool parser by @slin1237 in https://github.com/sgl-project/sglang/pull/9623
[router] add pythonic parser by @slin1237 in https://github.com/sgl-project/sglang/pull/9628
[router] add llama tool parser by @slin1237 in https://github.com/sgl-project/sglang/pull/9629
[router] add ut for mistral, llama, pythonic, and streaming tool parser by @slin1237 in https://github.com/sgl-project/sglang/pull/9632
[new feat] ascend backend support fia fusion kernel by @ZhengdQin in https://github.com/sgl-project/sglang/pull/8328
model: Support nvidia/Llama-3_1-Nemotron-Ultra-253B-v1 by @netanel-haber in https://github.com/sgl-project/sglang/pull/9301
Fix lint for router by @hebiao064 in https://github.com/sgl-project/sglang/pull/9636
[docs] Update README with additional highlights and resources for SGLang x AMD SF Meetup by @wisclmy0611 in https://github.com/sgl-project/sglang/pull/9640
Add reasoning_effort param in TiktokenTokenizer.apply_chat_template by @lshmouse in https://github.com/sgl-project/sglang/pull/9630
fix: allow user to specify function as role by @GavinZhu-GMI in https://github.com/sgl-project/sglang/pull/9635
Fix kimi k2 function calling format by @XiaotongJiang in https://github.com/sgl-project/sglang/pull/9606
[router] address worker load tracking consistency by @slin1237 in https://github.com/sgl-project/sglang/pull/9523
[router] add token bucket rate limiter by @CatherineSue in https://github.com/sgl-project/sglang/pull/9656
[doc] add kimik2 --tool-call-parser by @XiaotongJiang in https://github.com/sgl-project/sglang/pull/9647
Install py-spy by default for containers for easier debugging by @fzyzcjy in https://github.com/sgl-project/sglang/pull/9649
BugFix(hicache): Fix host indices out of bound error by @hzh0425 in https://github.com/sgl-project/sglang/pull/9637
HiCache Storage fix host memory leak by @xiezhq-hermann in https://github.com/sgl-project/sglang/pull/9648
add response_format support for completion API by @cicirori in https://github.com/sgl-project/sglang/pull/9665
Fix FA3 swa spec verify topk>1 by @ispobock in https://github.com/sgl-project/sglang/pull/9658
[RL] fix register the same ops multiple times by @hebiao064 in https://github.com/sgl-project/sglang/pull/9564
chore: enhance bench_serving for vlms with a new dataset of configurable image count and resolution by @mickqian in https://github.com/sgl-project/sglang/pull/9583
refactor(hicache): Introduce generic HiCacheStorageConfig for improved configuration management by @hzh0425 in https://github.com/sgl-project/sglang/pull/9555
feat: (chat-template matching) enhance multimodal model detection with config.json by @KEVINTUAN12 in https://github.com/sgl-project/sglang/pull/9597
[docs] Instructions for bench_serving.py by @yhyang201 in https://github.com/sgl-project/sglang/pull/9071
Support DeepSeek-V3.1 tool call by @Xu-Wenqing in https://github.com/sgl-project/sglang/pull/9446
Add A100 fused MoE kernel configs for Dpsk by @ehuaa in https://github.com/sgl-project/sglang/pull/9677
support cuda 13.0 and trtllm kernel by @rainj-me in https://github.com/sgl-project/sglang/pull/9495
fix: HiRadixCache: fix prefetch completion race by @pabloiyu in https://github.com/sgl-project/sglang/pull/9397
fix mooncake store mla zero copy meta by @huangtingwei9988 in https://github.com/sgl-project/sglang/pull/9678
move is_sm90_supported/is_sm100_supported to python/sglang/srt/utils.py by @merrymercy in https://github.com/sgl-project/sglang/pull/9679
[router] restructure tool parser module folder by @slin1237 in https://github.com/sgl-project/sglang/pull/9693
[router] add deepseek tool parser by @slin1237 in https://github.com/sgl-project/sglang/pull/9694
Quick fix for loading processor for supporting internvl3_5 series by @yilian49 in https://github.com/sgl-project/sglang/pull/9676
Fix get_ip when no external network by @whybeyoung in https://github.com/sgl-project/sglang/pull/9700
Sets default model name in request classes by @JustinTong0323 in https://github.com/sgl-project/sglang/pull/9683
[router] add step3 tool parser by @slin1237 in https://github.com/sgl-project/sglang/pull/9695
[router] add kimi-k2 tool parser by @slin1237 in https://github.com/sgl-project/sglang/pull/9702
[router] add gpt-oss and glm4 tool parser by @slin1237 in https://github.com/sgl-project/sglang/pull/9703
[sgl-kernel] misc: update deepgemm version for sgl-kernel by @FlamingoPg in https://github.com/sgl-project/sglang/pull/9340
chore: upgrade sgl-kernel 0.3.7 by @zhyncs in https://github.com/sgl-project/sglang/pull/9708
chore: bump v0.5.1.post3 by @zhyncs in https://github.com/sgl-project/sglang/pull/9716
[router] upgrade kernel version in pd ci by @CatherineSue in https://github.com/sgl-project/sglang/pull/9720
[Sync] Update mxfp4.py (20250827) by @merrymercy in https://github.com/sgl-project/sglang/pull/9724
[router] fix error response in pd_router by @Bruce-x-1997 in https://github.com/sgl-project/sglang/pull/9505
[router] Add MCP Tool Handler by @key4ng in https://github.com/sgl-project/sglang/pull/9615
gpt-oss blog reproduction document by @hnyls2002 in https://github.com/sgl-project/sglang/pull/9728
[router] additional pythonic parser unit test by @slin1237 in https://github.com/sgl-project/sglang/pull/9730
[router] additional llama32 parser unit test and multi json support by @slin1237 in https://github.com/sgl-project/sglang/pull/9732
support mooncake store dp attention by @huangtingwei9988 in https://github.com/sgl-project/sglang/pull/9684
add support for nvidia/gpt-oss-120b-Eagle3 by @zyksir in https://github.com/sgl-project/sglang/pull/9739
Move git clone command up from README by @JustinTong0323 in https://github.com/sgl-project/sglang/pull/9740
[feat] Reduce GPU memory overhead by using weakref by @yhyang201 in https://github.com/sgl-project/sglang/pull/9673
Support speculative decoding in hybrid attention backend by @Qiaolin-Yu in https://github.com/sgl-project/sglang/pull/9573
[router] add llama3.2 multi json streaming parser by @slin1237 in https://github.com/sgl-project/sglang/pull/9735
Support compile sgl-kernel on cuda 13.0 by @rainj-me in https://github.com/sgl-project/sglang/pull/9721
[Sync] Update server_args.py (20250828) by @merrymercy in https://github.com/sgl-project/sglang/pull/9745
[router] grpc router bootstraps by @slin1237 in https://github.com/sgl-project/sglang/pull/9759
[AMD] Support Hierarchical Caching on AMD GPUs by @hubertlu-tw in https://github.com/sgl-project/sglang/pull/8236
feat: add tuned fused moe config for GLM-4.5-Air-FP8 tp = 4 on B200 by @zixuanzhang226 in https://github.com/sgl-project/sglang/pull/9770
[Feature] Support NPUGraph for DeepSeek on Ascend NPU by @chenxu140 in https://github.com/sgl-project/sglang/pull/9355
feat(draft_model): support draft_model for RemoteModelLoader by @DellCurry in https://github.com/sgl-project/sglang/pull/6407
fix: fix MLA for ShardedModelLoader/RemoteModelLoader by @DellCurry in https://github.com/sgl-project/sglang/pull/6287
Optimize prefill performance on cpu backend by @mingfeima in https://github.com/sgl-project/sglang/pull/8750
[HiCache] change the default policy to write through by @xiezhq-hermann in https://github.com/sgl-project/sglang/pull/9772
bugfix(hicache): Move exists check before key suffixing by @hzh0425 in https://github.com/sgl-project/sglang/pull/9749
Skip some tests on Blackwell by @hlu1 in https://github.com/sgl-project/sglang/pull/9777
Raise error when topk>1 and page>1 for paged attention backends. by @hnyls2002 in https://github.com/sgl-project/sglang/pull/9784
ROCm 7.0 update by @sogalin in https://github.com/sgl-project/sglang/pull/9757
add bench_mix.py by @pansicheng in https://github.com/sgl-project/sglang/pull/9788
Make sm100 fp8 kernels available on sm103 by @hlu1 in https://github.com/sgl-project/sglang/pull/9789
accommodate json schema in the "schema" field, not in the "json_schema" field of response_format by @gongwei-130 in https://github.com/sgl-project/sglang/pull/9786
[PD] Support get_model_info interface for mini_lb by @XucSh in https://github.com/sgl-project/sglang/pull/9792
[HiCache] resolve conflict between chunked-prefill and hicache hit count by @xiezhq-hermann in https://github.com/sgl-project/sglang/pull/9776
feat(hicache-3fs): 3FS-Store Backup Optimizations For MLA Model. by @hzh0425 in https://github.com/sgl-project/sglang/pull/9692
support enable in the reasoning field to enable thinking for thinkin… by @gongwei-130 in https://github.com/sgl-project/sglang/pull/9715
feat: Add flexible validation for partial weight updates by @GeLee-Q in https://github.com/sgl-project/sglang/pull/9663
feat: add original logprobs to response by @narutolhy in https://github.com/sgl-project/sglang/pull/8375
[feat] Support EAGLE3 for Qwen2 by @KerwinKai in https://github.com/sgl-project/sglang/pull/9216
chore: upgrade flashinfer 0.3.0rc1 by @zhyncs in https://github.com/sgl-project/sglang/pull/9793
[ModelOpt] Fix Weight Loading for DSR1-FP4 Quantization by @pavanimajety in https://github.com/sgl-project/sglang/pull/9712
Fix TRTLLM MLA Cuda KV Blocks Causing accuracy drop by @farazkh80 in https://github.com/sgl-project/sglang/pull/9675
[NVIDIA] [2/N] Optimize silu_and_mul_scaled_fp4_grouped_quant perf by @kaixih in https://github.com/sgl-project/sglang/pull/9556
Adds initialize_moe_config to bench_one_batch so MOE backend is respected by @pranavm-nvidia in https://github.com/sgl-project/sglang/pull/9670
Small bug fix in transformers model implementation by @yilian49 in https://github.com/sgl-project/sglang/pull/9809
feature(eplb): add min-rebalancing-utilization-threshold for eplb by @hzh0425 in https://github.com/sgl-project/sglang/pull/8345
Make fp4_quantize kernels work on sm103 by @hlu1 in https://github.com/sgl-project/sglang/pull/9807
fix: dsv3 lite q_lora_rank none by @zhyncs in https://github.com/sgl-project/sglang/pull/9815
Fix memory leak when aborting decode request in PD-Disagg by @hnyls2002 in https://github.com/sgl-project/sglang/pull/9817
chore: fix cuda driver api issue and bump sgl-kernel 0.3.7.post1 by @zhyncs in https://github.com/sgl-project/sglang/pull/9746
chore: update Dockerfile by @zhyncs in https://github.com/sgl-project/sglang/pull/9820
Fix typo in warning message about DeepGEMM JIT by @mmangkad in https://github.com/sgl-project/sglang/pull/9802
chore: upgrade sgl-kernel 0.3.7.post1 with deepgemm fix by @zhyncs in https://github.com/sgl-project/sglang/pull/9822
[sgl-kernel] fix: fix missing FetchContent_Populate for fmt by @FlamingoPg in https://github.com/sgl-project/sglang/pull/9826
chore: upgrade transformers 4.56.0 by @zhyncs in https://github.com/sgl-project/sglang/pull/9827
[Auto Sync] Update parallel_state.py (20250830) by @merrymercy in https://github.com/sgl-project/sglang/pull/9828
[CI] Fix the trigger condition for PR test workflows by @merrymercy in https://github.com/sgl-project/sglang/pull/9761
[CI] Code sync tools by @merrymercy in https://github.com/sgl-project/sglang/pull/9830
Update guidelines for syncing code between repos by @merrymercy in https://github.com/sgl-project/sglang/pull/9831
hot fix for mooncake batch set api by @xiezhq-hermann in https://github.com/sgl-project/sglang/pull/9836
[router] add reasoning parser readme by @slin1237 in https://github.com/sgl-project/sglang/pull/9837
Tool parser benchmark by @CatherineSue in https://github.com/sgl-project/sglang/pull/9835
[Model] Support Meituan LongCat-Flash && LongCat-Flash-MTP by @Orchard-DT in https://github.com/sgl-project/sglang/pull/9824
[router] global tool parser registry by @CatherineSue in https://github.com/sgl-project/sglang/pull/9840
[feat]Ascend NPU Gemma-3-12b and Gemma-3-27b support by @VDV1985 in https://github.com/sgl-project/sglang/pull/8909
[Performance] Improve Qwen RMSNorm by replacing with native RMSNorm op by @vincentzed in https://github.com/sgl-project/sglang/pull/9709
[HiCache] Clear kvcache in storage backend with fastAPI by @stmatengss in https://github.com/sgl-project/sglang/pull/9750
Fix input logprob index for a batch that includes both requests with input logprob and requests without input logprob. by @merrymercy in https://github.com/sgl-project/sglang/pull/9841
Fuse gate_proj and up_proj in Qwen 2.5 VL's vision MLP by @AlienKevin in https://github.com/sgl-project/sglang/pull/9661
[HiCache] Storage Refactoring by @xiezhq-hermann in https://github.com/sgl-project/sglang/pull/9797
fix set_internal_state API by @hnyls2002 in https://github.com/sgl-project/sglang/pull/9850
fix inconsistent arguments for generated shared prefix bench by @pbkowalski in https://github.com/sgl-project/sglang/pull/9073
fix(hicache-long-bench): adjust context workload generator to use full query set by @hzh0425 in https://github.com/sgl-project/sglang/pull/9847
Disable radix cache in test_lora_update.py for better stability by @Fridge003 in https://github.com/sgl-project/sglang/pull/9852
Tiny allow DeepGEMM on cu12.9 by @fzyzcjy in https://github.com/sgl-project/sglang/pull/9858
Update docker build workflows for gfx942 ROCm 7.0. by @saienduri in https://github.com/sgl-project/sglang/pull/9794
Support Multi Process Tokenizer Manager (#6555) by @whybeyoung in https://github.com/sgl-project/sglang/pull/8964
chore: upgrade flashinfer 0.3.0 by @zhyncs in https://github.com/sgl-project/sglang/pull/9864
chore: bump v0.5.2rc0 by @zhyncs in https://github.com/sgl-project/sglang/pull/9862
Mooncake store get zero copy meta optimization by @huangtingwei9988 in https://github.com/sgl-project/sglang/pull/9857
[router] add tokenizer download support from hf hub by @CatherineSue in https://github.com/sgl-project/sglang/pull/9882
support fp8 kvcache for hybrid attn backend on GPT-OSS by @rainj-me in https://github.com/sgl-project/sglang/pull/9783
[HiCacheStorage] fix abort request host memory leaks by @huangtingwei9988 in https://github.com/sgl-project/sglang/pull/9874
[HiCacheStorage]: Improve 3fs kvstore's performance and resolve mla issues by @hzh0425 in https://github.com/sgl-project/sglang/pull/9876
[router] Fix short timeout for the prefill client by @LukasBluebaum in https://github.com/sgl-project/sglang/pull/9803
[code style] restructure fused_moe to avoid a very long single file by @BBuf in https://github.com/sgl-project/sglang/pull/9878
[router] add grpc pd and regular router init by @CatherineSue in https://github.com/sgl-project/sglang/pull/9893
[router] fix FunctionCallResponse proto, support null arguments by @Bruce-x-1997 in https://github.com/sgl-project/sglang/pull/9875
[feat] Support tp mode for DeepSeek-R1-W4AFP8 by @chenxijun1029 in https://github.com/sgl-project/sglang/pull/8118
Move multi-tokenizer event loop to better place by @ShangmingCai in https://github.com/sgl-project/sglang/pull/9902
[chore] fix dead links in doc by @lifuhuang in https://github.com/sgl-project/sglang/pull/9913
Change tensor alignment method to mn major by @mmangkad in https://github.com/sgl-project/sglang/pull/9844
chore: bump v0.3.8 sgl-kernel by @zhyncs in https://github.com/sgl-project/sglang/pull/9907
[Fix] fix the issue encountered when running inference with LongCat-Flash/MTP EP MoE on B200 by @Orchard-DT in https://github.com/sgl-project/sglang/pull/9916
fix parallel_state.py current_platform bug by @BBuf in https://github.com/sgl-project/sglang/pull/9919
[feat] apply deep_gemm compile_mode to skip launch by @Alcanderian in https://github.com/sgl-project/sglang/pull/9879
fix: update router deps by @zhyncs in https://github.com/sgl-project/sglang/pull/9921
chore: bump v0.5.2rc1 by @zhyncs in https://github.com/sgl-project/sglang/pull/9920
[Hicache] Generic page get bugfix by @ykwd in https://github.com/sgl-project/sglang/pull/9909
Support the internvl3.5 family models in sglang by @yilian49 in https://github.com/sgl-project/sglang/pull/9705
[router] include rust benchmarks by @slin1237 in https://github.com/sgl-project/sglang/pull/9932
Fix the key passing issue in page first layout. by @hzh0425 in https://github.com/sgl-project/sglang/pull/9929
[router] fix grpc client url normalization and health check by @CatherineSue in https://github.com/sgl-project/sglang/pull/9939
[model] support MiniCPM-V 4.0 by @tc-mb in https://github.com/sgl-project/sglang/pull/8747
[HiCache] Minor fix on file storage backend by @xiezhq-hermann in https://github.com/sgl-project/sglang/pull/9869
Move parsers under a single folder by @merrymercy in https://github.com/sgl-project/sglang/pull/9912
[Fix] DeepSeek EP accuracy issue on B200 GPUs by @alhridoy in https://github.com/sgl-project/sglang/pull/9946
fix(cache): move ongoing_prefetch pop after validation to prevent leak by @xiaguan in https://github.com/sgl-project/sglang/pull/9927
Remove annoying warnings in sgl kernel build by @merrymercy in https://github.com/sgl-project/sglang/pull/9905
Update tool_chat_template_deepseekv31.jinja by @WangJianQ-0118 in https://github.com/sgl-project/sglang/pull/9895
Qwen FP8/NVFP4 ModelOPT Quantization support by @jingyu-ml in https://github.com/sgl-project/sglang/pull/7912
Optimized deepseek-v3/r1 model performance on mxfp4 run by @kkHuang-amd in https://github.com/sgl-project/sglang/pull/9671
add proctitle for tokenizers by @hnyls2002 in https://github.com/sgl-project/sglang/pull/9952
[feat] Add P/D attention select for draft model by @Ximingwang-09 in https://github.com/sgl-project/sglang/pull/9755
Revert "[Fix] DeepSeek EP accuracy issue on B200 GPUs (#9946)" by @zhyncs in https://github.com/sgl-project/sglang/pull/9955
Revert "Optimized deepseek-v3/r1 model performance on mxfp4 run (#9671)" by @zhyncs in https://github.com/sgl-project/sglang/pull/9959
[benchmark] add flashinfer_allreduce_fusion benchmark by @BBuf in https://github.com/sgl-project/sglang/pull/9937
[1/N][Bug] Fix w4afp8 MoE NaN issue (sgl-kernel) by @yuhyao in https://github.com/sgl-project/sglang/pull/9953
[router] Add Rerank API Specification by @fangjian601 in https://github.com/sgl-project/sglang/pull/9906
[router] add chat_template_kwargs in ChatCompletionRequest by @tonyluj in https://github.com/sgl-project/sglang/pull/9958
Remove mrope position sync by @timmy-feng in https://github.com/sgl-project/sglang/pull/9460
fix swa clear(): rename is_in_free_group to is_not_in_free_group by @JustinTong0323 in https://github.com/sgl-project/sglang/pull/9914
Triton 3.4.0 MoE config for Deepseek TP16 H100 by @SzymonOzog in https://github.com/sgl-project/sglang/pull/9978
nsys profile output kernel classifier by @gracehonv in https://github.com/sgl-project/sglang/pull/9314
Minor update regarding issue [#9704] by @elfiegg in https://github.com/sgl-project/sglang/pull/9733
[Auto Sync] Update parallel_state.py, few_shot_gsm8k.py (20250903) by @merrymercy in https://github.com/sgl-project/sglang/pull/9986
feat: add gpt oss b200 ci by @zhyncs in https://github.com/sgl-project/sglang/pull/9988
[router] move tokenizer, reasoning, tool initialization to server by @slin1237 in https://github.com/sgl-project/sglang/pull/9996
[router] clean up dependency injector to use ctx by @slin1237 in https://github.com/sgl-project/sglang/pull/10000
[router] fix grpc connection mode detection by @slin1237 in https://github.com/sgl-project/sglang/pull/9999
[Fix] gpt-oss mxfp4 model run failed on ROCm platform by @kkHuang-amd in https://github.com/sgl-project/sglang/pull/9994
Fix Llama 4 with MXFP4 dynamic quant on MI35x by @hubertlu-tw in https://github.com/sgl-project/sglang/pull/9993
[Bugfix] fix pd chat completion protocol for batching support by @tonyluj in https://github.com/sgl-project/sglang/pull/10016
fix: health_generate endpoint in mini_lb by @wxsms in https://github.com/sgl-project/sglang/pull/9997
[1/N] DP-refactor: move dp balance code into scheduler's mixin class by @hnyls2002 in https://github.com/sgl-project/sglang/pull/10004
Ensure chunked request extension length respects both rem_chunk_tokens and rem_total_tokens limits by @pansicheng in https://github.com/sgl-project/sglang/pull/10003
feat(hicache): Add generic hicache ci e2e test and benchmark test by @hzh0425 in https://github.com/sgl-project/sglang/pull/9846
Optimize Qwen3-moe model by using flashinfer fused allreduce by @yuan-luo in https://github.com/sgl-project/sglang/pull/9973
[Doc] Fix SGLang tool parser doc by @PopSoda2002 in https://github.com/sgl-project/sglang/pull/9886
metrics: support custom buckets for prompt/generation_tokens_histogram by @acelyc111 in https://github.com/sgl-project/sglang/pull/9634
fix 3fs zerocopy by @pansicheng in https://github.com/sgl-project/sglang/pull/9938
Save memory for expert model parallel by @ch-wan in https://github.com/sgl-project/sglang/pull/9957
[Hicache] Mooncake API Fix & Test, and Improved Readme by @ykwd in https://github.com/sgl-project/sglang/pull/9951
Optimized deepseek-v3/r1 model performance on mxfp4 run by @kkHuang-amd in https://github.com/sgl-project/sglang/pull/10008
Fix accuracy drop of dsv3 run in dp enablement by @kkHuang-amd in https://github.com/sgl-project/sglang/pull/8677
chore: bump v0.5.2rc2 by @zhyncs in https://github.com/sgl-project/sglang/pull/10050
fix: update gb200 dep by @zhyncs in https://github.com/sgl-project/sglang/pull/10052
Simplify Router arguments passing and build it in docker image by @hnyls2002 in https://github.com/sgl-project/sglang/pull/9964
[router] fix release workflow to include protobuf by @CatherineSue in https://github.com/sgl-project/sglang/pull/10055
fix MultiTokenizerWrapper name by @LLLL114 in https://github.com/sgl-project/sglang/pull/10049
Integrate trtllm ragged attention for prefill self-attention by @elfiegg in https://github.com/sgl-project/sglang/pull/9801
[Vulnerability]feat(conn): set bootstrap server host by @jinmingyi1998 in https://github.com/sgl-project/sglang/pull/9931
Fix typo in scheduler by @JamesLim-sy in https://github.com/sgl-project/sglang/pull/9934
[1/2] Optimizations and refactors about quant kernel by @fzyzcjy in https://github.com/sgl-project/sglang/pull/9534
Tiny support setting numa nodes for different ranks by @fzyzcjy in https://github.com/sgl-project/sglang/pull/10006
[Fix] Add speculative_draft_model_revision to server_args by @DevashishLal-CB in https://github.com/sgl-project/sglang/pull/5255
Forbid DeepEP race condition when too many tokens by @fzyzcjy in https://github.com/sgl-project/sglang/pull/9567
Support simple evals in text comparator by @fzyzcjy in https://github.com/sgl-project/sglang/pull/8867
Fix and enhance dumper by @fzyzcjy in https://github.com/sgl-project/sglang/pull/8725
Tiny let DeepGEMM scale checks cover more cases by @fzyzcjy in https://github.com/sgl-project/sglang/pull/7182
Support copying tensor from cpu to gpu without using copy engines by @fzyzcjy in https://github.com/sgl-project/sglang/pull/10007
[router] add py binding unit tests to reach 80% coverage by @key4ng in https://github.com/sgl-project/sglang/pull/10043
[router] add rust cache for rust unit test by @key4ng in https://github.com/sgl-project/sglang/pull/10079
[router] add rust cache by @slin1237 in https://github.com/sgl-project/sglang/pull/10080
enable aiter gemm_a8w8_bpreshuffle for ptpc gemm by @Yuechguo in https://github.com/sgl-project/sglang/pull/8555
[bugfix]: use correct cache location for cross attention in torch native backend by @MahmoudAshraf97 in https://github.com/sgl-project/sglang/pull/8622
Update flashinfer to 0.3.1 for B300 support by @hlu1 in https://github.com/sgl-project/sglang/pull/10087
[Bug Fix] Fix Glm4vVisionBlock norm by @sdpkjc in https://github.com/sgl-project/sglang/pull/9884
Update wave-lang to 3.7.0 and unify Wave kernel buffer options by @yichiche in https://github.com/sgl-project/sglang/pull/10069
Add storage read/write bandwidth logs to monitor kvcache performance by @pansicheng in https://github.com/sgl-project/sglang/pull/9965
[Minor] Refactors KV memory pool by @JustinTong0323 in https://github.com/sgl-project/sglang/pull/9842
support Llama4 with non-uniform intermediate size across layers for… by @gongwei-130 in https://github.com/sgl-project/sglang/pull/10047
[router] move to mcp sdk instead by @slin1237 in https://github.com/sgl-project/sglang/pull/10057
[router] Introduce router integration tests by @key4ng in https://github.com/sgl-project/sglang/pull/10086
Add lora_path argument to bench_multiturn.py by @Fridge003 in https://github.com/sgl-project/sglang/pull/10092
[HiStorage] Remove delete and clear as necessary methods by @xiezhq-hermann in https://github.com/sgl-project/sglang/pull/10039
Modify ci workflow for auto-partitioning in 2-GPU backend tests by @hzh0425 in https://github.com/sgl-project/sglang/pull/10029
Revert "[1/N][Bug] Fix w4afp8 MoE NaN issue (sgl-kernel) (#9953)" by @zhyncs in https://github.com/sgl-project/sglang/pull/10097
Fix RMSNorm API CALL mismatch issue. by @sogalin in https://github.com/sgl-project/sglang/pull/10032
fix double sparsity initialization by @shadowpa0327 in https://github.com/sgl-project/sglang/pull/6905
[Fix] illegal sync based on undefined behaviour by @DevashishLal-CB in https://github.com/sgl-project/sglang/pull/9620
[7/N] MoE Refactor: the implementation of new framework by @ch-wan in https://github.com/sgl-project/sglang/pull/9269
[NVIDIA] Remove unused get_fused_moe_impl_class function by @kaixih in https://github.com/sgl-project/sglang/pull/9764
[NVIDIA] disable chunked prefix cache when dp is used by @kaixih in https://github.com/sgl-project/sglang/pull/9861
perf: Avoid unnecessary data type conversions for DeepSeek-V3 on Blackwell by @jinyangyuan-nvidia in https://github.com/sgl-project/sglang/pull/9834
[Fix] Compatibility between DP attention and pipeline parallelism by @ch-wan in https://github.com/sgl-project/sglang/pull/10100
Fix circular import by @ch-wan in https://github.com/sgl-project/sglang/pull/10107
Disable kernel cutlass_mla_decode on SM103 by @hlu1 in https://github.com/sgl-project/sglang/pull/10058
Remove non-accelerated targets (100 and up) from cmake by @hlu1 in https://github.com/sgl-project/sglang/pull/10041
[chore] Remove unused ep_moe cuda kernels by @hlu1 in https://github.com/sgl-project/sglang/pull/9956
[CI] Refactor disaggregation tests by @ShangmingCai in https://github.com/sgl-project/sglang/pull/10068
increase the rust e2e timeout by @key4ng in https://github.com/sgl-project/sglang/pull/10116
[router] Improve the e2e tests by @key4ng in https://github.com/sgl-project/sglang/pull/10102
[Auto Sync] Update server_args.py (20250906) by @merrymercy in https://github.com/sgl-project/sglang/pull/10117
Optimize moe_sum_reduce_kernel by @yuan-luo in https://github.com/sgl-project/sglang/pull/9477
[Feature] LMCache Connector Integration by @Oasis-Git in https://github.com/sgl-project/sglang/pull/9741
CUTLASS fp8 blockwise gemm support of sm120 by @jianyingzhu in https://github.com/sgl-project/sglang/pull/9969
Optimize nvfp4 block scaled gemm kernel when M is small. by @HydraQYH in https://github.com/sgl-project/sglang/pull/10101
Fix cuda graph mode in flashinfer attn backend by @benbarsdell in https://github.com/sgl-project/sglang/pull/10056
[HiCache] fix: check clear() method for storage backend by @stmatengss in https://github.com/sgl-project/sglang/pull/10096
add dataset_path for bench_one_batch_server.py by @miter6 in https://github.com/sgl-project/sglang/pull/10113
[Auto Sync] Update parallel_state.py (20250907) by @merrymercy in https://github.com/sgl-project/sglang/pull/10126
[Minor] fix lint in main by @DarkSharpness in https://github.com/sgl-project/sglang/pull/10128
[1/2] Refactor multi-tokenizer manager by @hnyls2002 in https://github.com/sgl-project/sglang/pull/10074
Fix flashinfer version in sgl-kernel by @merrymercy in https://github.com/sgl-project/sglang/pull/10135
[DOC]: some minor updates by @yyihuang in https://github.com/sgl-project/sglang/pull/10134
[BUG FIX] add fail check when get fail in case wait complete block by @mss1213 in https://github.com/sgl-project/sglang/pull/9971
[MoE] fix: incorrect weight initialization for cutlass_fused_experts_fp8 by @ch-wan in https://github.com/sgl-project/sglang/pull/10144
Enables GLM4.1V server testing & fix video processing by @JustinTong0323 in https://github.com/sgl-project/sglang/pull/10095
Fix slow fused add RMSNorm by @fzyzcjy in https://github.com/sgl-project/sglang/pull/10141
fix the fp8 topk_config.correction_bias is none bug by @rainj-me in https://github.com/sgl-project/sglang/pull/10040
Qwen2.5-VL eagle3 infer by @Lzhang-hub in https://github.com/sgl-project/sglang/pull/8801
Fix run time error in dsv3-fp8 model on mi35x by @kkHuang-amd in https://github.com/sgl-project/sglang/pull/10104
Standalone speculative decoding by @Qiaolin-Yu in https://github.com/sgl-project/sglang/pull/10090
Add graph runner support with torch compile on CPU by @CaoE in https://github.com/sgl-project/sglang/pull/7843
move compile threads to an option to avoid OOM on low memory host by @rainj-me in https://github.com/sgl-project/sglang/pull/10123
[1/N][Bug] Fix w4afp8 MoE NaN issue (sgl-kernel, fixed) by @yuhyao in https://github.com/sgl-project/sglang/pull/10108
[Bugfix] Retract not releasing enough memory when page size > 1 by @xiezhq-hermann in https://github.com/sgl-project/sglang/pull/9989
Add speculator attention backend switch by @cicirori in https://github.com/sgl-project/sglang/pull/9981
Fix: (glm4v) Add missing field by @JustinTong0323 in https://github.com/sgl-project/sglang/pull/10147
[Bugfix] Qwen3MoE aclrtMemcpy failed with NPUGraph by @iforgetmyname in https://github.com/sgl-project/sglang/pull/10013
enable auto-round quantization model by @WeiweiZhang1 in https://github.com/sgl-project/sglang/pull/6226
Revert "enable auto-round quantization model (#6226)" by @zhyncs in https://github.com/sgl-project/sglang/pull/10148
enable llama3.1-8B on xpu by @huaiyuzh in https://github.com/sgl-project/sglang/pull/9434
[Bug fix] Fix Gemma 2 and fix Gemma 3 multimodal with bs > 1 on NPU by @ssshinigami in https://github.com/sgl-project/sglang/pull/9871
update xgrammar 0.1.24 and transformers 4.56.1 by @Swipe4057 in https://github.com/sgl-project/sglang/pull/10155
[2/N] DP-Refactor: move communicators into tokenizer_communicator_mixin by @hnyls2002 in https://github.com/sgl-project/sglang/pull/10028
[Hicache]: Add E2E CI For 3FS-KVStore by @hzh0425 in https://github.com/sgl-project/sglang/pull/10131
Monkey patch uvicorn multi worker is_alive timeout by @hnyls2002 in https://github.com/sgl-project/sglang/pull/10159
[CI] fix ambiguous argument in testing hybrid attentions. by @hnyls2002 in https://github.com/sgl-project/sglang/pull/10161
[1/2] Speed up prefill mla attention by @fzyzcjy in https://github.com/sgl-project/sglang/pull/10156
[Bug fix] Fix ascend mla in aclgraph by @alanhe151220037 in https://github.com/sgl-project/sglang/pull/9925
perf: Add H20 fp8 fused MoE kernel configs for Qwen3 by @Zhiy-Zhang in https://github.com/sgl-project/sglang/pull/10166
[fix] Relax white space rules in EBNFComposer by @LukasBluebaum in https://github.com/sgl-project/sglang/pull/9595
Revert "[ModelOpt] Fix Weight Loading for DSR1-FP4 Quantization (#9712)" by @zhyncs in https://github.com/sgl-project/sglang/pull/10176
[Bench] feat: mooncake trace integration by @stmatengss in https://github.com/sgl-project/sglang/pull/9839
fix: resolve lint issue by @zhyncs in https://github.com/sgl-project/sglang/pull/10181
fix the cutlass moe tests by @rainj-me in https://github.com/sgl-project/sglang/pull/10182
gb200: update dockerfile to latest kernel by @ishandhanani in https://github.com/sgl-project/sglang/pull/9522
Cleaning codes for speculative attention mode by @Fridge003 in https://github.com/sgl-project/sglang/pull/10149
Revert "feat: add fused moe config for Qwen3-30B-A3B on B200" by @rainj-me in https://github.com/sgl-project/sglang/pull/10185
[Fix] Orphan process in data parallel by @Capronir in https://github.com/sgl-project/sglang/pull/7995
Update link for EAGLE speculative decoding by @gerayking in https://github.com/sgl-project/sglang/pull/10191
[CPU] Fix phi4-mm prompt issue in bench_serving by @blzheng in https://github.com/sgl-project/sglang/pull/9900
Updated Nvidia Jetson docs by @shahizat in https://github.com/sgl-project/sglang/pull/4422
[3/N]DP refactor: Improve dp rank scheduling in PD disaggregation mode. by @hnyls2002 in https://github.com/sgl-project/sglang/pull/10169
Support opt model by @wenhuipeng in https://github.com/sgl-project/sglang/pull/10165
feat: use sgl-kernel cu129 as default by @zhyncs in https://github.com/sgl-project/sglang/pull/10188
[Refactor] Remove Hicache Load & Write threads by @DarkSharpness in https://github.com/sgl-project/sglang/pull/10127
Explicitly export CMAKE_BUILD_PARALLEL_LEVEL by @key4ng in https://github.com/sgl-project/sglang/pull/10193
[CPU] Add gelu_and_mul kernel in sgl-kernel and add ut by @blzheng in https://github.com/sgl-project/sglang/pull/9300
feat: support fa cute in sgl-kernel by @zhyncs in https://github.com/sgl-project/sglang/pull/10205
Refactor fused_add_rmsnorm import logic by @ShangmingCai in https://github.com/sgl-project/sglang/pull/10207
tool-call(dsv3): Fixed a parse problem when there are multiple function definitions in tool_calls by @Missmiaom in https://github.com/sgl-project/sglang/pull/10209
[Auto Sync] Update sampling_batch_info.py (20250909) by @merrymercy in https://github.com/sgl-project/sglang/pull/10212
chore: bump v0.3.9 sgl-kernel by @zhyncs in https://github.com/sgl-project/sglang/pull/10208
add variable TP Decode > Prefill size support by @shaharmor98 in https://github.com/sgl-project/sglang/pull/9960
[Fix] KV-cache eviction mismatch across PP ranks in DeepSeek V3/R1 by @qhsc in https://github.com/sgl-project/sglang/pull/10214
chore: upgrade v0.3.9 sgl-kernel by @zhyncs in https://github.com/sgl-project/sglang/pull/10220
Revert the changes on NCCL symmetric memory by @merrymercy in https://github.com/sgl-project/sglang/pull/10210
Revert "Revert the changes on NCCL symmetric memory" by @merrymercy in https://github.com/sgl-project/sglang/pull/10238
[HiCache] feat: add mooncake backend extra config by @stmatengss in https://github.com/sgl-project/sglang/pull/10213
Add mamba kernel by @yizhang2077 in https://github.com/sgl-project/sglang/pull/10234
[Auto Sync] Update io_struct.py (20250909) by @merrymercy in https://github.com/sgl-project/sglang/pull/10236
[Auto Sync] Update collector.py, startup_func_log_and_timer... (20250910) by @merrymercy in https://github.com/sgl-project/sglang/pull/10242
Revert "chore: upgrade v0.3.9 sgl-kernel" by @merrymercy in https://github.com/sgl-project/sglang/pull/10245
refactor(InternVL): Use gpu to preprocess the input image by @KEVINTUAN12 in https://github.com/sgl-project/sglang/pull/9795
make --speculative-draft-model an alias of --speculative-draft-model-path by @merrymercy in https://github.com/sgl-project/sglang/pull/10246
[UT for RL] Add UT to cover release/resume memory case for moe model by @ryang-max in https://github.com/sgl-project/sglang/pull/8803
[Benchmark] Prefill-only benchmark scripts by @sundar24295s in https://github.com/sgl-project/sglang/pull/10240
[doc] add walkthrough for implementing and hosting a simple llama wrapper m… by @glenliu21 in https://github.com/sgl-project/sglang/pull/10093
Fix: the default choice is wrong for flashinfer mxfp4 moe precision by @LauYeeYu in https://github.com/sgl-project/sglang/pull/10253
Page first direct IO kernel by @huangtingwei9988 in https://github.com/sgl-project/sglang/pull/10060
support vlm model spec bench by @Lzhang-hub in https://github.com/sgl-project/sglang/pull/10173
Fix assertion typo in tp_worker.py by @sgncho in https://github.com/sgl-project/sglang/pull/9954
[Auto Sync] Update io_struct.py (20250910) by @merrymercy in https://github.com/sgl-project/sglang/pull/10262
Fix potential flakiness in test_lora_qwen3 by @lifuhuang in https://github.com/sgl-project/sglang/pull/10250
[router][ci] Add PD router mmlu test by @key4ng in https://github.com/sgl-project/sglang/pull/10256
[1/2] Refactor LoRA to support backend-specific batch preprocessing. by @lifuhuang in https://github.com/sgl-project/sglang/pull/10251
[Bugfix] Fix Weightloading for the original nvidia/Deepseek-R1-FP4 checkpoint by @pavanimajety in https://github.com/sgl-project/sglang/pull/9940
add dual stream for qwen2_moe by @yizhang2077 in https://github.com/sgl-project/sglang/pull/10252
Add tests to AMD CI for MI35x by @hubertlu-tw in https://github.com/sgl-project/sglang/pull/9662
pass a_scale from fp8 quant result instead of hard code to 1.0f by @rainj-me in https://github.com/sgl-project/sglang/pull/10241
Feat: support disable tool parser by @JustinTong0323 in https://github.com/sgl-project/sglang/pull/10184
[Auto Sync] Update serving_base.py, serving_chat.py, servin... (20250910) by @merrymercy in https://github.com/sgl-project/sglang/pull/10282
Revert "[1/2] Optimizations and refactors about quant kernel (#9534)" by @zhyncs in https://github.com/sgl-project/sglang/pull/10292
chore: bump sgl-kernel 0.3.9.post1 by @zhyncs in https://github.com/sgl-project/sglang/pull/10294
[Feature] Support DeepEP normal & Redundant Experts on NPU by @iforgetmyname in https://github.com/sgl-project/sglang/pull/9881
add flash linear attention triton kernel by @yizhang2077 in https://github.com/sgl-project/sglang/pull/10239
[chore]Add sgl-router to npu images by @BourneSun0527 in https://github.com/sgl-project/sglang/pull/10229
[CPU] fix OOM when mem-fraction is not set by @ZailiWang in https://github.com/sgl-project/sglang/pull/9090
[fix CI] Fix logical condition in fused MoE layer for compressed tensor quantization by @BBuf in https://github.com/sgl-project/sglang/pull/10299
Revert "Fix flashinfer version in sgl-kernel (#10135)" by @zhyncs in https://github.com/sgl-project/sglang/pull/10310
chore: bump sgl-kernel 0.3.9.post2 by @zhyncs in https://github.com/sgl-project/sglang/pull/10311
[CI] add pyproject.toml to deepseek w4a8 ci by @HanHan009527 in https://github.com/sgl-project/sglang/pull/10314
chore: upgrade v0.3.9.post2 sgl-kernel by @zhyncs in https://github.com/sgl-project/sglang/pull/10297
Qwen3-Next support by @yizhang2077 in https://github.com/sgl-project/sglang/pull/10233
[Auto Sync] Update parallel_state.py (20250911) by @merrymercy in https://github.com/sgl-project/sglang/pull/10326
[Minor] Improve the style of server args by @merrymercy in https://github.com/sgl-project/sglang/pull/10328
[bugfix] fix norm type error in qwen3_next model by @cao1zhg in https://github.com/sgl-project/sglang/pull/10322
[Qwen3-Next] switch to triton and cache conv states to accelerate MTP from 300 tok/s to 341 tok/s by @hebiao064 in https://github.com/sgl-project/sglang/pull/10335
[router] add benchmark for regular router and pd router by @key4ng in https://github.com/sgl-project/sglang/pull/10280
add h20 qwen3 next config by @yizhang2077 in https://github.com/sgl-project/sglang/pull/10264
[router] Add OpenAI backend support - core function by @key4ng in https://github.com/sgl-project/sglang/pull/10254
[router][ci] add gpu process check and free port before start server by @key4ng in https://github.com/sgl-project/sglang/pull/10338
add qwen3-next doc by @yizhang2077 in https://github.com/sgl-project/sglang/pull/10327
fix: trtllm-gen attention takes zero-init workspace by @yyihuang in https://github.com/sgl-project/sglang/pull/10330
Fix errors of hicache kernels in sgl-kernel for ROCm by @hubertlu-tw in https://github.com/sgl-project/sglang/pull/10339
update GLM nightly test threshold by @zminglei in https://github.com/sgl-project/sglang/pull/10331
[LongCat] Optimize zero_experts_compute_triton by changing mask by @zk-lover in https://github.com/sgl-project/sglang/pull/10303
add try catch for quant config hf download by @gongwei-130 in https://github.com/sgl-project/sglang/pull/10340
chore: bump v0.5.2 by @zhyncs in https://github.com/sgl-project/sglang/pull/10221

New Contributors

@Beichen-Ma made their first contribution in https://github.com/sgl-project/sglang/pull/9429
@SCDESPERTATE made their first contribution in https://github.com/sgl-project/sglang/pull/7317
@CiaranZhou made their first contribution in https://github.com/sgl-project/sglang/pull/9229
@jonaslsaa made their first contribution in https://github.com/sgl-project/sglang/pull/9190
@ykwd made their first contribution in https://github.com/sgl-project/sglang/pull/8901
@ZhengdQin made their first contribution in https://github.com/sgl-project/sglang/pull/8328
@lshmouse made their first contribution in https://github.com/sgl-project/sglang/pull/9630
@GavinZhu-GMI made their first contribution in https://github.com/sgl-project/sglang/pull/9635
@cicirori made their first contribution in https://github.com/sgl-project/sglang/pull/9665
@KEVINTUAN12 made their first contribution in https://github.com/sgl-project/sglang/pull/9597
@rainj-me made their first contribution in https://github.com/sgl-project/sglang/pull/9495
@pabloiyu made their first contribution in https://github.com/sgl-project/sglang/pull/9397
@KerwinKai made their first contribution in https://github.com/sgl-project/sglang/pull/9216
@mmangkad made their first contribution in https://github.com/sgl-project/sglang/pull/9802
@Orchard-DT made their first contribution in https://github.com/sgl-project/sglang/pull/9824
@pbkowalski made their first contribution in https://github.com/sgl-project/sglang/pull/9073
@LukasBluebaum made their first contribution in https://github.com/sgl-project/sglang/pull/9803
@chenxijun1029 made their first contribution in https://github.com/sgl-project/sglang/pull/8118
@tc-mb made their first contribution in https://github.com/sgl-project/sglang/pull/8747
@alhridoy made their first contribution in https://github.com/sgl-project/sglang/pull/9946
@xiaguan made their first contribution in https://github.com/sgl-project/sglang/pull/9927
@WangJianQ-0118 made their first contribution in https://github.com/sgl-project/sglang/pull/9895
@jingyu-ml made their first contribution in https://github.com/sgl-project/sglang/pull/7912
@fangjian601 made their first contribution in https://github.com/sgl-project/sglang/pull/9906
@SzymonOzog made their first contribution in https://github.com/sgl-project/sglang/pull/9978
@gracehonv made their first contribution in https://github.com/sgl-project/sglang/pull/9314
@JamesLim-sy made their first contribution in https://github.com/sgl-project/sglang/pull/9934
@DevashishLal-CB made their first contribution in https://github.com/sgl-project/sglang/pull/5255
@MahmoudAshraf97 made their first contribution in https://github.com/sgl-project/sglang/pull/8622
@sdpkjc made their first contribution in https://github.com/sgl-project/sglang/pull/9884
@shadowpa0327 made their first contribution in https://github.com/sgl-project/sglang/pull/6905
@jinyangyuan-nvidia made their first contribution in https://github.com/sgl-project/sglang/pull/9834
@Oasis-Git made their first contribution in https://github.com/sgl-project/sglang/pull/9741
@jianyingzhu made their first contribution in https://github.com/sgl-project/sglang/pull/9969
@benbarsdell made their first contribution in https://github.com/sgl-project/sglang/pull/10056
@mss1213 made their first contribution in https://github.com/sgl-project/sglang/pull/9971
@WeiweiZhang1 made their first contribution in https://github.com/sgl-project/sglang/pull/6226
@huaiyuzh made their first contribution in https://github.com/sgl-project/sglang/pull/9434
@ssshinigami made their first contribution in https://github.com/sgl-project/sglang/pull/9871
@alanhe151220037 made their first contribution in https://github.com/sgl-project/sglang/pull/9925
@Zhiy-Zhang made their first contribution in https://github.com/sgl-project/sglang/pull/10166
@gerayking made their first contribution in https://github.com/sgl-project/sglang/pull/10191
@wenhuipeng made their first contribution in https://github.com/sgl-project/sglang/pull/10165
@Missmiaom made their first contribution in https://github.com/sgl-project/sglang/pull/10209
@shaharmor98 made their first contribution in https://github.com/sgl-project/sglang/pull/9960
@qhsc made their first contribution in https://github.com/sgl-project/sglang/pull/10214
@glenliu21 made their first contribution in https://github.com/sgl-project/sglang/pull/10093
@LauYeeYu made their first contribution in https://github.com/sgl-project/sglang/pull/10253
@sgncho made their first contribution in https://github.com/sgl-project/sglang/pull/9954
@BourneSun0527 made their first contribution in https://github.com/sgl-project/sglang/pull/10229
@zk-lover made their first contribution in https://github.com/sgl-project/sglang/pull/10303

Full Changelog: https://github.com/sgl-project/sglang/compare/v0.5.1...v0.5.2

Source: README.md, updated 2025-09-11

SGLang Files

SGLang is a fast serving framework for large language models