| Name | Modified | Size | Downloads / Week |
|---|---|---|---|
| README.md | 2025-09-11 | 49.4 kB | |
| Release v0.5.2 source code.tar.gz | 2025-09-11 | 5.4 MB | |
| Release v0.5.2 source code.zip | 2025-09-11 | 6.7 MB | |
| Totals: 3 items | | 12.2 MB | 0 |
What's Changed
- feat: allow using local branch to build image by @gongwei-130 in https://github.com/sgl-project/sglang/pull/9546
- [readme] Include additional resources for the SGLang x AMD SF Meetup event by @wisclmy0611 in https://github.com/sgl-project/sglang/pull/9547
- [doc] deepseekv31 support by @XiaotongJiang in https://github.com/sgl-project/sglang/pull/9544
- fix(grok): remove duplicate replicate_lm_head configuration by @vincentzed in https://github.com/sgl-project/sglang/pull/9549
- chore: update configurer by @zhyncs in https://github.com/sgl-project/sglang/pull/9557
- chore: bump v0.5.1.post1 by @zhyncs in https://github.com/sgl-project/sglang/pull/9558
- [router] add right rustls dependency in sgl-router cargo.toml by @Bruce-x-1997 in https://github.com/sgl-project/sglang/pull/9498
- fix: use sgl-kernel 0.3.5 by @zhyncs in https://github.com/sgl-project/sglang/pull/9565
- Add target module validation for init adapters by @Beichen-Ma in https://github.com/sgl-project/sglang/pull/9429
- fix: Update OpenAI client base URL in documentation by @JustinTong0323 in https://github.com/sgl-project/sglang/pull/9576
- [PD] Improve disaggregation metrics output: update the metrics to keep reflecting real stats by @SCDESPERTATE in https://github.com/sgl-project/sglang/pull/7317
- remove redundant rank0_log function. by @miter6 in https://github.com/sgl-project/sglang/pull/9560
- Update CUTLASS 4.2 & Enable K-Major Scale Factor for SM90 FP8 Blockwise Group GEMM by @HydraQYH in https://github.com/sgl-project/sglang/pull/9559
- Reintroduce memory usage fix by @fzyzcjy in https://github.com/sgl-project/sglang/pull/9535
- Offload tensors by sharding on GPU by @fzyzcjy in https://github.com/sgl-project/sglang/pull/9536
- bugfix for undefined logging functions in HarmonyBrowserTool & HarmonyPythonTool by @CiaranZhou in https://github.com/sgl-project/sglang/pull/9229
- chore: upgrade flashinfer 0.2.14.post1 by @zhyncs in https://github.com/sgl-project/sglang/pull/9578
- fix: revert [#8593] by @zhyncs in https://github.com/sgl-project/sglang/pull/9581
- fix: resolve tuning fused moe issue by @zhyncs in https://github.com/sgl-project/sglang/pull/9587
- Tiny fix wrong comments by @fzyzcjy in https://github.com/sgl-project/sglang/pull/9589
- chore: update config by @zhyncs in https://github.com/sgl-project/sglang/pull/9591
- chore: bump v0.5.1.post2 by @zhyncs in https://github.com/sgl-project/sglang/pull/9592
- [Doc] add LWS(LeaderWorkerSet) use case in sgl-router README by @Bruce-x-1997 in https://github.com/sgl-project/sglang/pull/9568
- [Performance] Batch Send from Tokenizer Manager. by @sundar24295s in https://github.com/sgl-project/sglang/pull/9436
- Fix GLM45 tool call multi-turn bug by @byjiang1996 in https://github.com/sgl-project/sglang/pull/9500
- Fix GLM45v launch server cuda torch compile bug by @byjiang1996 in https://github.com/sgl-project/sglang/pull/9554
- Fix Harmony reasoning parser and auto-separation for gpt-oss models by @jonaslsaa in https://github.com/sgl-project/sglang/pull/9190
- [docs] Refactor, remove compiled results and add gpt-oss by @zhaochenyang20 in https://github.com/sgl-project/sglang/pull/9613
- [Fix] HiCache Bugfix & Mooncake Error Handling Enhance by @ykwd in https://github.com/sgl-project/sglang/pull/8901
- Improve bench_one_batch_server script by @hnyls2002 in https://github.com/sgl-project/sglang/pull/9608
- [router] add mistral tool parser by @slin1237 in https://github.com/sgl-project/sglang/pull/9622
- [router] add qwen tool parser by @slin1237 in https://github.com/sgl-project/sglang/pull/9623
- [router] add pythonic parser by @slin1237 in https://github.com/sgl-project/sglang/pull/9628
- [router] add llama tool parser by @slin1237 in https://github.com/sgl-project/sglang/pull/9629
- [router] add ut for mistral, llama, pythonic, and streaming tool parser by @slin1237 in https://github.com/sgl-project/sglang/pull/9632
- [new feat] ascend backend support fia fusion kernel by @ZhengdQin in https://github.com/sgl-project/sglang/pull/8328
- model: Support nvidia/Llama-3_1-Nemotron-Ultra-253B-v1 by @netanel-haber in https://github.com/sgl-project/sglang/pull/9301
- Fix lint for router by @hebiao064 in https://github.com/sgl-project/sglang/pull/9636
- [docs] Update README with additional highlights and resources for SGLang x AMD SF Meetup by @wisclmy0611 in https://github.com/sgl-project/sglang/pull/9640
- Add reasoning_effort param in TiktokenTokenizer.apply_chat_template by @lshmouse in https://github.com/sgl-project/sglang/pull/9630
- fix: allow user to specify function as role by @GavinZhu-GMI in https://github.com/sgl-project/sglang/pull/9635
- Fix kimi k2 function calling format by @XiaotongJiang in https://github.com/sgl-project/sglang/pull/9606
- [router] address worker load tracking consistency by @slin1237 in https://github.com/sgl-project/sglang/pull/9523
- [router] add token bucket rate limiter by @CatherineSue in https://github.com/sgl-project/sglang/pull/9656
- [doc] add kimik2 --tool-call-parser by @XiaotongJiang in https://github.com/sgl-project/sglang/pull/9647
- Install py-spy by default for containers for easier debugging by @fzyzcjy in https://github.com/sgl-project/sglang/pull/9649
- BugFix(hicache): Fix host indices out of bound error by @hzh0425 in https://github.com/sgl-project/sglang/pull/9637
- HiCache Storage fix host memory leak by @xiezhq-hermann in https://github.com/sgl-project/sglang/pull/9648
- add `response_format` support for `completion` API by @cicirori in https://github.com/sgl-project/sglang/pull/9665
- Fix FA3 swa spec verify topk>1 by @ispobock in https://github.com/sgl-project/sglang/pull/9658
- [RL] fix register the same ops multiple times by @hebiao064 in https://github.com/sgl-project/sglang/pull/9564
- chore: enhance bench_serving for vlms with a new dataset of configurable image count and resolution by @mickqian in https://github.com/sgl-project/sglang/pull/9583
- refactor(hicache): Introduce generic HiCacheStorageConfig for improved configuration management by @hzh0425 in https://github.com/sgl-project/sglang/pull/9555
- feat: (chat-template matching) enhance multimodal model detection with config.json by @KEVINTUAN12 in https://github.com/sgl-project/sglang/pull/9597
- [docs] Instructions for bench_serving.py by @yhyang201 in https://github.com/sgl-project/sglang/pull/9071
- Support DeepSeek-V3.1 tool call by @Xu-Wenqing in https://github.com/sgl-project/sglang/pull/9446
- Add A100 fused MoE kernel configs for Dpsk by @ehuaa in https://github.com/sgl-project/sglang/pull/9677
- support cuda 13.0 and trtllm kernel by @rainj-me in https://github.com/sgl-project/sglang/pull/9495
- fix: HiRadixCache: fix prefetch completion race by @pabloiyu in https://github.com/sgl-project/sglang/pull/9397
- fix mooncake store mla zero copy meta by @huangtingwei9988 in https://github.com/sgl-project/sglang/pull/9678
- move is_sm90_supported/is_sm100_supported to python/sglang/srt/utils.py by @merrymercy in https://github.com/sgl-project/sglang/pull/9679
- [router] restructure tool parser module folder by @slin1237 in https://github.com/sgl-project/sglang/pull/9693
- [router] add deepseek tool parser by @slin1237 in https://github.com/sgl-project/sglang/pull/9694
- Quick fix for loading processor for supporting internvl3_5 series by @yilian49 in https://github.com/sgl-project/sglang/pull/9676
- Fix get_ip when no external network by @whybeyoung in https://github.com/sgl-project/sglang/pull/9700
- Sets default model name in request classes by @JustinTong0323 in https://github.com/sgl-project/sglang/pull/9683
- [router] add step3 tool parser by @slin1237 in https://github.com/sgl-project/sglang/pull/9695
- [router] add kimi-k2 tool parser by @slin1237 in https://github.com/sgl-project/sglang/pull/9702
- [router] add gpt-oss and glm4 tool parser by @slin1237 in https://github.com/sgl-project/sglang/pull/9703
- [sgl-kernel] misc: update deepgemm version for sgl-kernel by @FlamingoPg in https://github.com/sgl-project/sglang/pull/9340
- chore: upgrade sgl-kernel 0.3.7 by @zhyncs in https://github.com/sgl-project/sglang/pull/9708
- chore: bump v0.5.1.post3 by @zhyncs in https://github.com/sgl-project/sglang/pull/9716
- [router] upgrade kernel version in pd ci by @CatherineSue in https://github.com/sgl-project/sglang/pull/9720
- [Sync] Update mxfp4.py (20250827) by @merrymercy in https://github.com/sgl-project/sglang/pull/9724
- [router] fix error response in pd_router by @Bruce-x-1997 in https://github.com/sgl-project/sglang/pull/9505
- [router] Add MCP Tool Handler by @key4ng in https://github.com/sgl-project/sglang/pull/9615
- gpt-oss blog reproduction document by @hnyls2002 in https://github.com/sgl-project/sglang/pull/9728
- [router] additional pythonic parser unit test by @slin1237 in https://github.com/sgl-project/sglang/pull/9730
- [router] additional llama32 parser unit test and multi json support by @slin1237 in https://github.com/sgl-project/sglang/pull/9732
- support mooncake store dp attention by @huangtingwei9988 in https://github.com/sgl-project/sglang/pull/9684
- add support for nvidia/gpt-oss-120b-Eagle3 by @zyksir in https://github.com/sgl-project/sglang/pull/9739
- Move git clone command up from README by @JustinTong0323 in https://github.com/sgl-project/sglang/pull/9740
- [feat] Reduce GPU memory overhead by using weakref by @yhyang201 in https://github.com/sgl-project/sglang/pull/9673
- Support speculative decoding in hybrid attention backend by @Qiaolin-Yu in https://github.com/sgl-project/sglang/pull/9573
- [router] add llama3.2 multi json streaming parser by @slin1237 in https://github.com/sgl-project/sglang/pull/9735
- Support compile sgl-kernel on cuda 13.0 by @rainj-me in https://github.com/sgl-project/sglang/pull/9721
- [Sync] Update server_args.py (20250828) by @merrymercy in https://github.com/sgl-project/sglang/pull/9745
- [router] grpc router bootstraps by @slin1237 in https://github.com/sgl-project/sglang/pull/9759
- [AMD] Support Hierarchical Caching on AMD GPUs by @hubertlu-tw in https://github.com/sgl-project/sglang/pull/8236
- feat: add tuned fused moe config for GLM-4.5-Air-FP8 tp = 4 on B200 by @zixuanzhang226 in https://github.com/sgl-project/sglang/pull/9770
- [Feature] Support NPUGraph for DeepSeek on Ascend NPU by @chenxu140 in https://github.com/sgl-project/sglang/pull/9355
- feat(draft_model): support draft_model for RemoteModelLoader by @DellCurry in https://github.com/sgl-project/sglang/pull/6407
- fix: fix MLA for ShardedModelLoader/RemoteModelLoader by @DellCurry in https://github.com/sgl-project/sglang/pull/6287
- Optimize prefill performance on cpu backend by @mingfeima in https://github.com/sgl-project/sglang/pull/8750
- [HiCache] change the default policy to write through by @xiezhq-hermann in https://github.com/sgl-project/sglang/pull/9772
- bugfix(hicache): Move exists check before key suffixing by @hzh0425 in https://github.com/sgl-project/sglang/pull/9749
- Skip some tests on Blackwell by @hlu1 in https://github.com/sgl-project/sglang/pull/9777
- Raise error when `topk>1` and `page>1` for paged attention backends by @hnyls2002 in https://github.com/sgl-project/sglang/pull/9784
- ROCm 7.0 update by @sogalin in https://github.com/sgl-project/sglang/pull/9757
- add bench_mix.py by @pansicheng in https://github.com/sgl-project/sglang/pull/9788
- Make sm100 fp8 kernels available on sm103 by @hlu1 in https://github.com/sgl-project/sglang/pull/9789
- accommodate JSON schema in the "schema" field, not in the "json_schema" field of response_format by @gongwei-130 in https://github.com/sgl-project/sglang/pull/9786
- [PD] Support get_model_info interface for mini_lb by @XucSh in https://github.com/sgl-project/sglang/pull/9792
- [HiCache] resolve conflict between chunked-prefill and hicache hit count by @xiezhq-hermann in https://github.com/sgl-project/sglang/pull/9776
- feat(hicache-3fs): 3FS-Store Backup Optimizations For MLA Model. by @hzh0425 in https://github.com/sgl-project/sglang/pull/9692
- support enable in the reasoning field to enable thinking for thinkin… by @gongwei-130 in https://github.com/sgl-project/sglang/pull/9715
- feat: Add flexible validation for partial weight updates by @GeLee-Q in https://github.com/sgl-project/sglang/pull/9663
- feat: add original logprobs to response by @narutolhy in https://github.com/sgl-project/sglang/pull/8375
- [feat] Support EAGLE3 for Qwen2 by @KerwinKai in https://github.com/sgl-project/sglang/pull/9216
- chore: upgrade flashinfer 0.3.0rc1 by @zhyncs in https://github.com/sgl-project/sglang/pull/9793
- [ModelOpt] Fix Weight Loading for DSR1-FP4 Quantization by @pavanimajety in https://github.com/sgl-project/sglang/pull/9712
- Fix TRTLLM MLA Cuda KV Blocks Causing accuracy drop by @farazkh80 in https://github.com/sgl-project/sglang/pull/9675
- [NVIDIA] [2/N] Optimize `silu_and_mul_scaled_fp4_grouped_quant` perf by @kaixih in https://github.com/sgl-project/sglang/pull/9556
- Adds `initialize_moe_config` to bench_one_batch so MoE backend is respected by @pranavm-nvidia in https://github.com/sgl-project/sglang/pull/9670
- Small bug fix in transformers model implementation by @yilian49 in https://github.com/sgl-project/sglang/pull/9809
- feature(eplb): add min-rebalancing-utilization-threshold for eplb by @hzh0425 in https://github.com/sgl-project/sglang/pull/8345
- Make fp4_quantize kernels work on sm103 by @hlu1 in https://github.com/sgl-project/sglang/pull/9807
- fix: dsv3 lite q_lora_rank none by @zhyncs in https://github.com/sgl-project/sglang/pull/9815
- Fix memory leak when aborting decode request in PD-Disagg by @hnyls2002 in https://github.com/sgl-project/sglang/pull/9817
- chore: fix cuda driver api issue and bump sgl-kernel 0.3.7.post1 by @zhyncs in https://github.com/sgl-project/sglang/pull/9746
- chore: update Dockerfile by @zhyncs in https://github.com/sgl-project/sglang/pull/9820
- Fix typo in warning message about DeepGEMM JIT by @mmangkad in https://github.com/sgl-project/sglang/pull/9802
- chore: upgrade sgl-kernel 0.3.7.post1 with deepgemm fix by @zhyncs in https://github.com/sgl-project/sglang/pull/9822
- [sgl-kernel] fix: fix missing FetchContent_Populate for fmt by @FlamingoPg in https://github.com/sgl-project/sglang/pull/9826
- chore: upgrade transformers 4.56.0 by @zhyncs in https://github.com/sgl-project/sglang/pull/9827
- [Auto Sync] Update parallel_state.py (20250830) by @merrymercy in https://github.com/sgl-project/sglang/pull/9828
- [CI] Fix the trigger condition for PR test workflows by @merrymercy in https://github.com/sgl-project/sglang/pull/9761
- [CI] Code sync tools by @merrymercy in https://github.com/sgl-project/sglang/pull/9830
- Update guidelines for syncing code between repos by @merrymercy in https://github.com/sgl-project/sglang/pull/9831
- hot fix for mooncake batch set api by @xiezhq-hermann in https://github.com/sgl-project/sglang/pull/9836
- [router] add reasoning parser readme by @slin1237 in https://github.com/sgl-project/sglang/pull/9837
- Tool parser.benchmark by @CatherineSue in https://github.com/sgl-project/sglang/pull/9835
- [Model] Support Meituan LongCat-Flash && LongCat-Flash-MTP by @Orchard-DT in https://github.com/sgl-project/sglang/pull/9824
- [router] global tool parser registry by @CatherineSue in https://github.com/sgl-project/sglang/pull/9840
- [feat]Ascend NPU Gemma-3-12b and Gemma-3-27b support by @VDV1985 in https://github.com/sgl-project/sglang/pull/8909
- [Performance] Improve Qwen RMSNorm by replacing with native RMSNorm op by @vincentzed in https://github.com/sgl-project/sglang/pull/9709
- [HiCache] Clear kvcache in storage backend with fastAPI by @stmatengss in https://github.com/sgl-project/sglang/pull/9750
- Fix input logprob index for a batch that includes both requests with input logprob and requests without input logprob by @merrymercy in https://github.com/sgl-project/sglang/pull/9841
- Fuse gate_proj and up_proj in Qwen 2.5 VL's vision MLP by @AlienKevin in https://github.com/sgl-project/sglang/pull/9661
- [HiCache] Storage Refactoring by @xiezhq-hermann in https://github.com/sgl-project/sglang/pull/9797
- fix `set_interal_state` API by @hnyls2002 in https://github.com/sgl-project/sglang/pull/9850
- fix inconsistent arguments for generated shared prefix bench by @pbkowalski in https://github.com/sgl-project/sglang/pull/9073
- fix(hicache-long-bench): adjust context workload generator to use full query set by @hzh0425 in https://github.com/sgl-project/sglang/pull/9847
- Disable radix cache in test_lora_update.py for better stability by @Fridge003 in https://github.com/sgl-project/sglang/pull/9852
- Tiny allow DeepGEMM on cu12.9 by @fzyzcjy in https://github.com/sgl-project/sglang/pull/9858
- Update docker build workflows for gfx942 ROCm 7.0. by @saienduri in https://github.com/sgl-project/sglang/pull/9794
- Support Multi Process Tokenizer Manager (#6555) by @whybeyoung in https://github.com/sgl-project/sglang/pull/8964
- chore: upgrade flashinfer 0.3.0 by @zhyncs in https://github.com/sgl-project/sglang/pull/9864
- chore: bump v0.5.2rc0 by @zhyncs in https://github.com/sgl-project/sglang/pull/9862
- Mooncake store get zero copy meta optimization by @huangtingwei9988 in https://github.com/sgl-project/sglang/pull/9857
- [router] add tokenizer download support from hf hub by @CatherineSue in https://github.com/sgl-project/sglang/pull/9882
- support fp8 kvcache for hybrid attn backend on GPT-OSS by @rainj-me in https://github.com/sgl-project/sglang/pull/9783
- [HiCacheStorage] fix abort request host memory leaks by @huangtingwei9988 in https://github.com/sgl-project/sglang/pull/9874
- [HiCacheStorage]: Improve 3fs kvstore's performance and resolve mla issues by @hzh0425 in https://github.com/sgl-project/sglang/pull/9876
- [router] Fix short timeout for the prefill client by @LukasBluebaum in https://github.com/sgl-project/sglang/pull/9803
- [code style] restructure fused_moe to avoid a very long single file by @BBuf in https://github.com/sgl-project/sglang/pull/9878
- [router] add grpc pd and regular router init by @CatherineSue in https://github.com/sgl-project/sglang/pull/9893
- [router] fix FunctionCallResponse proto, support null arguments by @Bruce-x-1997 in https://github.com/sgl-project/sglang/pull/9875
- [feat] Support tp mode for DeepSeek-R1-W4AFP8 by @chenxijun1029 in https://github.com/sgl-project/sglang/pull/8118
- Move multi-tokenizer event loop to better place by @ShangmingCai in https://github.com/sgl-project/sglang/pull/9902
- [chore] fix dead links in doc by @lifuhuang in https://github.com/sgl-project/sglang/pull/9913
- Change tensor alignment method to mn major by @mmangkad in https://github.com/sgl-project/sglang/pull/9844
- chore: bump v0.3.8 sgl-kernel by @zhyncs in https://github.com/sgl-project/sglang/pull/9907
- [Fix] fix the issue encountered when running inference on LongCat-Flash/MTP EP MoE on B200 by @Orchard-DT in https://github.com/sgl-project/sglang/pull/9916
- fix parallel_state.py `current_platform` bug by @BBuf in https://github.com/sgl-project/sglang/pull/9919
- [feat] apply deep_gemm compile_mode to skip launch by @Alcanderian in https://github.com/sgl-project/sglang/pull/9879
- fix: update router deps by @zhyncs in https://github.com/sgl-project/sglang/pull/9921
- chore: bump v0.5.2rc1 by @zhyncs in https://github.com/sgl-project/sglang/pull/9920
- [Hicache] Generic page get bugfix by @ykwd in https://github.com/sgl-project/sglang/pull/9909
- Support the internvl3.5 family models in sglang by @yilian49 in https://github.com/sgl-project/sglang/pull/9705
- [router] include rust benchmarks by @slin1237 in https://github.com/sgl-project/sglang/pull/9932
- Fix the key passing issue in page first layout. by @hzh0425 in https://github.com/sgl-project/sglang/pull/9929
- [router] fix grpc client url normalization and health check by @CatherineSue in https://github.com/sgl-project/sglang/pull/9939
- [model] support MiniCPM-V 4.0 by @tc-mb in https://github.com/sgl-project/sglang/pull/8747
- [HiCache] Minor fix on file storage backend by @xiezhq-hermann in https://github.com/sgl-project/sglang/pull/9869
- Move parsers under a single folder by @merrymercy in https://github.com/sgl-project/sglang/pull/9912
- [Fix] DeepSeek EP accuracy issue on B200 GPUs by @alhridoy in https://github.com/sgl-project/sglang/pull/9946
- fix(cache): move ongoing_prefetch pop after validation to prevent leak by @xiaguan in https://github.com/sgl-project/sglang/pull/9927
- Remove annoying warnings in sgl kernel build by @merrymercy in https://github.com/sgl-project/sglang/pull/9905
- Update tool_chat_template_deepseekv31.jinja by @WangJianQ-0118 in https://github.com/sgl-project/sglang/pull/9895
- Qwen FP8/NVFP4 ModelOPT Quantization support by @jingyu-ml in https://github.com/sgl-project/sglang/pull/7912
- Optimized deepseek-v3/r1 model performance on mxfp4 run by @kkHuang-amd in https://github.com/sgl-project/sglang/pull/9671
- add proctitle for tokenizers by @hnyls2002 in https://github.com/sgl-project/sglang/pull/9952
- [feat] Add P/D attention select for draft model by @Ximingwang-09 in https://github.com/sgl-project/sglang/pull/9755
- Revert "[Fix] DeepSeek EP accuracy issue on B200 GPUs (#9946)" by @zhyncs in https://github.com/sgl-project/sglang/pull/9955
- Revert "Optimized deepseek-v3/r1 model performance on mxfp4 run (#9671)" by @zhyncs in https://github.com/sgl-project/sglang/pull/9959
- [benchmark] add flashinfer_allreduce_fusion benchmark by @BBuf in https://github.com/sgl-project/sglang/pull/9937
- [1/N][Bug] Fix w4afp8 MoE NaN issue (sgl-kernel) by @yuhyao in https://github.com/sgl-project/sglang/pull/9953
- [router] Add Rerank API Specification by @fangjian601 in https://github.com/sgl-project/sglang/pull/9906
- [router] add chat_template_kwargs in ChatCompletionRequest by @tonyluj in https://github.com/sgl-project/sglang/pull/9958
- Remove mrope position sync by @timmy-feng in https://github.com/sgl-project/sglang/pull/9460
- fix swa clear(): rename is_in_free_group to is_not_in_free_group by @JustinTong0323 in https://github.com/sgl-project/sglang/pull/9914
- Triton 3.4.0 MoE config for Deepseek TP16 H100 by @SzymonOzog in https://github.com/sgl-project/sglang/pull/9978
- nsys profile output kernel classifier by @gracehonv in https://github.com/sgl-project/sglang/pull/9314
- Minor update regarding issue [#9704] by @elfiegg in https://github.com/sgl-project/sglang/pull/9733
- [Auto Sync] Update parallel_state.py, few_shot_gsm8k.py (20250903) by @merrymercy in https://github.com/sgl-project/sglang/pull/9986
- feat: add gpt oss b200 ci by @zhyncs in https://github.com/sgl-project/sglang/pull/9988
- [router] move tokenizer, reasoning, tool initialization to server by @slin1237 in https://github.com/sgl-project/sglang/pull/9996
- [router] clean up dependency injector to use ctx by @slin1237 in https://github.com/sgl-project/sglang/pull/10000
- [router] fix grpc connection mode detection by @slin1237 in https://github.com/sgl-project/sglang/pull/9999
- [Fix] gpt-oss mxfp4 model run failed on ROCm platform by @kkHuang-amd in https://github.com/sgl-project/sglang/pull/9994
- Fix Llama 4 with MXFP4 dynamic quant on MI35x by @hubertlu-tw in https://github.com/sgl-project/sglang/pull/9993
- [Bugfix] fix pd chat completion protocol for batching support by @tonyluj in https://github.com/sgl-project/sglang/pull/10016
- fix: health_generate endpoint in mini_lb by @wxsms in https://github.com/sgl-project/sglang/pull/9997
- [1/N] DP-refactor: move dp balance code into scheduler's mixin class by @hnyls2002 in https://github.com/sgl-project/sglang/pull/10004
- Ensure chunked request extension length respects both rem_chunk_tokens and rem_total_tokens limits by @pansicheng in https://github.com/sgl-project/sglang/pull/10003
- feat(hicache): Add generic hicache ci e2e test and benchmark test by @hzh0425 in https://github.com/sgl-project/sglang/pull/9846
- Optimize Qwen3-moe model by using flashinfer fused allreduce by @yuan-luo in https://github.com/sgl-project/sglang/pull/9973
- [Doc] Fix SGLang tool parser doc by @PopSoda2002 in https://github.com/sgl-project/sglang/pull/9886
- metrics: support custom buckets for prompt/generation_tokens_histogram by @acelyc111 in https://github.com/sgl-project/sglang/pull/9634
- fix 3fs zerocopy by @pansicheng in https://github.com/sgl-project/sglang/pull/9938
- Save memory for expert model parallel by @ch-wan in https://github.com/sgl-project/sglang/pull/9957
- [Hicache] Mooncake API Fix & Test, and Improved Readme by @ykwd in https://github.com/sgl-project/sglang/pull/9951
- Optimized deepseek-v3/r1 model performance on mxfp4 run by @kkHuang-amd in https://github.com/sgl-project/sglang/pull/10008
- Fix accuracy drop of dsv3 run in dp enablement by @kkHuang-amd in https://github.com/sgl-project/sglang/pull/8677
- chore: bump v0.5.2rc2 by @zhyncs in https://github.com/sgl-project/sglang/pull/10050
- fix: update gb200 dep by @zhyncs in https://github.com/sgl-project/sglang/pull/10052
- Simplify `Router` arguments passing and build it in docker image by @hnyls2002 in https://github.com/sgl-project/sglang/pull/9964
- [router] fix release workflow to include protobuf by @CatherineSue in https://github.com/sgl-project/sglang/pull/10055
- fix MultiTokenizerWrapper name by @LLLL114 in https://github.com/sgl-project/sglang/pull/10049
- Integrate trtllm ragged attention for prefill self-attention by @elfiegg in https://github.com/sgl-project/sglang/pull/9801
- [Vulnerability]feat(conn): set bootstrap server host by @jinmingyi1998 in https://github.com/sgl-project/sglang/pull/9931
- Fix typo in scheduler by @JamesLim-sy in https://github.com/sgl-project/sglang/pull/9934
- [1/2] Optimizations and refactors about quant kernel by @fzyzcjy in https://github.com/sgl-project/sglang/pull/9534
- Tiny support setting numa nodes for different ranks by @fzyzcjy in https://github.com/sgl-project/sglang/pull/10006
- [Fix] Add speculative_draft_model_revision to server_args by @DevashishLal-CB in https://github.com/sgl-project/sglang/pull/5255
- Forbid DeepEP race condition when there are too many tokens by @fzyzcjy in https://github.com/sgl-project/sglang/pull/9567
- Support simple evals in text comparator by @fzyzcjy in https://github.com/sgl-project/sglang/pull/8867
- Fix and enhance dumper by @fzyzcjy in https://github.com/sgl-project/sglang/pull/8725
- Tiny let DeepGEMM scale checks cover more cases by @fzyzcjy in https://github.com/sgl-project/sglang/pull/7182
- Support copying tensor from cpu to gpu without using copy engines by @fzyzcjy in https://github.com/sgl-project/sglang/pull/10007
- [router] add py binding unit tests to reach 80% coverage by @key4ng in https://github.com/sgl-project/sglang/pull/10043
- [router] add rust cache for rust unit test by @key4ng in https://github.com/sgl-project/sglang/pull/10079
- [router] add rust cache by @slin1237 in https://github.com/sgl-project/sglang/pull/10080
- enable aiter gemm_a8w8_bpreshuffle for ptpc gemm by @Yuechguo in https://github.com/sgl-project/sglang/pull/8555
- [bugfix]: use correct cache location for cross attention in torch native backend by @MahmoudAshraf97 in https://github.com/sgl-project/sglang/pull/8622
- Update flashinfer to 0.3.1 for B300 support by @hlu1 in https://github.com/sgl-project/sglang/pull/10087
- [Bug Fix] Fix Glm4vVisionBlock norm by @sdpkjc in https://github.com/sgl-project/sglang/pull/9884
- Update wave-lang to 3.7.0 and unify Wave kernel buffer options by @yichiche in https://github.com/sgl-project/sglang/pull/10069
- Add storage read/write bandwidth logs to monitor kvcache performance by @pansicheng in https://github.com/sgl-project/sglang/pull/9965
- [Minor] Refactors KV memory pool by @JustinTong0323 in https://github.com/sgl-project/sglang/pull/9842
- support Llama4 with non-uniform intermediate size across layers for… by @gongwei-130 in https://github.com/sgl-project/sglang/pull/10047
- [router] move to mcp sdk instead by @slin1237 in https://github.com/sgl-project/sglang/pull/10057
- [router] Introduce router integration tests by @key4ng in https://github.com/sgl-project/sglang/pull/10086
- Add lora_path argument to bench_multiturn.py by @Fridge003 in https://github.com/sgl-project/sglang/pull/10092
- [HiStorage] Remove delete and clear as necessary methods by @xiezhq-hermann in https://github.com/sgl-project/sglang/pull/10039
- Modify ci workflow for auto-partitioning in 2-GPU backend tests by @hzh0425 in https://github.com/sgl-project/sglang/pull/10029
- Revert "[1/N][Bug] Fix w4afp8 MoE NaN issue (sgl-kernel) (#9953)" by @zhyncs in https://github.com/sgl-project/sglang/pull/10097
- Fix RMSNorm API CALL mismatch issue. by @sogalin in https://github.com/sgl-project/sglang/pull/10032
- fix double sparsity initialization by @shadowpa0327 in https://github.com/sgl-project/sglang/pull/6905
- [Fix] illegal sync based on undefined behaviour by @DevashishLal-CB in https://github.com/sgl-project/sglang/pull/9620
- [7/N] MoE Refactor: the implementation of new framework by @ch-wan in https://github.com/sgl-project/sglang/pull/9269
- [NVIDIA] Remove unused `get_fused_moe_impl_class` function by @kaixih in https://github.com/sgl-project/sglang/pull/9764
- [NVIDIA] disable chunked prefix cache when dp is used by @kaixih in https://github.com/sgl-project/sglang/pull/9861
- perf: Avoid unnecessary data type conversions for DeepSeek-V3 on Blackwell by @jinyangyuan-nvidia in https://github.com/sgl-project/sglang/pull/9834
- [Fix] Compatibility between DP attention and pipeline parallelism by @ch-wan in https://github.com/sgl-project/sglang/pull/10100
- Fix circular import by @ch-wan in https://github.com/sgl-project/sglang/pull/10107
- Disable kernel cutlass_mla_decode on SM103 by @hlu1 in https://github.com/sgl-project/sglang/pull/10058
- Remove non-accelerated targets(100 and up) from cmake by @hlu1 in https://github.com/sgl-project/sglang/pull/10041
- [chore] Remove unused ep_moe cuda kernels by @hlu1 in https://github.com/sgl-project/sglang/pull/9956
- [CI] Refactor disaggregation tests by @ShangmingCai in https://github.com/sgl-project/sglang/pull/10068
- increase the rust e2e timeout by @key4ng in https://github.com/sgl-project/sglang/pull/10116
- [router] Improve the e2e tests by @key4ng in https://github.com/sgl-project/sglang/pull/10102
- [Auto Sync] Update server_args.py (20250906) by @merrymercy in https://github.com/sgl-project/sglang/pull/10117
- Optimize moe_sum_reduce_kernel by @yuan-luo in https://github.com/sgl-project/sglang/pull/9477
- [Feature] LMCache Connector Integration by @Oasis-Git in https://github.com/sgl-project/sglang/pull/9741
- CUTLASS fp8 blockwise gemm support of sm120 by @jianyingzhu in https://github.com/sgl-project/sglang/pull/9969
- Optimize nvfp4 block scaled gemm kernel when M is small. by @HydraQYH in https://github.com/sgl-project/sglang/pull/10101
- Fix cuda graph mode in flashinfer attn backend by @benbarsdell in https://github.com/sgl-project/sglang/pull/10056
- [HiCache] fix: check clear() method for storage backend by @stmatengss in https://github.com/sgl-project/sglang/pull/10096
- add dataset_path for bench_one_batch_server.py by @miter6 in https://github.com/sgl-project/sglang/pull/10113
- [Auto Sync] Update parallel_state.py (20250907) by @merrymercy in https://github.com/sgl-project/sglang/pull/10126
- [Minor] fix lint in main by @DarkSharpness in https://github.com/sgl-project/sglang/pull/10128
- [1/2] Refactor multi-tokenizer manager by @hnyls2002 in https://github.com/sgl-project/sglang/pull/10074
- Fix flashinfer version in sgl-kernel by @merrymercy in https://github.com/sgl-project/sglang/pull/10135
- [DOC]: some minor updates by @yyihuang in https://github.com/sgl-project/sglang/pull/10134
- [BUG FIX] add fail check when get fail in case wait complete block by @mss1213 in https://github.com/sgl-project/sglang/pull/9971
- [MoE] fix: incorrect weight initialization for cutlass_fused_experts_fp8 by @ch-wan in https://github.com/sgl-project/sglang/pull/10144
- Enables GLM4.1V server testing & fix video processing by @JustinTong0323 in https://github.com/sgl-project/sglang/pull/10095
- Fix slow fused add RMSNorm by @fzyzcjy in https://github.com/sgl-project/sglang/pull/10141
- fix the fp8 topk_config.correction_bias is none bug by @rainj-me in https://github.com/sgl-project/sglang/pull/10040
- Qwen2.5-VL eagle3 infer by @Lzhang-hub in https://github.com/sgl-project/sglang/pull/8801
- Fix run time error in dsv3-fp8 model on mi35x by @kkHuang-amd in https://github.com/sgl-project/sglang/pull/10104
- Standalone speculative decoding by @Qiaolin-Yu in https://github.com/sgl-project/sglang/pull/10090
- Add graph runner support with torch compile on CPU by @CaoE in https://github.com/sgl-project/sglang/pull/7843
- move compile threads to an option to avoid OOM on low memory host by @rainj-me in https://github.com/sgl-project/sglang/pull/10123
- [1/N][Bug] Fix w4afp8 MoE NaN issue (sgl-kernel, fixed) by @yuhyao in https://github.com/sgl-project/sglang/pull/10108
- [Bugfix] Retract not releasing enough memory when page size > 1 by @xiezhq-hermann in https://github.com/sgl-project/sglang/pull/9989
- Add speculator attention backend switch by @cicirori in https://github.com/sgl-project/sglang/pull/9981
- Fix: (glm4v) Add missing field by @JustinTong0323 in https://github.com/sgl-project/sglang/pull/10147
- [Bugfix] Qwen3MoE aclrtMemcpy failed with NPUGraph by @iforgetmyname in https://github.com/sgl-project/sglang/pull/10013
- enable auto-round quantization model by @WeiweiZhang1 in https://github.com/sgl-project/sglang/pull/6226
- Revert "enable auto-round quantization model (#6226)" by @zhyncs in https://github.com/sgl-project/sglang/pull/10148
- enable llama3.1-8B on xpu by @huaiyuzh in https://github.com/sgl-project/sglang/pull/9434
- [Bug fix] Fix Gemma 2 and fix Gemma 3 multimodal with bs > 1 on NPU by @ssshinigami in https://github.com/sgl-project/sglang/pull/9871
- update xgrammar 0.1.24 and transformers 4.56.1 by @Swipe4057 in https://github.com/sgl-project/sglang/pull/10155
- [2/N] DP-Refactor: move communicators into `tokenizer_communicator_mixin` by @hnyls2002 in https://github.com/sgl-project/sglang/pull/10028
- [Hicache]: Add E2E CI For 3FS-KVStore by @hzh0425 in https://github.com/sgl-project/sglang/pull/10131
- Monkey patch uvicorn multi worker `is_alive` timeout by @hnyls2002 in https://github.com/sgl-project/sglang/pull/10159
- [CI] fix ambiguous argument in testing hybrid attentions. by @hnyls2002 in https://github.com/sgl-project/sglang/pull/10161
- [1/2] Speed up prefill mla attention by @fzyzcjy in https://github.com/sgl-project/sglang/pull/10156
- [Bug fix] Fix ascend mla in aclgraph by @alanhe151220037 in https://github.com/sgl-project/sglang/pull/9925
- perf: Add H20 fp8 fused MoE kernel configs for Qwen3 by @Zhiy-Zhang in https://github.com/sgl-project/sglang/pull/10166
- [fix] Relax white space rules in EBNFComposer by @LukasBluebaum in https://github.com/sgl-project/sglang/pull/9595
- Revert "[ModelOpt] Fix Weight Loading for DSR1-FP4 Quantization (#9712)" by @zhyncs in https://github.com/sgl-project/sglang/pull/10176
- [Bench] feat: mooncake trace integration by @stmatengss in https://github.com/sgl-project/sglang/pull/9839
- fix: resolve lint issue by @zhyncs in https://github.com/sgl-project/sglang/pull/10181
- fix the cutlass moe tests by @rainj-me in https://github.com/sgl-project/sglang/pull/10182
- gb200: update dockerfile to latest kernel by @ishandhanani in https://github.com/sgl-project/sglang/pull/9522
- Cleaning codes for speculative attention mode by @Fridge003 in https://github.com/sgl-project/sglang/pull/10149
- Revert "feat: add fused moe config for Qwen3-30B-A3B on B200" by @rainj-me in https://github.com/sgl-project/sglang/pull/10185
- [Fix] Orphan process in data parallel by @Capronir in https://github.com/sgl-project/sglang/pull/7995
- Update link for EAGLE speculative decoding by @gerayking in https://github.com/sgl-project/sglang/pull/10191
- [CPU] Fix phi4-mm prompt issue in bench_serving by @blzheng in https://github.com/sgl-project/sglang/pull/9900
- Updated Nvidia Jetson docs by @shahizat in https://github.com/sgl-project/sglang/pull/4422
- [3/N]DP refactor: Improve dp rank scheduling in PD disaggregation mode. by @hnyls2002 in https://github.com/sgl-project/sglang/pull/10169
- Support opt model by @wenhuipeng in https://github.com/sgl-project/sglang/pull/10165
- feat: use sgl-kernel cu129 as default by @zhyncs in https://github.com/sgl-project/sglang/pull/10188
- [Refactor] Remove Hicache Load & Write threads by @DarkSharpness in https://github.com/sgl-project/sglang/pull/10127
- Explicitly export CMAKE_BUILD_PARALLEL_LEVEL by @key4ng in https://github.com/sgl-project/sglang/pull/10193
- [CPU] Add gelu_and_mul kernel in sgl-kernel and add ut by @blzheng in https://github.com/sgl-project/sglang/pull/9300
- feat: support fa cute in sgl-kernel by @zhyncs in https://github.com/sgl-project/sglang/pull/10205
- Refactor fused_add_rmsnorm import logic by @ShangmingCai in https://github.com/sgl-project/sglang/pull/10207
- tool-call(dsv3): Fixed a parse problem when there are multiple function definitions in tool_calls by @Missmiaom in https://github.com/sgl-project/sglang/pull/10209
- [Auto Sync] Update sampling_batch_info.py (20250909) by @merrymercy in https://github.com/sgl-project/sglang/pull/10212
- chore: bump v0.3.9 sgl-kernel by @zhyncs in https://github.com/sgl-project/sglang/pull/10208
- add variable TP Decode > Prefill size support by @shaharmor98 in https://github.com/sgl-project/sglang/pull/9960
- [Fix] KV-cache eviction mismatch across PP ranks in DeepSeek V3/R1 by @qhsc in https://github.com/sgl-project/sglang/pull/10214
- chore: upgrade v0.3.9 sgl-kernel by @zhyncs in https://github.com/sgl-project/sglang/pull/10220
- Revert the changes on NCCL symmetric memory by @merrymercy in https://github.com/sgl-project/sglang/pull/10210
- Revert "Revert the changes on NCCL symmetric memory" by @merrymercy in https://github.com/sgl-project/sglang/pull/10238
- [HiCache] feat: add mooncake backend extra config by @stmatengss in https://github.com/sgl-project/sglang/pull/10213
- Add mamba kernel by @yizhang2077 in https://github.com/sgl-project/sglang/pull/10234
- [Auto Sync] Update io_struct.py (20250909) by @merrymercy in https://github.com/sgl-project/sglang/pull/10236
- [Auto Sync] Update collector.py, startup_func_log_and_timer... (20250910) by @merrymercy in https://github.com/sgl-project/sglang/pull/10242
- Revert "chore: upgrade v0.3.9 sgl-kernel" by @merrymercy in https://github.com/sgl-project/sglang/pull/10245
- refactor(InternVL): Use gpu to preprocess the input image by @KEVINTUAN12 in https://github.com/sgl-project/sglang/pull/9795
- make --speculative-draft-model an alias of --speculative-draft-model-path by @merrymercy in https://github.com/sgl-project/sglang/pull/10246
- [UT for RL] Add UT to cover release/resume memory case for moe model by @ryang-max in https://github.com/sgl-project/sglang/pull/8803
- [Benchmark] Prefill-only benchmark scripts by @sundar24295s in https://github.com/sgl-project/sglang/pull/10240
- [doc] add walkthrough for implementing and hosting a simple llama wrapper m… by @glenliu21 in https://github.com/sgl-project/sglang/pull/10093
- Fix: the default choice is wrong for flashinfer mxfp4 moe precision by @LauYeeYu in https://github.com/sgl-project/sglang/pull/10253
- Page first direct IO kernel by @huangtingwei9988 in https://github.com/sgl-project/sglang/pull/10060
- support vlm model spec bench by @Lzhang-hub in https://github.com/sgl-project/sglang/pull/10173
- Fix assertion typo in tp_worker.py by @sgncho in https://github.com/sgl-project/sglang/pull/9954
- [Auto Sync] Update io_struct.py (20250910) by @merrymercy in https://github.com/sgl-project/sglang/pull/10262
- Fix potential flakiness in test_lora_qwen3 by @lifuhuang in https://github.com/sgl-project/sglang/pull/10250
- [router][ci] Add PD router mmlu test by @key4ng in https://github.com/sgl-project/sglang/pull/10256
- [1/2] Refactor LoRA to support backend-specific batch preprocessing. by @lifuhuang in https://github.com/sgl-project/sglang/pull/10251
- [Bugfix] Fix Weightloading for the original nvidia/Deepseek-R1-FP4 checkpoint by @pavanimajety in https://github.com/sgl-project/sglang/pull/9940
- add dual stream for qwen2_moe by @yizhang2077 in https://github.com/sgl-project/sglang/pull/10252
- Add tests to AMD CI for MI35x by @hubertlu-tw in https://github.com/sgl-project/sglang/pull/9662
- pass a_scale from fp8 quant result instead of hard code to 1.0f by @rainj-me in https://github.com/sgl-project/sglang/pull/10241
- Feat: support disable tool parser by @JustinTong0323 in https://github.com/sgl-project/sglang/pull/10184
- [Auto Sync] Update serving_base.py, serving_chat.py, servin... (20250910) by @merrymercy in https://github.com/sgl-project/sglang/pull/10282
- Revert "[1/2] Optimizations and refactors about quant kernel (#9534)" by @zhyncs in https://github.com/sgl-project/sglang/pull/10292
- chore: bump sgl-kernel 0.3.9.post1 by @zhyncs in https://github.com/sgl-project/sglang/pull/10294
- [Feature] Support DeepEP normal & Redundant Experts on NPU by @iforgetmyname in https://github.com/sgl-project/sglang/pull/9881
- add flash linear attention triton kernel by @yizhang2077 in https://github.com/sgl-project/sglang/pull/10239
- [chore]Add sgl-router to npu images by @BourneSun0527 in https://github.com/sgl-project/sglang/pull/10229
- [CPU] fix OOM when mem-fraction is not set by @ZailiWang in https://github.com/sgl-project/sglang/pull/9090
- [fix CI] Fix logical condition in fused MoE layer for compressed tensor quantization by @BBuf in https://github.com/sgl-project/sglang/pull/10299
- Revert "Fix flashinfer version in sgl-kernel (#10135)" by @zhyncs in https://github.com/sgl-project/sglang/pull/10310
- chore: bump sgl-kernel 0.3.9.post2 by @zhyncs in https://github.com/sgl-project/sglang/pull/10311
- [CI] add pyproject.toml to deepseek w4a8 ci by @HanHan009527 in https://github.com/sgl-project/sglang/pull/10314
- chore: upgrade v0.3.9.post2 sgl-kernel by @zhyncs in https://github.com/sgl-project/sglang/pull/10297
- Qwen3-Next support by @yizhang2077 in https://github.com/sgl-project/sglang/pull/10233
- [Auto Sync] Update parallel_state.py (20250911) by @merrymercy in https://github.com/sgl-project/sglang/pull/10326
- [Minor] Improve the style of server args by @merrymercy in https://github.com/sgl-project/sglang/pull/10328
- [bugfix] fix norm type error in qwen3_next model by @cao1zhg in https://github.com/sgl-project/sglang/pull/10322
- [Qwen3-Next] switch to triton and cache conv states to accelerate MTP from 300 tok/s to 341 tok/s by @hebiao064 in https://github.com/sgl-project/sglang/pull/10335
- [router] add benchmark for regular router and pd router by @key4ng in https://github.com/sgl-project/sglang/pull/10280
- add h20 qwen3 next config by @yizhang2077 in https://github.com/sgl-project/sglang/pull/10264
- [router] Add OpenAI backend support - core function by @key4ng in https://github.com/sgl-project/sglang/pull/10254
- [router][ci] add gpu process check and free port before start server by @key4ng in https://github.com/sgl-project/sglang/pull/10338
- add qwen3-next doc by @yizhang2077 in https://github.com/sgl-project/sglang/pull/10327
- fix: trtllm-gen attention take zero-init workspace by @yyihuang in https://github.com/sgl-project/sglang/pull/10330
- Fix errors of hicache kernels in sgl-kernel for ROCm by @hubertlu-tw in https://github.com/sgl-project/sglang/pull/10339
- update GLM nightly test threshold by @zminglei in https://github.com/sgl-project/sglang/pull/10331
- [LongCat] Optimize zero_experts_compute_triton by changing mask by @zk-lover in https://github.com/sgl-project/sglang/pull/10303
- add try catch for quant config hf download by @gongwei-130 in https://github.com/sgl-project/sglang/pull/10340
- chore: bump v0.5.2 by @zhyncs in https://github.com/sgl-project/sglang/pull/10221
New Contributors
- @Beichen-Ma made their first contribution in https://github.com/sgl-project/sglang/pull/9429
- @SCDESPERTATE made their first contribution in https://github.com/sgl-project/sglang/pull/7317
- @CiaranZhou made their first contribution in https://github.com/sgl-project/sglang/pull/9229
- @jonaslsaa made their first contribution in https://github.com/sgl-project/sglang/pull/9190
- @ykwd made their first contribution in https://github.com/sgl-project/sglang/pull/8901
- @ZhengdQin made their first contribution in https://github.com/sgl-project/sglang/pull/8328
- @lshmouse made their first contribution in https://github.com/sgl-project/sglang/pull/9630
- @GavinZhu-GMI made their first contribution in https://github.com/sgl-project/sglang/pull/9635
- @cicirori made their first contribution in https://github.com/sgl-project/sglang/pull/9665
- @KEVINTUAN12 made their first contribution in https://github.com/sgl-project/sglang/pull/9597
- @rainj-me made their first contribution in https://github.com/sgl-project/sglang/pull/9495
- @pabloiyu made their first contribution in https://github.com/sgl-project/sglang/pull/9397
- @KerwinKai made their first contribution in https://github.com/sgl-project/sglang/pull/9216
- @mmangkad made their first contribution in https://github.com/sgl-project/sglang/pull/9802
- @Orchard-DT made their first contribution in https://github.com/sgl-project/sglang/pull/9824
- @pbkowalski made their first contribution in https://github.com/sgl-project/sglang/pull/9073
- @LukasBluebaum made their first contribution in https://github.com/sgl-project/sglang/pull/9803
- @chenxijun1029 made their first contribution in https://github.com/sgl-project/sglang/pull/8118
- @tc-mb made their first contribution in https://github.com/sgl-project/sglang/pull/8747
- @alhridoy made their first contribution in https://github.com/sgl-project/sglang/pull/9946
- @xiaguan made their first contribution in https://github.com/sgl-project/sglang/pull/9927
- @WangJianQ-0118 made their first contribution in https://github.com/sgl-project/sglang/pull/9895
- @jingyu-ml made their first contribution in https://github.com/sgl-project/sglang/pull/7912
- @fangjian601 made their first contribution in https://github.com/sgl-project/sglang/pull/9906
- @SzymonOzog made their first contribution in https://github.com/sgl-project/sglang/pull/9978
- @gracehonv made their first contribution in https://github.com/sgl-project/sglang/pull/9314
- @JamesLim-sy made their first contribution in https://github.com/sgl-project/sglang/pull/9934
- @DevashishLal-CB made their first contribution in https://github.com/sgl-project/sglang/pull/5255
- @MahmoudAshraf97 made their first contribution in https://github.com/sgl-project/sglang/pull/8622
- @sdpkjc made their first contribution in https://github.com/sgl-project/sglang/pull/9884
- @shadowpa0327 made their first contribution in https://github.com/sgl-project/sglang/pull/6905
- @jinyangyuan-nvidia made their first contribution in https://github.com/sgl-project/sglang/pull/9834
- @Oasis-Git made their first contribution in https://github.com/sgl-project/sglang/pull/9741
- @jianyingzhu made their first contribution in https://github.com/sgl-project/sglang/pull/9969
- @benbarsdell made their first contribution in https://github.com/sgl-project/sglang/pull/10056
- @mss1213 made their first contribution in https://github.com/sgl-project/sglang/pull/9971
- @WeiweiZhang1 made their first contribution in https://github.com/sgl-project/sglang/pull/6226
- @huaiyuzh made their first contribution in https://github.com/sgl-project/sglang/pull/9434
- @ssshinigami made their first contribution in https://github.com/sgl-project/sglang/pull/9871
- @alanhe151220037 made their first contribution in https://github.com/sgl-project/sglang/pull/9925
- @Zhiy-Zhang made their first contribution in https://github.com/sgl-project/sglang/pull/10166
- @gerayking made their first contribution in https://github.com/sgl-project/sglang/pull/10191
- @wenhuipeng made their first contribution in https://github.com/sgl-project/sglang/pull/10165
- @Missmiaom made their first contribution in https://github.com/sgl-project/sglang/pull/10209
- @shaharmor98 made their first contribution in https://github.com/sgl-project/sglang/pull/9960
- @qhsc made their first contribution in https://github.com/sgl-project/sglang/pull/10214
- @glenliu21 made their first contribution in https://github.com/sgl-project/sglang/pull/10093
- @LauYeeYu made their first contribution in https://github.com/sgl-project/sglang/pull/10253
- @sgncho made their first contribution in https://github.com/sgl-project/sglang/pull/9954
- @BourneSun0527 made their first contribution in https://github.com/sgl-project/sglang/pull/10229
- @zk-lover made their first contribution in https://github.com/sgl-project/sglang/pull/10303
Full Changelog: https://github.com/sgl-project/sglang/compare/v0.5.1...v0.5.2