Name | Modified | Size | Downloads / Week |
---|---|---|---|
Parent folder | |||
README.md | 2025-06-24 | 24.6 kB | |
Release v0.4.8 source code.tar.gz | 2025-06-24 | 4.2 MB | |
Release v0.4.8 source code.zip | 2025-06-24 | 5.2 MB | |
Totals: 3 Items | 9.4 MB | 1 |
Highlights
OpenAI-Compatible Server Refactor
Re-structured the OpenAI-compatible server to support production and enterprise environments. Key improvements include:
-
Consistent metrics and logging for better observability and debugging.
-
Unified error handling, request validation, and processing logic for improved reliability and maintainability.
-
Improved request tracking across sessions and components.
-
Fixed bugs in embedding requests and reasoning parsers.
This work was a collaborative effort involving engineers from academic and industry institutions. Special thanks to the Oracle Cloud team and the SGLang team and community — including @slin1237, @CatherineSue, @key4ng, @JustinTong0323, @jhinpan, @yhyang201 and @whybeyoung — for their invaluable contributions.
DeepSeek R1 FP4 on Blackwell GPU
Added support for DeepSeek R1 with FP4 and MTP on NVIDIA Blackwell GPU.
-
Integrated FlashInfer NVFP4 MoE, supporting TP, EP, and DP.
-
Supported 2-stream shared expert execution.
-
Achieved up to 90 TPS per user at isl/osl/bs = 1k/1k/16 on B200.
Further optimization in progress. Special thanks to the FlashInfer, NVIDIA Enterprise Products, Novita AI, DataCrunch, Google Cloud, and SGLang teams — especially @Alcanderian and @pyc96 — for their critical contributions.
What's Changed
- Update README.md by @merrymercy in https://github.com/sgl-project/sglang/pull/7040
- [Docker] Upgrading base image from 24.04 to 24.12 by @Swipe4057 in https://github.com/sgl-project/sglang/pull/7043
- fix 24.12 docker by @zhyncs in https://github.com/sgl-project/sglang/pull/7045
- Minor cleanup of fa3 backend by @merrymercy in https://github.com/sgl-project/sglang/pull/6999
- Fix eagle on AMD by @merrymercy in https://github.com/sgl-project/sglang/pull/7051
- Clean up server_args.py by @merrymercy in https://github.com/sgl-project/sglang/pull/7037
- Minor style fix in cuda_graph_runner.py by @merrymercy in https://github.com/sgl-project/sglang/pull/7053
- [WA] fix output data is nan in CI test "test_moe_eval_accuracy_large.py" by @kkHuang-amd in https://github.com/sgl-project/sglang/pull/7021
- [fix] libmlx5.so already in base image by @HanHan009527 in https://github.com/sgl-project/sglang/pull/7060
- Fix test_lora.py CI by @Fridge003 in https://github.com/sgl-project/sglang/pull/7061
- Tiny fix cutlass_mla_get_workspace_size stub incorrect signature by @fzyzcjy in https://github.com/sgl-project/sglang/pull/7057
- Add sanity checks when a test file is not added to CI by @fzyzcjy in https://github.com/sgl-project/sglang/pull/6947
- Revert "Add sanity checks when a test file is not added to CI (#6947)" by @zhyncs in https://github.com/sgl-project/sglang/pull/7063
- Fix missing tool call id if tool call index >0 in streaming tool call output. by @Xu-Wenqing in https://github.com/sgl-project/sglang/pull/7049
- chore: update dev docker by @zhyncs in https://github.com/sgl-project/sglang/pull/7064
- Open AI API hidden states by @kyle-pena-kuzco in https://github.com/sgl-project/sglang/pull/6716
- fix arm sgl-kernel link issue by @zhyncs in https://github.com/sgl-project/sglang/pull/7066
- [Feature] Add Logit Bias by @b8zhong in https://github.com/sgl-project/sglang/pull/6579
- Improve perf tuning docs by @merrymercy in https://github.com/sgl-project/sglang/pull/7071
- Frontend language separate reasoning support by @binarycrayon in https://github.com/sgl-project/sglang/pull/6031
- Do not run frontend_reasoning.ipynb to reduce the CI load by @merrymercy in https://github.com/sgl-project/sglang/pull/7073
- Simplify the heuristics for setting --mem-fraction-static by @merrymercy in https://github.com/sgl-project/sglang/pull/7054
- update doc by @Ximingwang-09 in https://github.com/sgl-project/sglang/pull/7046
- Clean up docs for server args and sampling parameters (generated by grok) by @merrymercy in https://github.com/sgl-project/sglang/pull/7076
- Fix GGuf and add back test_gguf.py by @Fridge003 in https://github.com/sgl-project/sglang/pull/7067
- vlm: adapt internvl to VisionAttention by @mickqian in https://github.com/sgl-project/sglang/pull/6870
- Fix circular import in test_prefix_chunk_info.py by @Fridge003 in https://github.com/sgl-project/sglang/pull/7097
- Fix misusing the "_is_cuda". by @sogalin in https://github.com/sgl-project/sglang/pull/7091
- Support VILA models by @futrime in https://github.com/sgl-project/sglang/pull/6106
- [FIX]remove redundant code in logits_processor.py by @pc-neo in https://github.com/sgl-project/sglang/pull/7079
- [feat]: Emit fixed-size KV blocks events by @faradawn in https://github.com/sgl-project/sglang/pull/6824
- [Perf] Refactor LoRAManager to eliminate stream syncs and redundant computations by @lifuhuang in https://github.com/sgl-project/sglang/pull/6994
- Fix positional argument by @liquanfeng in https://github.com/sgl-project/sglang/pull/7093
- [sgl-kernel] Add cuda kernel for moe_ep_silu_and_mul by @yuan-luo in https://github.com/sgl-project/sglang/pull/6919
- Improve log status by @hnyls2002 in https://github.com/sgl-project/sglang/pull/7115
- feat: update blackwell setup by @zhyncs in https://github.com/sgl-project/sglang/pull/7119
- Update CODEOWNERS by @merrymercy in https://github.com/sgl-project/sglang/pull/7126
- Add gfx950 support for sgl-kernel. by @sogalin in https://github.com/sgl-project/sglang/pull/7092
- [Fix] Reduce busy polling when scheduler is idle by @p12tic in https://github.com/sgl-project/sglang/pull/6026
- Minor add utility to read expert distribution recorder output by @fzyzcjy in https://github.com/sgl-project/sglang/pull/7134
- Remove unnecessary metadata_expand.max_seq_len_k operations in fa3 to… by @byjiang1996 in https://github.com/sgl-project/sglang/pull/7140
- Minor speedup topk postprocessing by @fzyzcjy in https://github.com/sgl-project/sglang/pull/7058
- filter by num_hidden_layers by @pansicheng in https://github.com/sgl-project/sglang/pull/7056
- Remove 200us slow concat kernel (part 1: kernel) by @fzyzcjy in https://github.com/sgl-project/sglang/pull/7145
- Support new DeepGEMM format in per token group quant by @fzyzcjy in https://github.com/sgl-project/sglang/pull/7146
- chore: bump v0.1.8.post1 by @zhyncs in https://github.com/sgl-project/sglang/pull/7152
- Support new DeepGEMM format in per token group quant (part 2: srt) by @fzyzcjy in https://github.com/sgl-project/sglang/pull/7155
- Fix DeepEP error in some environments by @fzyzcjy in https://github.com/sgl-project/sglang/pull/7154
- Minor speed up block_quant_dequant by @fzyzcjy in https://github.com/sgl-project/sglang/pull/6814
- Tiny add sanity checks for DeepGEMM inputs by @fzyzcjy in https://github.com/sgl-project/sglang/pull/7157
- Remove 200us slow concat kernel (part 2: srt) by @fzyzcjy in https://github.com/sgl-project/sglang/pull/7020
- Re-quantize DeepSeek model weights to support DeepGEMM new input format by @fzyzcjy in https://github.com/sgl-project/sglang/pull/7156
- Minor style change of triton backend by @merrymercy in https://github.com/sgl-project/sglang/pull/7165
- Split the eagle test into two files by @merrymercy in https://github.com/sgl-project/sglang/pull/7170
- Support new DeepGEMM input format in silu_and_mul_masked_post_quant_fwd by @fzyzcjy in https://github.com/sgl-project/sglang/pull/7153
- Refactor DeepGEMM integration by @fzyzcjy in https://github.com/sgl-project/sglang/pull/7150
- Add test for refactored openai server by @jhinpan in https://github.com/sgl-project/sglang/pull/7161
- Improve test cases for eagle infer by @merrymercy in https://github.com/sgl-project/sglang/pull/7173
- Support new DeepGEMM by @fzyzcjy in https://github.com/sgl-project/sglang/pull/7172
- Increase timeout in test/srt/test_disaggregation.py by @merrymercy in https://github.com/sgl-project/sglang/pull/7175
- Add Phi-4-mm to supported VLM supported model list. by @lifuhuang in https://github.com/sgl-project/sglang/pull/7178
- Fix shared experts fusion + weight requant by @fzyzcjy in https://github.com/sgl-project/sglang/pull/7177
- [fix] fix dsv3 weight loader tqdm and simplify shared experts fusion by @Alcanderian in https://github.com/sgl-project/sglang/pull/7181
- [fix] fix cutlass_mla_backend with cuda_graph and add sm_scale for sgl-kernel cutlass_mla by @Alcanderian in https://github.com/sgl-project/sglang/pull/7184
- [PD] Update prefill.py by @ByronHsu in https://github.com/sgl-project/sglang/pull/7190
- Fix a minor bug related to DeepGEMM upgrade by @zhijian-liu in https://github.com/sgl-project/sglang/pull/7191
- chore: bump v0.1.8.post2 by @zhyncs in https://github.com/sgl-project/sglang/pull/7189
- [fix] fix determine_num_fused_shared_experts by @Alcanderian in https://github.com/sgl-project/sglang/pull/7180
- chore: upgrade sgl-kernel v0.1.8.post2 by @Alcanderian in https://github.com/sgl-project/sglang/pull/7186
- Fix NCCL 2.27.3 not in docker image by @fzyzcjy in https://github.com/sgl-project/sglang/pull/7195
- Fix error when disabling new DeepGEMM by @fzyzcjy in https://github.com/sgl-project/sglang/pull/7198
- [PD] Support decode retract and update decode.py by @ByronHsu in https://github.com/sgl-project/sglang/pull/7196
- Move host memory pools into a separate file by @merrymercy in https://github.com/sgl-project/sglang/pull/7200
- Lianmin/simplify memory pool by @merrymercy in https://github.com/sgl-project/sglang/pull/7202
- Fix grammar abort & Minor style fixes by @merrymercy in https://github.com/sgl-project/sglang/pull/7204
- feat: use zstd for docker by @zhyncs in https://github.com/sgl-project/sglang/pull/7205
- [EAGLE] Refactor code for page size > 1 & more simplifications by @merrymercy in https://github.com/sgl-project/sglang/pull/7163
- Revert "[EAGLE] Refactor code for page size > 1 & more simplifications" by @merrymercy in https://github.com/sgl-project/sglang/pull/7210
- [PD] use int32 for kv indices & get num_reserved_decode_tokens from server_args by @ByronHsu in https://github.com/sgl-project/sglang/pull/7214
- Minor PD style fix by @ByronHsu in https://github.com/sgl-project/sglang/pull/7215
- Fix ChunkCache object has no attribute 'disable' by @Fridge003 in https://github.com/sgl-project/sglang/pull/7217
- Implement gather before attn by @ch-wan in https://github.com/sgl-project/sglang/pull/6378
- Support LoRA in MMMU benchmark script. by @lifuhuang in https://github.com/sgl-project/sglang/pull/7218
- refine fused_moe benchmark by @BBuf in https://github.com/sgl-project/sglang/pull/7221
- Minor style and doc fix by @merrymercy in https://github.com/sgl-project/sglang/pull/7228
- [EAGLE] Refactor code for page size > 1 & more simplifications by @merrymercy in https://github.com/sgl-project/sglang/pull/7213
- Fix sampling for speculative decoding & simplify kernels by @merrymercy in https://github.com/sgl-project/sglang/pull/7207
- Release sgl-kernel 0.1.9 by @merrymercy in https://github.com/sgl-project/sglang/pull/7232
- [EAGLE] Fix draft kv cache layout for fa3 and topk > 1 by @merrymercy in https://github.com/sgl-project/sglang/pull/7239
- [Eagle] Fix kernel call after updating speculative sampling kernels by @merrymercy in https://github.com/sgl-project/sglang/pull/7231
- minor fix by @hnyls2002 in https://github.com/sgl-project/sglang/pull/7245
- Tiny remove comments about DeepEP on H20 by @fzyzcjy in https://github.com/sgl-project/sglang/pull/7234
- Feat/support rerank by @woodx9 in https://github.com/sgl-project/sglang/pull/6058
- [fix] fix DeepGEMM blackwell input quant & ut & fix style and log by @Alcanderian in https://github.com/sgl-project/sglang/pull/7247
- Update CI flakes. by @saienduri in https://github.com/sgl-project/sglang/pull/7244
- chore: bump v0.4.7.post1 by @zhyncs in https://github.com/sgl-project/sglang/pull/7248
- fix amd EP MoE FP8 issue by @alexsun07 in https://github.com/sgl-project/sglang/pull/7125
- Use seq_len_fill_value in the cuda graph runners by @merrymercy in https://github.com/sgl-project/sglang/pull/7233
- support custom weight loader for model runner by @yukavio in https://github.com/sgl-project/sglang/pull/7122
- Fix AMD speculative decoding by @merrymercy in https://github.com/sgl-project/sglang/pull/7252
- [Refactor] OAI Server components by @JustinTong0323 in https://github.com/sgl-project/sglang/pull/7167
- OAI Server Skeleton & Core Utility Endpoints by @yhyang201 in https://github.com/sgl-project/sglang/pull/7179
- [amd] Opt dsv3 moe by @kkHuang-amd in https://github.com/sgl-project/sglang/pull/7160
- update ci node for xeon by @DiweiSun in https://github.com/sgl-project/sglang/pull/7265
- feat: mtp support dp-attention by @u4lr451 in https://github.com/sgl-project/sglang/pull/6081
- support qwen2 running on ascend npu device by @zhuyijie88 in https://github.com/sgl-project/sglang/pull/7022
- Fix Deepseek R1 0528 FP4 tensor name mismatch issue during weights loading. by @pyc96 in https://github.com/sgl-project/sglang/pull/7164
- bugfix(tool call ebnf): Fix EBNF generation for optional function parameters by @CatherineSue in https://github.com/sgl-project/sglang/pull/7283
- Fix AWQ Dequant and Weight Loading of deepseek v2 by @AniZpZ in https://github.com/sgl-project/sglang/pull/6842
- fix: resolve b200 dsv3 mtp issue by @zhyncs in https://github.com/sgl-project/sglang/pull/7286
- ci: Fix test_ebnf_generate_all_optional_function_params by @CatherineSue in https://github.com/sgl-project/sglang/pull/7288
- fix: only enable flash_attn test on sm80 sm90 by @zhyncs in https://github.com/sgl-project/sglang/pull/7289
- [PD] Support get local ip from NIC for PD disaggregation by @ShangmingCai in https://github.com/sgl-project/sglang/pull/7237
- [PD] Add custom memory pool option to support Mooncake PD with NVLink by @ShangmingCai in https://github.com/sgl-project/sglang/pull/7264
- Upstreaming hicache bug fixes by @xiezhq-hermann in https://github.com/sgl-project/sglang/pull/7267
- Update python API of activation, topk, norm and rope and remove vllm dependency by @yanbing-j in https://github.com/sgl-project/sglang/pull/6614
- Fix hicache benchmark script bug - some sampled input_request is [] by @byjiang1996 in https://github.com/sgl-project/sglang/pull/7300
- chore: change logs from
INFO
toDEBUG
for dp and add force quit for tokenizer manager by @ishandhanani in https://github.com/sgl-project/sglang/pull/7251 - update invalid link in doc by @habaohaba in https://github.com/sgl-project/sglang/pull/7297
- Fix mini_lb for PD with long output: limit chunk size of decode response by @ch-tiger1 in https://github.com/sgl-project/sglang/pull/7301
- Fix profiler error when there are idle passes by @fzyzcjy in https://github.com/sgl-project/sglang/pull/7003
- [pd] optimize dockerfile for pd disaggregation by @whybeyoung in https://github.com/sgl-project/sglang/pull/7319
- Merge PDLB (Prefill-Decode Load Balancer) into SGLang Router by @slin1237 in https://github.com/sgl-project/sglang/pull/7096
- Add more refactored openai test & in CI by @jhinpan in https://github.com/sgl-project/sglang/pull/7284
- fix: resolve blackwell deepep image issue by @zhyncs in https://github.com/sgl-project/sglang/pull/7331
- add seed in CPU UTs to avoid flaky failure by @chunyuan-w in https://github.com/sgl-project/sglang/pull/7333
- Multi-Stage Awake: Support Resume and Pause KV Cache and Weights separately by @hebiao064 in https://github.com/sgl-project/sglang/pull/7099
- Reintroduce tiny fix sampler error when prob is not contiguous by @fzyzcjy in https://github.com/sgl-project/sglang/pull/7354
- [Refactor] Clean up radix cache related API by @DarkSharpness in https://github.com/sgl-project/sglang/pull/7303
- Put
_normalize_rid
before other normalization inio_struct
by @CatherineSue in https://github.com/sgl-project/sglang/pull/7363 - [PD] Transfer hidden states for mtp when disaggregation by @Atream in https://github.com/sgl-project/sglang/pull/7242
- [Bugfix][PD] Set conclude state before clear when failure happens by @ShangmingCai in https://github.com/sgl-project/sglang/pull/7362
- docs: update installation by @zhyncs in https://github.com/sgl-project/sglang/pull/7366
- [Docker] optimize dockerfile remove deepep and blackwell merge it to… by @whybeyoung in https://github.com/sgl-project/sglang/pull/7343
- Clean unused import for mimo mtp model by @lambert0312 in https://github.com/sgl-project/sglang/pull/7370
- [Bugfix]Fix hang bug using dp attention with HiRadixCache by @LLLL114 in https://github.com/sgl-project/sglang/pull/7159
- [Doc] add embedding rerank doc by @woodx9 in https://github.com/sgl-project/sglang/pull/7364
- Fix judgment condition for enabling Deepseek V3/R1 shared expert fusion optimization by @lambert0312 in https://github.com/sgl-project/sglang/pull/7371
- Feat/refactor embedding server by @woodx9 in https://github.com/sgl-project/sglang/pull/7322
- Purge VerlEngine by @MrAta in https://github.com/sgl-project/sglang/pull/7326
- support return logprobs for pipeline by @strgrb in https://github.com/sgl-project/sglang/pull/7356
- [PD] Optimize custom mem pool usage and bump mooncake version by @ShangmingCai in https://github.com/sgl-project/sglang/pull/7393
- Support THUDM/GLM-4-0414 (GLM-Z1) Glm4ForCausalLM architecture. by @solrex in https://github.com/sgl-project/sglang/pull/5485
- Refine OpenAI serving entrypoint to remove batch requests by @JustinTong0323 in https://github.com/sgl-project/sglang/pull/7372
- [Feature] Comprehensive Hybrid Parallelism Support by @ch-wan in https://github.com/sgl-project/sglang/pull/6389
- [DeepSeekNextN] fix: residual of head norm can be None by @ch-wan in https://github.com/sgl-project/sglang/pull/7398
- [OAI refactor] Add rerank and score serving by @woodx9 in https://github.com/sgl-project/sglang/pull/7399
- [OAI Server Refactor] [ChatCompletions & Completions] Implement UsageInfo Processor by @yhyang201 in https://github.com/sgl-project/sglang/pull/7360
- Fix All-Gather under world size one by @ch-wan in https://github.com/sgl-project/sglang/pull/7219
- Optimize DP attn scheduling for speculative decoding by @ch-wan in https://github.com/sgl-project/sglang/pull/7285
- Update usage_processor.py by @ch-wan in https://github.com/sgl-project/sglang/pull/7402
- Fix 7285 Merge Conflicts by @ch-wan in https://github.com/sgl-project/sglang/pull/7403
- chore: upgrade mooncake-transfer-engine 0.3.4 by @zhyncs in https://github.com/sgl-project/sglang/pull/7401
- [OAI Server Refactor] [ChatCompletions & Completions] Support Return Hidden State by @key4ng in https://github.com/sgl-project/sglang/pull/7329
- Remove batches api in docs & example by @jhinpan in https://github.com/sgl-project/sglang/pull/7400
- [BugFix]: fix EmbeddingReqInput single input error by @woodx9 in https://github.com/sgl-project/sglang/pull/7396
- [BugFix]fix qwen25 invoke function call streaming responses with curly braces as the starting indicator by @ehuaa in https://github.com/sgl-project/sglang/pull/7394
- fix overlap pagecount by @pansicheng in https://github.com/sgl-project/sglang/pull/6984
- fix: Fix CI test_function_call_parser.py by @CatherineSue in https://github.com/sgl-project/sglang/pull/7425
- Fix CPU offloading for MLA memory pool by @hnyls2002 in https://github.com/sgl-project/sglang/pull/7409
- [fix] PD disaggregation when enable mtp and tp!=dp by @Atream in https://github.com/sgl-project/sglang/pull/7420
- feat(oai refactor): Replace
openai_api
withentrypoints/openai
by @CatherineSue in https://github.com/sgl-project/sglang/pull/7351 - Refactor LoRAManager and LoRAMemoryPool state management logic for dynamic LoRA loading support by @lifuhuang in https://github.com/sgl-project/sglang/pull/7412
- refactor(test): reorganize OpenAI test file structure by @CatherineSue in https://github.com/sgl-project/sglang/pull/7408
- [minor] simplify the
TokenToKVPoolAllocator
by @hnyls2002 in https://github.com/sgl-project/sglang/pull/7414 - Tiny add logging for GC by @fzyzcjy in https://github.com/sgl-project/sglang/pull/7406
- FlashInfer NVFP4 MoE with EP & 2-stream shared expert by @trevor-m in https://github.com/sgl-project/sglang/pull/7327
- Remove copy after bmm by @ispobock in https://github.com/sgl-project/sglang/pull/7441
- Fix torch compile run by @kkHuang-amd in https://github.com/sgl-project/sglang/pull/7391
- [misc] Add PD service discovery support in router by @slin1237 in https://github.com/sgl-project/sglang/pull/7361
- add fused moe config for qwen3 in triton3.3.1 by @yizhang2077 in https://github.com/sgl-project/sglang/pull/7445
- Fix CUDA Graph Check under Deepep with DP FFN by @ch-wan in https://github.com/sgl-project/sglang/pull/7451
- Update hyperparameter_tuning.md by @merrymercy in https://github.com/sgl-project/sglang/pull/7454
- feat: integrate deepgemm into EPMoE by @xutizhou in https://github.com/sgl-project/sglang/pull/6821
- Solve docker build failed in the virtual machine by @kkHuang-amd in https://github.com/sgl-project/sglang/pull/7290
- Fix a bug in BatchTokenIDOut & Misc style and dependency updates by @merrymercy in https://github.com/sgl-project/sglang/pull/7457
- [CI] Upgrade mooncake to 0.3.4.post1 to fix 8 gpu tests by @ShangmingCai in https://github.com/sgl-project/sglang/pull/7472
- Fix prefill OOM due to wrong token calculation when page > 1 by @hnyls2002 in https://github.com/sgl-project/sglang/pull/7397
- feat(func_call): Add more check in
BaseFormatDetector.parse_streaming_increment
by @CatherineSue in https://github.com/sgl-project/sglang/pull/7479 - Fix dtype for idle input in spec decoding by @ch-wan in https://github.com/sgl-project/sglang/pull/7456
- update mooncake in dockerfile by @hnyls2002 in https://github.com/sgl-project/sglang/pull/7480
- kvcache io kernels and test case by @xiezhq-hermann in https://github.com/sgl-project/sglang/pull/7382
- [perf] slightly imporve DeepSeek-R1-FP4 TP8 by @Alcanderian in https://github.com/sgl-project/sglang/pull/7481
- Quick fix for DeepGemm requant to also cover MTP. by @pyc96 in https://github.com/sgl-project/sglang/pull/7378
- Support weight loading without mmap by @guoyuhong in https://github.com/sgl-project/sglang/pull/7469
- ci: Revert openai_server related tests in AMD suites by @CatherineSue in https://github.com/sgl-project/sglang/pull/7449
- Perormance: Enable cuda graph for dp idle batch by @u4lr451 in https://github.com/sgl-project/sglang/pull/7269
- bugfix: Prevent global mutation of conv.stop_str across requests by @huangtingwei9988 in https://github.com/sgl-project/sglang/pull/7347
- Fix RequestValidationError response format by @CatherineSue in https://github.com/sgl-project/sglang/pull/7487
- Fix MTP with Deepseek R1 Fp4 by @pyc96 in https://github.com/sgl-project/sglang/pull/7376
- chore: bump sgl-kernel v0.2.0 by @zhyncs in https://github.com/sgl-project/sglang/pull/7490
- chore: bump v0.4.8 by @zhyncs in https://github.com/sgl-project/sglang/pull/7493
New Contributors
- @futrime made their first contribution in https://github.com/sgl-project/sglang/pull/6106
- @faradawn made their first contribution in https://github.com/sgl-project/sglang/pull/6824
- @liquanfeng made their first contribution in https://github.com/sgl-project/sglang/pull/7093
- @p12tic made their first contribution in https://github.com/sgl-project/sglang/pull/6026
- @byjiang1996 made their first contribution in https://github.com/sgl-project/sglang/pull/7140
- @zhijian-liu made their first contribution in https://github.com/sgl-project/sglang/pull/7191
- @DiweiSun made their first contribution in https://github.com/sgl-project/sglang/pull/7265
- @zhuyijie88 made their first contribution in https://github.com/sgl-project/sglang/pull/7022
- @pyc96 made their first contribution in https://github.com/sgl-project/sglang/pull/7164
- @ch-tiger1 made their first contribution in https://github.com/sgl-project/sglang/pull/7301
- @Atream made their first contribution in https://github.com/sgl-project/sglang/pull/7242
- @LLLL114 made their first contribution in https://github.com/sgl-project/sglang/pull/7159
- @key4ng made their first contribution in https://github.com/sgl-project/sglang/pull/7329
- @ehuaa made their first contribution in https://github.com/sgl-project/sglang/pull/7394
Full Changelog: https://github.com/sgl-project/sglang/compare/v0.4.7...v0.4.8