SGLang - Browse /v0.4.8 at SourceForge.net

The interactive file manager requires Javascript. Please enable it or use sftp or scp.
You may still browse the files here.

Name	Modified	Size	InfoDownloads / Week
Parent folder
README.md	2025-06-24	24.6 kB	0
Release v0.4.8 source code.tar.gz	2025-06-24	4.2 MB	0
Release v0.4.8 source code.zip	2025-06-24	5.2 MB	1
Totals: 3 Items		9.4 MB	1

Highlights

OpenAI-Compatible Server Refactor

Re-structured the OpenAI-compatible server to support production and enterprise environments. Key improvements include:

Consistent metrics and logging for better observability and debugging.
Unified error handling, request validation, and processing logic for improved reliability and maintainability.
Improved request tracking across sessions and components.
Fixed bugs in embedding requests and reasoning parsers.

This work was a collaborative effort involving engineers from academic and industry institutions. Special thanks to the Oracle Cloud team and the SGLang team and community — including @slin1237, @CatherineSue, @key4ng, @JustinTong0323, @jhinpan, @yhyang201 and @whybeyoung — for their invaluable contributions.

DeepSeek R1 FP4 on Blackwell GPU

Added support for DeepSeek R1 with FP4 and MTP on NVIDIA Blackwell GPU.

Integrated FlashInfer NVFP4 MoE, supporting TP, EP, and DP.
Supported 2-stream shared expert execution.
Achieved up to 90 TPS per user at isl/osl/bs = 1k/1k/16 on B200.

Further optimization in progress. Special thanks to the FlashInfer, NVIDIA Enterprise Products, Novita AI, DataCrunch, Google Cloud, and SGLang teams — especially @Alcanderian and @pyc96 — for their critical contributions.

What's Changed

Update README.md by @merrymercy in https://github.com/sgl-project/sglang/pull/7040
[Docker] Upgrading base image from 24.04 to 24.12 by @Swipe4057 in https://github.com/sgl-project/sglang/pull/7043
fix 24.12 docker by @zhyncs in https://github.com/sgl-project/sglang/pull/7045
Minor cleanup of fa3 backend by @merrymercy in https://github.com/sgl-project/sglang/pull/6999
Fix eagle on AMD by @merrymercy in https://github.com/sgl-project/sglang/pull/7051
Clean up server_args.py by @merrymercy in https://github.com/sgl-project/sglang/pull/7037
Minor style fix in cuda_graph_runner.py by @merrymercy in https://github.com/sgl-project/sglang/pull/7053
[WA] fix output data is nan in CI test "test_moe_eval_accuracy_large.py" by @kkHuang-amd in https://github.com/sgl-project/sglang/pull/7021
[fix] libmlx5.so already in base image by @HanHan009527 in https://github.com/sgl-project/sglang/pull/7060
Fix test_lora.py CI by @Fridge003 in https://github.com/sgl-project/sglang/pull/7061
Tiny fix cutlass_mla_get_workspace_size stub incorrect signature by @fzyzcjy in https://github.com/sgl-project/sglang/pull/7057
Add sanity checks when a test file is not added to CI by @fzyzcjy in https://github.com/sgl-project/sglang/pull/6947
Revert "Add sanity checks when a test file is not added to CI (#6947)" by @zhyncs in https://github.com/sgl-project/sglang/pull/7063
Fix missing tool call id if tool call index >0 in streaming tool call output. by @Xu-Wenqing in https://github.com/sgl-project/sglang/pull/7049
chore: update dev docker by @zhyncs in https://github.com/sgl-project/sglang/pull/7064
Open AI API hidden states by @kyle-pena-kuzco in https://github.com/sgl-project/sglang/pull/6716
fix arm sgl-kernel link issue by @zhyncs in https://github.com/sgl-project/sglang/pull/7066
[Feature] Add Logit Bias by @b8zhong in https://github.com/sgl-project/sglang/pull/6579
Improve perf tuning docs by @merrymercy in https://github.com/sgl-project/sglang/pull/7071
Frontend language separate reasoning support by @binarycrayon in https://github.com/sgl-project/sglang/pull/6031
Do not run frontend_reasoning.ipynb to reduce the CI load by @merrymercy in https://github.com/sgl-project/sglang/pull/7073
Simplify the heuristics for setting --mem-fraction-static by @merrymercy in https://github.com/sgl-project/sglang/pull/7054
update doc by @Ximingwang-09 in https://github.com/sgl-project/sglang/pull/7046
Clean up docs for server args and sampling parameters (generated by grok) by @merrymercy in https://github.com/sgl-project/sglang/pull/7076
Fix GGuf and add back test_gguf.py by @Fridge003 in https://github.com/sgl-project/sglang/pull/7067
vlm: adapt internvl to VisionAttention by @mickqian in https://github.com/sgl-project/sglang/pull/6870
Fix circular import in test_prefix_chunk_info.py by @Fridge003 in https://github.com/sgl-project/sglang/pull/7097
Fix misusing the "_is_cuda". by @sogalin in https://github.com/sgl-project/sglang/pull/7091
Support VILA models by @futrime in https://github.com/sgl-project/sglang/pull/6106
[FIX]remove redundant code in logits_processor.py by @pc-neo in https://github.com/sgl-project/sglang/pull/7079
[feat]: Emit fixed-size KV blocks events by @faradawn in https://github.com/sgl-project/sglang/pull/6824
[Perf] Refactor LoRAManager to eliminate stream syncs and redundant computations by @lifuhuang in https://github.com/sgl-project/sglang/pull/6994
Fix positional argument by @liquanfeng in https://github.com/sgl-project/sglang/pull/7093
[sgl-kernel] Add cuda kernel for moe_ep_silu_and_mul by @yuan-luo in https://github.com/sgl-project/sglang/pull/6919
Improve log status by @hnyls2002 in https://github.com/sgl-project/sglang/pull/7115
feat: update blackwell setup by @zhyncs in https://github.com/sgl-project/sglang/pull/7119
Update CODEOWNERS by @merrymercy in https://github.com/sgl-project/sglang/pull/7126
Add gfx950 support for sgl-kernel. by @sogalin in https://github.com/sgl-project/sglang/pull/7092
[Fix] Reduce busy polling when scheduler is idle by @p12tic in https://github.com/sgl-project/sglang/pull/6026
Minor add utility to read expert distribution recorder output by @fzyzcjy in https://github.com/sgl-project/sglang/pull/7134
Remove unnecessary metadata_expand.max_seq_len_k operations in fa3 to… by @byjiang1996 in https://github.com/sgl-project/sglang/pull/7140
Minor speedup topk postprocessing by @fzyzcjy in https://github.com/sgl-project/sglang/pull/7058
filter by num_hidden_layers by @pansicheng in https://github.com/sgl-project/sglang/pull/7056
Remove 200us slow concat kernel (part 1: kernel) by @fzyzcjy in https://github.com/sgl-project/sglang/pull/7145
Support new DeepGEMM format in per token group quant by @fzyzcjy in https://github.com/sgl-project/sglang/pull/7146
chore: bump v0.1.8.post1 by @zhyncs in https://github.com/sgl-project/sglang/pull/7152
Support new DeepGEMM format in per token group quant (part 2: srt) by @fzyzcjy in https://github.com/sgl-project/sglang/pull/7155
Fix DeepEP error in some environments by @fzyzcjy in https://github.com/sgl-project/sglang/pull/7154
Minor speed up block_quant_dequant by @fzyzcjy in https://github.com/sgl-project/sglang/pull/6814
Tiny add sanity checks for DeepGEMM inputs by @fzyzcjy in https://github.com/sgl-project/sglang/pull/7157
Remove 200us slow concat kernel (part 2: srt) by @fzyzcjy in https://github.com/sgl-project/sglang/pull/7020
Re-quantize DeepSeek model weights to support DeepGEMM new input format by @fzyzcjy in https://github.com/sgl-project/sglang/pull/7156
Minor style change of triton backend by @merrymercy in https://github.com/sgl-project/sglang/pull/7165
Split the eagle test into two files by @merrymercy in https://github.com/sgl-project/sglang/pull/7170
Support new DeepGEMM input format in silu_and_mul_masked_post_quant_fwd by @fzyzcjy in https://github.com/sgl-project/sglang/pull/7153
Refactor DeepGEMM integration by @fzyzcjy in https://github.com/sgl-project/sglang/pull/7150
Add test for refactored openai server by @jhinpan in https://github.com/sgl-project/sglang/pull/7161
Improve test cases for eagle infer by @merrymercy in https://github.com/sgl-project/sglang/pull/7173
Support new DeepGEMM by @fzyzcjy in https://github.com/sgl-project/sglang/pull/7172
Increase timeout in test/srt/test_disaggregation.py by @merrymercy in https://github.com/sgl-project/sglang/pull/7175
Add Phi-4-mm to supported VLM supported model list. by @lifuhuang in https://github.com/sgl-project/sglang/pull/7178
Fix shared experts fusion + weight requant by @fzyzcjy in https://github.com/sgl-project/sglang/pull/7177
[fix] fix dsv3 weight loader tqdm and simplify shared experts fusion by @Alcanderian in https://github.com/sgl-project/sglang/pull/7181
[fix] fix cutlass_mla_backend with cuda_graph and add sm_scale for sgl-kernel cutlass_mla by @Alcanderian in https://github.com/sgl-project/sglang/pull/7184
[PD] Update prefill.py by @ByronHsu in https://github.com/sgl-project/sglang/pull/7190
Fix a minor bug related to DeepGEMM upgrade by @zhijian-liu in https://github.com/sgl-project/sglang/pull/7191
chore: bump v0.1.8.post2 by @zhyncs in https://github.com/sgl-project/sglang/pull/7189
[fix] fix determine_num_fused_shared_experts by @Alcanderian in https://github.com/sgl-project/sglang/pull/7180
chore: upgrade sgl-kernel v0.1.8.post2 by @Alcanderian in https://github.com/sgl-project/sglang/pull/7186
Fix NCCL 2.27.3 not in docker image by @fzyzcjy in https://github.com/sgl-project/sglang/pull/7195
Fix error when disabling new DeepGEMM by @fzyzcjy in https://github.com/sgl-project/sglang/pull/7198
[PD] Support decode retract and update decode.py by @ByronHsu in https://github.com/sgl-project/sglang/pull/7196
Move host memory pools into a separate file by @merrymercy in https://github.com/sgl-project/sglang/pull/7200
Lianmin/simplify memory pool by @merrymercy in https://github.com/sgl-project/sglang/pull/7202
Fix grammar abort & Minor style fixes by @merrymercy in https://github.com/sgl-project/sglang/pull/7204
feat: use zstd for docker by @zhyncs in https://github.com/sgl-project/sglang/pull/7205
[EAGLE] Refactor code for page size > 1 & more simplifications by @merrymercy in https://github.com/sgl-project/sglang/pull/7163
Revert "[EAGLE] Refactor code for page size > 1 & more simplifications" by @merrymercy in https://github.com/sgl-project/sglang/pull/7210
[PD] use int32 for kv indices & get num_reserved_decode_tokens from server_args by @ByronHsu in https://github.com/sgl-project/sglang/pull/7214
Minor PD style fix by @ByronHsu in https://github.com/sgl-project/sglang/pull/7215
Fix ChunkCache object has no attribute 'disable' by @Fridge003 in https://github.com/sgl-project/sglang/pull/7217
Implement gather before attn by @ch-wan in https://github.com/sgl-project/sglang/pull/6378
Support LoRA in MMMU benchmark script. by @lifuhuang in https://github.com/sgl-project/sglang/pull/7218
refine fused_moe benchmark by @BBuf in https://github.com/sgl-project/sglang/pull/7221
Minor style and doc fix by @merrymercy in https://github.com/sgl-project/sglang/pull/7228
[EAGLE] Refactor code for page size > 1 & more simplifications by @merrymercy in https://github.com/sgl-project/sglang/pull/7213
Fix sampling for speculative decoding & simplify kernels by @merrymercy in https://github.com/sgl-project/sglang/pull/7207
Release sgl-kernel 0.1.9 by @merrymercy in https://github.com/sgl-project/sglang/pull/7232
[EAGLE] Fix draft kv cache layout for fa3 and topk > 1 by @merrymercy in https://github.com/sgl-project/sglang/pull/7239
[Eagle] Fix kernel call after updating speculative sampling kernels by @merrymercy in https://github.com/sgl-project/sglang/pull/7231
minor fix by @hnyls2002 in https://github.com/sgl-project/sglang/pull/7245
Tiny remove comments about DeepEP on H20 by @fzyzcjy in https://github.com/sgl-project/sglang/pull/7234
Feat/support rerank by @woodx9 in https://github.com/sgl-project/sglang/pull/6058
[fix] fix DeepGEMM blackwell input quant & ut & fix style and log by @Alcanderian in https://github.com/sgl-project/sglang/pull/7247
Update CI flakes. by @saienduri in https://github.com/sgl-project/sglang/pull/7244
chore: bump v0.4.7.post1 by @zhyncs in https://github.com/sgl-project/sglang/pull/7248
fix amd EP MoE FP8 issue by @alexsun07 in https://github.com/sgl-project/sglang/pull/7125
Use seq_len_fill_value in the cuda graph runners by @merrymercy in https://github.com/sgl-project/sglang/pull/7233
support custom weight loader for model runner by @yukavio in https://github.com/sgl-project/sglang/pull/7122
Fix AMD speculative decoding by @merrymercy in https://github.com/sgl-project/sglang/pull/7252
[Refactor] OAI Server components by @JustinTong0323 in https://github.com/sgl-project/sglang/pull/7167
OAI Server Skeleton & Core Utility Endpoints by @yhyang201 in https://github.com/sgl-project/sglang/pull/7179
[amd] Opt dsv3 moe by @kkHuang-amd in https://github.com/sgl-project/sglang/pull/7160
update ci node for xeon by @DiweiSun in https://github.com/sgl-project/sglang/pull/7265
feat: mtp support dp-attention by @u4lr451 in https://github.com/sgl-project/sglang/pull/6081
support qwen2 running on ascend npu device by @zhuyijie88 in https://github.com/sgl-project/sglang/pull/7022
Fix Deepseek R1 0528 FP4 tensor name mismatch issue during weights loading. by @pyc96 in https://github.com/sgl-project/sglang/pull/7164
bugfix(tool call ebnf): Fix EBNF generation for optional function parameters by @CatherineSue in https://github.com/sgl-project/sglang/pull/7283
Fix AWQ Dequant and Weight Loading of deepseek v2 by @AniZpZ in https://github.com/sgl-project/sglang/pull/6842
fix: resolve b200 dsv3 mtp issue by @zhyncs in https://github.com/sgl-project/sglang/pull/7286
ci: Fix test_ebnf_generate_all_optional_function_params by @CatherineSue in https://github.com/sgl-project/sglang/pull/7288
fix: only enable flash_attn test on sm80 sm90 by @zhyncs in https://github.com/sgl-project/sglang/pull/7289
[PD] Support get local ip from NIC for PD disaggregation by @ShangmingCai in https://github.com/sgl-project/sglang/pull/7237
[PD] Add custom memory pool option to support Mooncake PD with NVLink by @ShangmingCai in https://github.com/sgl-project/sglang/pull/7264
Upstreaming hicache bug fixes by @xiezhq-hermann in https://github.com/sgl-project/sglang/pull/7267
Update python API of activation, topk, norm and rope and remove vllm dependency by @yanbing-j in https://github.com/sgl-project/sglang/pull/6614
Fix hicache benchmark script bug - some sampled input_request is [] by @byjiang1996 in https://github.com/sgl-project/sglang/pull/7300
chore: change logs fromINFO to DEBUG for dp and add force quit for tokenizer manager by @ishandhanani in https://github.com/sgl-project/sglang/pull/7251
update invalid link in doc by @habaohaba in https://github.com/sgl-project/sglang/pull/7297
Fix mini_lb for PD with long output: limit chunk size of decode response by @ch-tiger1 in https://github.com/sgl-project/sglang/pull/7301
Fix profiler error when there are idle passes by @fzyzcjy in https://github.com/sgl-project/sglang/pull/7003
[pd] optimize dockerfile for pd disaggregation by @whybeyoung in https://github.com/sgl-project/sglang/pull/7319
Merge PDLB (Prefill-Decode Load Balancer) into SGLang Router by @slin1237 in https://github.com/sgl-project/sglang/pull/7096
Add more refactored openai test & in CI by @jhinpan in https://github.com/sgl-project/sglang/pull/7284
fix: resolve blackwell deepep image issue by @zhyncs in https://github.com/sgl-project/sglang/pull/7331
add seed in CPU UTs to avoid flaky failure by @chunyuan-w in https://github.com/sgl-project/sglang/pull/7333
Multi-Stage Awake: Support Resume and Pause KV Cache and Weights separately by @hebiao064 in https://github.com/sgl-project/sglang/pull/7099
Reintroduce tiny fix sampler error when prob is not contiguous by @fzyzcjy in https://github.com/sgl-project/sglang/pull/7354
[Refactor] Clean up radix cache related API by @DarkSharpness in https://github.com/sgl-project/sglang/pull/7303
Put _normalize_rid before other normalization in io_struct by @CatherineSue in https://github.com/sgl-project/sglang/pull/7363
[PD] Transfer hidden states for mtp when disaggregation by @Atream in https://github.com/sgl-project/sglang/pull/7242
[Bugfix][PD] Set conclude state before clear when failure happens by @ShangmingCai in https://github.com/sgl-project/sglang/pull/7362
docs: update installation by @zhyncs in https://github.com/sgl-project/sglang/pull/7366
[Docker] optimize dockerfile remove deepep and blackwell merge it to… by @whybeyoung in https://github.com/sgl-project/sglang/pull/7343
Clean unused import for mimo mtp model by @lambert0312 in https://github.com/sgl-project/sglang/pull/7370
[Bugfix]Fix hang bug using dp attention with HiRadixCache by @LLLL114 in https://github.com/sgl-project/sglang/pull/7159
[Doc] add embedding rerank doc by @woodx9 in https://github.com/sgl-project/sglang/pull/7364
Fix judgment condition for enabling Deepseek V3/R1 shared expert fusion optimization by @lambert0312 in https://github.com/sgl-project/sglang/pull/7371
Feat/refactor embedding server by @woodx9 in https://github.com/sgl-project/sglang/pull/7322
Purge VerlEngine by @MrAta in https://github.com/sgl-project/sglang/pull/7326
support return logprobs for pipeline by @strgrb in https://github.com/sgl-project/sglang/pull/7356
[PD] Optimize custom mem pool usage and bump mooncake version by @ShangmingCai in https://github.com/sgl-project/sglang/pull/7393
Support THUDM/GLM-4-0414 (GLM-Z1) Glm4ForCausalLM architecture. by @solrex in https://github.com/sgl-project/sglang/pull/5485
Refine OpenAI serving entrypoint to remove batch requests by @JustinTong0323 in https://github.com/sgl-project/sglang/pull/7372
[Feature] Comprehensive Hybrid Parallelism Support by @ch-wan in https://github.com/sgl-project/sglang/pull/6389
[DeepSeekNextN] fix: residual of head norm can be None by @ch-wan in https://github.com/sgl-project/sglang/pull/7398
[OAI refactor] Add rerank and score serving by @woodx9 in https://github.com/sgl-project/sglang/pull/7399
[OAI Server Refactor] [ChatCompletions & Completions] Implement UsageInfo Processor by @yhyang201 in https://github.com/sgl-project/sglang/pull/7360
Fix All-Gather under world size one by @ch-wan in https://github.com/sgl-project/sglang/pull/7219
Optimize DP attn scheduling for speculative decoding by @ch-wan in https://github.com/sgl-project/sglang/pull/7285
Update usage_processor.py by @ch-wan in https://github.com/sgl-project/sglang/pull/7402
Fix 7285 Merge Conflicts by @ch-wan in https://github.com/sgl-project/sglang/pull/7403
chore: upgrade mooncake-transfer-engine 0.3.4 by @zhyncs in https://github.com/sgl-project/sglang/pull/7401
[OAI Server Refactor] [ChatCompletions & Completions] Support Return Hidden State by @key4ng in https://github.com/sgl-project/sglang/pull/7329
Remove batches api in docs & example by @jhinpan in https://github.com/sgl-project/sglang/pull/7400
[BugFix]: fix EmbeddingReqInput single input error by @woodx9 in https://github.com/sgl-project/sglang/pull/7396
[BugFix]fix qwen25 invoke function call streaming responses with curly braces as the starting indicator by @ehuaa in https://github.com/sgl-project/sglang/pull/7394
fix overlap pagecount by @pansicheng in https://github.com/sgl-project/sglang/pull/6984
fix: Fix CI test_function_call_parser.py by @CatherineSue in https://github.com/sgl-project/sglang/pull/7425
Fix CPU offloading for MLA memory pool by @hnyls2002 in https://github.com/sgl-project/sglang/pull/7409
[fix] PD disaggregation when enable mtp and tp!=dp by @Atream in https://github.com/sgl-project/sglang/pull/7420
feat(oai refactor): Replace openai_api with entrypoints/openai by @CatherineSue in https://github.com/sgl-project/sglang/pull/7351
Refactor LoRAManager and LoRAMemoryPool state management logic for dynamic LoRA loading support by @lifuhuang in https://github.com/sgl-project/sglang/pull/7412
refactor(test): reorganize OpenAI test file structure by @CatherineSue in https://github.com/sgl-project/sglang/pull/7408
[minor] simplify the TokenToKVPoolAllocator by @hnyls2002 in https://github.com/sgl-project/sglang/pull/7414
Tiny add logging for GC by @fzyzcjy in https://github.com/sgl-project/sglang/pull/7406
FlashInfer NVFP4 MoE with EP & 2-stream shared expert by @trevor-m in https://github.com/sgl-project/sglang/pull/7327
Remove copy after bmm by @ispobock in https://github.com/sgl-project/sglang/pull/7441
Fix torch compile run by @kkHuang-amd in https://github.com/sgl-project/sglang/pull/7391
[misc] Add PD service discovery support in router by @slin1237 in https://github.com/sgl-project/sglang/pull/7361
add fused moe config for qwen3 in triton3.3.1 by @yizhang2077 in https://github.com/sgl-project/sglang/pull/7445
Fix CUDA Graph Check under Deepep with DP FFN by @ch-wan in https://github.com/sgl-project/sglang/pull/7451
Update hyperparameter_tuning.md by @merrymercy in https://github.com/sgl-project/sglang/pull/7454
feat: integrate deepgemm into EPMoE by @xutizhou in https://github.com/sgl-project/sglang/pull/6821
Solve docker build failed in the virtual machine by @kkHuang-amd in https://github.com/sgl-project/sglang/pull/7290
Fix a bug in BatchTokenIDOut & Misc style and dependency updates by @merrymercy in https://github.com/sgl-project/sglang/pull/7457
[CI] Upgrade mooncake to 0.3.4.post1 to fix 8 gpu tests by @ShangmingCai in https://github.com/sgl-project/sglang/pull/7472
Fix prefill OOM due to wrong token calculation when page > 1 by @hnyls2002 in https://github.com/sgl-project/sglang/pull/7397
feat(func_call): Add more check in BaseFormatDetector.parse_streaming_increment by @CatherineSue in https://github.com/sgl-project/sglang/pull/7479
Fix dtype for idle input in spec decoding by @ch-wan in https://github.com/sgl-project/sglang/pull/7456
update mooncake in dockerfile by @hnyls2002 in https://github.com/sgl-project/sglang/pull/7480
kvcache io kernels and test case by @xiezhq-hermann in https://github.com/sgl-project/sglang/pull/7382
[perf] slightly imporve DeepSeek-R1-FP4 TP8 by @Alcanderian in https://github.com/sgl-project/sglang/pull/7481
Quick fix for DeepGemm requant to also cover MTP. by @pyc96 in https://github.com/sgl-project/sglang/pull/7378
Support weight loading without mmap by @guoyuhong in https://github.com/sgl-project/sglang/pull/7469
ci: Revert openai_server related tests in AMD suites by @CatherineSue in https://github.com/sgl-project/sglang/pull/7449
Perormance: Enable cuda graph for dp idle batch by @u4lr451 in https://github.com/sgl-project/sglang/pull/7269
bugfix: Prevent global mutation of conv.stop_str across requests by @huangtingwei9988 in https://github.com/sgl-project/sglang/pull/7347
Fix RequestValidationError response format by @CatherineSue in https://github.com/sgl-project/sglang/pull/7487
Fix MTP with Deepseek R1 Fp4 by @pyc96 in https://github.com/sgl-project/sglang/pull/7376
chore: bump sgl-kernel v0.2.0 by @zhyncs in https://github.com/sgl-project/sglang/pull/7490
chore: bump v0.4.8 by @zhyncs in https://github.com/sgl-project/sglang/pull/7493

New Contributors

@futrime made their first contribution in https://github.com/sgl-project/sglang/pull/6106
@faradawn made their first contribution in https://github.com/sgl-project/sglang/pull/6824
@liquanfeng made their first contribution in https://github.com/sgl-project/sglang/pull/7093
@p12tic made their first contribution in https://github.com/sgl-project/sglang/pull/6026
@byjiang1996 made their first contribution in https://github.com/sgl-project/sglang/pull/7140
@zhijian-liu made their first contribution in https://github.com/sgl-project/sglang/pull/7191
@DiweiSun made their first contribution in https://github.com/sgl-project/sglang/pull/7265
@zhuyijie88 made their first contribution in https://github.com/sgl-project/sglang/pull/7022
@pyc96 made their first contribution in https://github.com/sgl-project/sglang/pull/7164
@ch-tiger1 made their first contribution in https://github.com/sgl-project/sglang/pull/7301
@Atream made their first contribution in https://github.com/sgl-project/sglang/pull/7242
@LLLL114 made their first contribution in https://github.com/sgl-project/sglang/pull/7159
@key4ng made their first contribution in https://github.com/sgl-project/sglang/pull/7329
@ehuaa made their first contribution in https://github.com/sgl-project/sglang/pull/7394

Full Changelog: https://github.com/sgl-project/sglang/compare/v0.4.7...v0.4.8

Source: README.md, updated 2025-06-24

SGLang Files

SGLang is a fast serving framework for large language models

Highlights

OpenAI-Compatible Server Refactor

DeepSeek R1 FP4 on Blackwell GPU

What's Changed

New Contributors

SGLang Files

SGLang is a fast serving framework for large language models

Get an email when there's a new version of SGLang

Highlights

OpenAI-Compatible Server Refactor

DeepSeek R1 FP4 on Blackwell GPU

What's Changed

New Contributors