TensorRT-LLM v1.1.0

| Name | Modified | Size |
|------|----------|------|
| README.md | 2025-12-11 | 125.0 kB |
| v1.1.0 source code.tar.gz | 2025-12-11 | 338.9 MB |
| v1.1.0 source code.zip | 2025-12-11 | 343.1 MB |
| Totals: 3 items | | 682.2 MB |

Known Issue

  • If a project declares `tensorrt-llm==1.1.0` as a dependency in its `pyproject.toml`:

```toml
dependencies = [
    "tensorrt-llm==1.1.0",
]
```

then installing the project's dependencies with `uv sync` fails with the following message:

```
No solution found when resolving dependencies for split (markers: python_full_version >= '3.13' and sys_platform == 'darwin'):
╰─▶ Because patchelf==0.18.0.0 was yanked (reason: https://github.com/mayeut/patchelf-pypi/issues/87)
    and tensorrt-llm==1.1.0 depends on patchelf==0.18.0, we can conclude that tensorrt-llm==1.1.0
    cannot be used. And because your project depends on tensorrt-llm==1.1.0, we can conclude that
    your project's requirements are unsatisfiable.
```

This happens because `patchelf` 0.18.0 was yanked by its author.

A valid workaround is to add the following block to `pyproject.toml`:

```toml
[tool.uv]
override-dependencies = [
    "patchelf==0.17.2.4",
]
```

Key Features and Enhancements

  • Model Support
      • Added support for GPT-OSS, Hunyuan-Dense (contributed by @sorenwu), Hunyuan-MoE (contributed by @qianbiaoxiang), and Seed-OSS (contributed by @Nekofish-Li).
  • Features
      • Connector API: Introduced a new KV Cache Connector API for state transfer in disaggregated serving.
      • Reuse & Offloading: Enabled KV cache reuse for MLA (Multi-Head Latent Attention) and added examples for host offloading.
      • Salting: Implemented KV cache salting for secure cache reuse.
      • Guided Decoding Integration: Enabled guided decoding to work in conjunction with speculative decoding (including 2-model and draft-model chunked prefill).
      • Eagle: Added multi-layer Eagle support and optimizations.
      • CuteDSL: Integrated CuteDSL NVFP4 grouped GEMM for Blackwell.
      • B300/GB300: Added support for B300/GB300.
  • Documentation
      • Deployment Guides: Added comprehensive deployment guides for GPT-OSS, DeepSeek-R1, and VDR 1.0.
      • Feature Documentation: Created new documentation for KV Cache Connector, LoRA feature usage, and AutoDeploy.
      • Tech Blogs: Published blogs on “Combining Guided Decoding and Speculative Decoding” and “ADP Balance Strategy”.
      • Quick Start: Refined Quick Start guides with new links to ModelOpt checkpoints and updated installation steps (Linux/Windows).
      • API Reference: Enhanced LLM API documentation by explicitly labeling stable vs. unstable APIs.
      • Performance: Updated online benchmarking documentation and performance overview pages.
      • Examples: Refined Slurm examples and added K2 tool calling examples.
  • Infrastructure Changes
      • The base Docker image for TensorRT-LLM: nvcr.io/nvidia/pytorch:25.10-py3.
      • The base Docker image for TensorRT-LLM Backend: nvcr.io/nvidia/tritonserver:25.10-py3.
      • The dependent public PyTorch version: 2.9.0.
      • The dependent NVIDIA ModelOpt version: 0.37.
      • The dependent xgrammar version: 0.1.25.
      • The dependent transformers version: 4.56.0.
      • The dependent NIXL version: 0.5.0.
  • API Changes
      • Breaking Change: The C++ TRTLLM sampler is now enabled by default, replacing the legacy implementation.
      • KV Cache Connector API: Introduced a new KV Cache Connector API.
      • Standardized topk logprob returns across TRT and PyTorch backends.
      • Added stable labels to arguments in the LLM class to better indicate API stability.
      • Wait and Cancel API: Added tests and support for handling non-existent and completed request cancellations in the executor.
  • Fixed Issues
      • Fixed an illegal memory access, weight-loading issues for DeepSeek-R1 W4A8, CUDA graph warmup issues with speculative decoding, and memory leaks.
  • Known Issues
      • GB300 Multi-Node: Support for GB300 in multi-node configurations is in beta and not fully validated in this release; it has been validated in 1.2.0rc4 and later.
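The KV cache salting feature listed above mixes a per-request or per-tenant salt into the hash that identifies reusable cache blocks, so blocks are only shared between requests that present the same salt. The following is a minimal sketch of the idea; the function name and hashing scheme are illustrative assumptions, not TensorRT-LLM's implementation:

```python
import hashlib

def cache_block_key(token_ids, salt):
    """Hash a block of token ids together with a salt.

    Two requests map to the same cache key (and can therefore reuse each
    other's KV cache blocks) only if they share both the tokens and the salt.
    """
    h = hashlib.sha256()
    h.update(salt.encode())
    for t in token_ids:
        h.update(t.to_bytes(4, "little"))
    return h.hexdigest()

# Same tokens, same salt -> same key (reuse allowed).
k1 = cache_block_key([101, 7592, 2088], salt="tenant-a")
k2 = cache_block_key([101, 7592, 2088], salt="tenant-a")
# Same tokens, different salt -> different key (no cross-tenant reuse).
k3 = cache_block_key([101, 7592, 2088], salt="tenant-b")
```

Without the salt, any request with a matching token prefix could hit another tenant's cached blocks; salting restricts reuse to a trust domain.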
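To illustrate the kind of output standardized by the "topk logprob returns" change above, here is a minimal pure-Python sketch of computing top-k logprobs for one decoding step. The function name and return shape are illustrative assumptions, not the library's API:

```python
import math

def topk_logprobs(logits, k):
    """Return the top-k (token_id, logprob) pairs for one decoding step.

    Logprobs come from a numerically stable log-softmax over the full
    vocabulary; the k highest entries are then selected.
    """
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    logprobs = [x - log_z for x in logits]
    ranked = sorted(range(len(logits)), key=lambda i: logprobs[i], reverse=True)
    return [(i, logprobs[i]) for i in ranked[:k]]

# Example: a tiny 4-token vocabulary.
pairs = topk_logprobs([2.0, 0.5, 1.0, -1.0], k=2)
```

Standardizing on a shape like this means client code can consume logprobs identically whether the TRT or PyTorch backend produced them.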

What's Changed

New Contributors

Full Changelog: https://github.com/NVIDIA/TensorRT-LLM/compare/v1.0.0...v1.1.0

Source: README.md, updated 2025-12-11