Name Modified Size
onnxruntime-win-x64-gpu-1.26.0.zip 2026-05-08 300.2 MB
onnxruntime-win-x64-gpu_cuda13-1.26.0.zip 2026-05-08 308.6 MB
onnxruntime-win-x64-1.26.0.zip 2026-05-08 75.7 MB
onnxruntime-win-arm64x-1.26.0.zip 2026-05-08 120.2 MB
onnxruntime-win-arm64-1.26.0.zip 2026-05-08 77.5 MB
onnxruntime-osx-arm64-1.26.0.tgz 2026-05-08 31.7 MB
onnxruntime-linux-x64-gpu-1.26.0.tgz 2026-05-08 225.1 MB
onnxruntime-linux-x64-gpu_cuda13-1.26.0.tgz 2026-05-08 196.4 MB
onnxruntime-linux-x64-1.26.0.tgz 2026-05-08 8.6 MB
onnxruntime-linux-aarch64-1.26.0.tgz 2026-05-08 7.6 MB
1.26.0 source code.tar.gz 2026-05-04 285.1 MB
1.26.0 source code.zip 2026-05-04 291.5 MB
README.md 2026-05-04 10.4 kB
Totals: 13 Items, 1.9 GB

n.b. The following was generated via LLM from Git history. Only the contributor list has been verified.

ONNX Runtime Release 1.26.0

Announcement - Breaking Changes

  • Support for CUDA 12 will be removed in 1.27.0.
  • CUDA 13 packages will continue to be published as onnxruntime-<os>-<arch>-gpu_cuda13-<version>.<ext>.
  • The CUDA runtime will soon move to a dedicated Execution Provider (EP) instead of being published as a package from ORT core.
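
For scripts that pin a download, the naming pattern above can be assembled programmatically. The following is a minimal sketch, not official tooling: the helper name `artifact_name` and its parameters are hypothetical, the archive extension is inferred from this release's file list (.zip for Windows, .tgz otherwise), and it assumes the plain `gpu` suffix denotes the CUDA 12 build.

```python
def artifact_name(os_name: str, arch: str, version: str, cuda13: bool = True) -> str:
    """Build a GPU package file name following the pattern in the announcement.

    Hypothetical helper: names are assembled from the pattern
    onnxruntime-<os>-<arch>-gpu_cuda13-<version>.<ext> and from the file
    list of this release, not from any official tooling.
    """
    # Windows packages in this release are .zip archives; the others are .tgz.
    ext = "zip" if os_name == "win" else "tgz"
    # Assumption: the plain "gpu" suffix denotes the CUDA 12 build.
    variant = "gpu_cuda13" if cuda13 else "gpu"
    return f"onnxruntime-{os_name}-{arch}-{variant}-{version}.{ext}"


print(artifact_name("linux", "x64", "1.26.0"))
print(artifact_name("win", "x64", "1.26.0", cuda13=False))
```

Defaulting `cuda13` to True reflects the announcement that CUDA 12 support is scheduled for removal in 1.27.0.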

Highlights

  • Added optional memory mapping for .ort model loads (#28164).
  • Added RISC-V Vector (RVV) support for CPU EP (#28261).
  • OpenVINO EP upgraded for 1.26.0 development release (#28297).
  • WebGPU gained GridSample support (#28264) and Split-K improvements (#28151).
  • CUDA plugin EP gained graph support (#28002) and a profiling API (#28216).
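
To illustrate what memory mapping a model file means at the OS level, here is a minimal sketch using Python's standard `mmap` module. It shows the general technique only: these notes do not name the ONNX Runtime option that enables mapping for .ort loads, so none is used here, and the temporary file holds placeholder bytes rather than a real model.

```python
import mmap
import os
import tempfile


def read_header_mapped(path: str, n: int = 8) -> bytes:
    """Return the first n bytes of a file through a read-only memory map."""
    with open(path, "rb") as f:
        # Length 0 maps the whole file; the OS loads pages lazily on access.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as view:
            return view[:n]


# Stand-in for a model file: placeholder bytes, not a real .ort model.
with tempfile.NamedTemporaryFile(suffix=".ort", delete=False) as tmp:
    tmp.write(b"PLACEHOLDER-MODEL-BYTES" * 1000)
    model_path = tmp.name

try:
    header = read_header_mapped(model_path)
    print(header)
finally:
    os.remove(model_path)
```

Only the pages that are actually touched get read from disk, which is the usual motivation for mapping large files instead of copying them into a buffer.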

Security and Reliability Hardening

  • Replaced unrestricted Python setattr configuration with an allowlist (#28083).
  • Hardened multiple OOB and overflow scenarios across ML and core ops:
      • Attention mask index OOB write (#27789).
      • MaxPoolGrad indices bounds validation (#27903).
      • SVM and TreeEnsemble bounds/security fixes (#27950, #27951, #27952, #27989).
      • RNN sequence_lens OOB read and integer overflow handling (#28052, #28003).
      • GroupQueryAttention seqlens_k bounds validation and compatibility follow-up (#28031, #28259).
      • MatMulBnb4 and ML coefficient SafeInt checks (#27995, #28001).
      • CUDA Gather int32 overflow fix (#28108).
      • GridSample float->int64 cast hardening for NaN/Inf/out-of-range coords (#28302).
  • Fixed session logger use-after-free during EP teardown under verbose logging (#28274).
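
The allowlist change (#28083) follows a common hardening pattern: configuration keys are checked against a fixed set before `setattr` is called. The sketch below illustrates that pattern in general terms. It is not the ONNX Runtime implementation; the `Options` class, `ALLOWED_OPTIONS`, and `apply_config` are hypothetical names introduced for illustration.

```python
class Options:
    """Hypothetical options object; not an ONNX Runtime class."""

    def __init__(self) -> None:
        self.intra_op_num_threads = 0
        self.enable_profiling = False


# Only these attribute names may be set from external configuration.
ALLOWED_OPTIONS = frozenset({"intra_op_num_threads", "enable_profiling"})


def apply_config(options: Options, config: dict) -> None:
    """Copy allowlisted keys from config onto options and reject all others."""
    for key, value in config.items():
        if key not in ALLOWED_OPTIONS:
            raise ValueError(f"unsupported option: {key!r}")
        setattr(options, key, value)


opts = Options()
apply_config(opts, {"intra_op_num_threads": 4})

# An unrestricted setattr would accept this key; the allowlist rejects it.
try:
    apply_config(opts, {"__class__": object})
    rejected = False
except ValueError:
    rejected = True

print(opts.intra_op_num_threads, rejected)
```

Rejecting unknown keys outright, rather than silently ignoring them, makes misconfiguration visible to the caller.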

CUDA, Attention, and MLAS

  • Filled CUDA opset/operator gaps and extended support:
      • Transpose opset 23 -> 25 (#27740).
      • QuantizeLinear/DequantizeLinear opset 25 (#28046).
      • CUDA TopK INT8/INT16/UINT8 support (#27862).
      • LabelEncoder CUDA support for numeric types (#28045).
  • Attention/GQA improvements:
      • Fixed ONNX Attention min-bias alignment crash on SM<80 and masked-batch NaN behavior (#27831).
      • Added FP32 QK accumulation path for unfused GQA attention (#28198).
      • Added CUDART_VERSION reduction compatibility in GQA attention (#28296).
      • Fixed CUDA 13 build error in GQA unfused attention (#28309).
      • PagedAttention fallback for SM<80 fp16 (#28200).
  • MLAS updates:
      • FP16 Gelu enablement (#26815).
      • Arm64 BF16 fast-math conv kernels for NCHW/NCHWc paths (#27878).
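
As background for the FP32 QK accumulation item (#28198), the sketch below shows why the accumulator precision of a dot product matters when the inputs are FP16. It emulates rounding with Python's `struct` module and is an illustration of the numerical effect only, not the CUDA kernel; `round_to` and `dot` are hypothetical helper names.

```python
import struct


def round_to(fmt: str, x: float) -> float:
    """Round x to the precision of a struct format ('e' = FP16, 'f' = FP32)."""
    return struct.unpack(fmt, struct.pack(fmt, x))[0]


def dot(q, k, acc_fmt: str) -> float:
    """Dot product of FP16 inputs, rounding the accumulator to acc_fmt each step."""
    acc = 0.0
    for a, b in zip(q, k):
        product = round_to("e", a) * round_to("e", b)
        acc = round_to(acc_fmt, acc + product)
    return acc


q = [1.0] * 4096
k = [1.0] * 4096

acc_fp16 = dot(q, k, "e")  # accumulator rounded to FP16 after every add
acc_fp32 = dot(q, k, "f")  # accumulator rounded to FP32 after every add
print(acc_fp16, acc_fp32)
```

With an FP16 accumulator the running sum stops growing once the spacing between representable values exceeds the size of each added product, while the FP32 accumulator reaches the exact total.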

WebGPU, WebNN, and JavaScript

  • WebGPU feature and correctness updates:
      • Added GridSample (#28264).
      • Split-K support for batch size > 1 (#28151).
      • MatMulNBits refactor and batching improvements (#28109, #28197).
      • MHA correctness fix when present outputs are not requested (#28027).
      • Buffer upload overflow fix (#27948).
      • Position ID bounds validation in WebGPU/JS RotaryEmbedding (#28214).
  • WebNN change:
      • Renamed pool2d property roundingType -> outputShapeRounding (#28172).
  • JavaScript ecosystem maintenance:
      • Multiple dependency bumps.
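
For readers unfamiliar with Split-K, the sketch below shows the general idea in plain Python: the shared K dimension of a matrix multiply is divided into chunks, each chunk yields a partial result, and the partials are summed. This is a conceptual illustration, not the WebGPU shader; `matmul_split_k` is a hypothetical name, and a batched version would repeat the same steps for each batch item.

```python
def matmul_split_k(a, b, splits: int):
    """Multiply a (M x K) by b (K x N), splitting the K dimension into chunks.

    Each chunk contributes a partial M x N result that is added to the output.
    On a GPU the chunks would run in parallel; here they run in a plain loop.
    """
    m, k_dim, n = len(a), len(b), len(b[0])
    chunk = (k_dim + splits - 1) // splits  # ceil(K / splits)
    out = [[0] * n for _ in range(m)]
    for start in range(0, k_dim, chunk):
        stop = min(start + chunk, k_dim)
        # Partial product over this slice of K only.
        for i in range(m):
            for j in range(n):
                out[i][j] += sum(a[i][p] * b[p][j] for p in range(start, stop))
    return out


a = [[1, 2, 3, 4], [5, 6, 7, 8]]
b = [[1, 0], [0, 1], [1, 0], [0, 1]]
result = matmul_split_k(a, b, splits=2)
print(result)
```

Splitting along K is typically useful when M and N are small relative to K, since it exposes more independent work than tiling the output alone.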

Plugin EP and EP Ecosystem

  • CUDA plugin EP:
      • Graph capture/replay support ported and expanded (#27958, #28002).
      • Sync support for IOBinding (#27919).
      • Profiling API implementation (#28216).
      • Resource accounting integration (#28028).
  • WebGPU plugin EP:
      • Pipeline updates and API init error handling fixes (#28121, #28211).
  • Other EP updates:
      • CoreML: HardSigmoid and QuickGelu support; Pad reflect support/fixes (#28182, #28184, #28073, #28062).
      • NvTensorRTRTX compatibility and diagnostics updates (#28263, #27577).
      • QNN file-mapping guard improvements (#27871).
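
As a reference for the two activations newly supported on CoreML, the sketch below gives scalar Python versions. It assumes the commonly documented defaults (alpha 0.2 and beta 0.5 for HardSigmoid, alpha 1.702 for QuickGelu) and is not taken from the CoreML EP code; the function names are hypothetical.

```python
import math


def hard_sigmoid(x: float, alpha: float = 0.2, beta: float = 0.5) -> float:
    """HardSigmoid: clamp(alpha * x + beta) to the range [0, 1]."""
    return max(0.0, min(1.0, alpha * x + beta))


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def quick_gelu(x: float, alpha: float = 1.702) -> float:
    """QuickGelu: x * sigmoid(alpha * x), a cheap approximation of Gelu."""
    return x * sigmoid(alpha * x)


for value in (-10.0, 0.0, 10.0):
    print(value, hard_sigmoid(value), quick_gelu(value))
```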

Contributors

@tianleiwu, @yuslepukhin, @edgchen1, @vraspar, @hariharans29, @skottmckay, @eserscor, @xadupre, @sanaa-hamel-microsoft, @claude, @elwhyjay, @Rishi-Dave, @titaiwangms, @adrianlizarraga, @jatinwadhwa921, @jchen10, @Jiawei-Shao, @maxwbuckley, @preetha-intel, @qjia7, @qti-hungjuiw, @RajeevSekar, @umangb-09, @adrastogi, @akote123, @amd-genmingz, @ankitm3k, @apsonawane, @bachelor-dou, @baijumeswani, @bopeng1234, @chilo-ms, @chwarr, @Craigacp, @dccarmo, @derdeljan-msft, @ericcraw, @fdwr, @fs-eire, @gaugarg-nv, @gblong1, @GopalakrishnanN, @Honry, @intbf, @ishwar-raut1, @Jaswanth51, @javier-intel, @JonathanC-ARM, @julia-thorn, @justinchuby, @jwludzik, @Kevin-Taha, @Kotomi-Du, @MayureshV1, @mdvoretc-intel, @miaobin, @milpuz01, @mingyueliuh, @mklimenk, @n1harika, @prathikr, @psakhamoori, @qti-yuduo, @quic-calvnguy, @RyanMetcalfeInt8, @sfatimar, @sgbihu, @ShirasawaSama, @ssam18, @susbhere, @sushraja-msft, @TejalKhade28, @theHamsta, @TomCrypto, @TsofnatMaman, @velonica0, @vthaniel, @wenqinI, @xhan65, @xhcao

Source: README.md, updated 2026-05-04