
Name	Modified	Size	Downloads / Week
onnxruntime-win-x64-gpu-1.26.0.zip	2026-05-08	300.2 MB	2
onnxruntime-win-x64-gpu_cuda13-1.26.0.zip	2026-05-08	308.6 MB	8
onnxruntime-win-x64-1.26.0.zip	2026-05-08	75.7 MB	4
onnxruntime-win-arm64x-1.26.0.zip	2026-05-08	120.2 MB	0
onnxruntime-win-arm64-1.26.0.zip	2026-05-08	77.5 MB	1
onnxruntime-osx-arm64-1.26.0.tgz	2026-05-08	31.7 MB	0
onnxruntime-linux-x64-gpu-1.26.0.tgz	2026-05-08	225.1 MB	0
onnxruntime-linux-x64-gpu_cuda13-1.26.0.tgz	2026-05-08	196.4 MB	0
onnxruntime-linux-x64-1.26.0.tgz	2026-05-08	8.6 MB	1
onnxruntime-linux-aarch64-1.26.0.tgz	2026-05-08	7.6 MB	0
1.26.0 source code.tar.gz	2026-05-04	285.1 MB	0
1.26.0 source code.zip	2026-05-04	291.5 MB	0
README.md	2026-05-04	10.4 kB	0
Totals: 13 Items		1.9 GB	16

n.b. The following was generated via LLM from Git history. Only the contributor list has been verified.

ONNX Runtime Release 1.26.0

Announcement - Breaking Changes

Support for CUDA 12 will be removed in 1.27.0.
CUDA 13 packages will continue to be published as onnxruntime-<os>-<arch>-gpu_cuda13-<version>.<ext>.
The CUDA runtime will soon move to a dedicated Execution Provider (EP) rather than being published as a package from ORT core.
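
As an illustration of the naming pattern above, the following minimal sketch assembles a CUDA 13 archive name from its parts. The helper name `cuda13_archive_name` is hypothetical and not part of ONNX Runtime; the sample values are taken from the file listing for this release.

```python
def cuda13_archive_name(os_name: str, arch: str, version: str, ext: str) -> str:
    """Build an archive name following the pattern
    onnxruntime-<os>-<arch>-gpu_cuda13-<version>.<ext>.

    Hypothetical helper for illustration only.
    """
    return f"onnxruntime-{os_name}-{arch}-gpu_cuda13-{version}.{ext}"


# Values below mirror two archives listed for this release.
print(cuda13_archive_name("win", "x64", "1.26.0", "zip"))
print(cuda13_archive_name("linux", "x64", "1.26.0", "tgz"))
```

In the listing for this release, the Windows archives use the .zip extension and the Linux and macOS archives use .tgz.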

Replaced unrestricted Python setattr configuration with an allowlist (#28083).
Hardened multiple OOB and overflow scenarios across ML and core ops:
  Attention mask index OOB write (#27789).
  MaxPoolGrad indices bounds validation (#27903).
  SVM and TreeEnsemble bounds/security fixes (#27950, #27951, #27952, #27989).
  RNN sequence_lens OOB read and integer overflow handling (#28052, #28003).
  GroupQueryAttention seqlens_k bounds validation and compatibility follow-up (#28031, #28259).
  MatMulBnb4 and ML coefficient SafeInt checks (#27995, #28001).
  CUDA Gather int32 overflow fix (#28108).
  GridSample float->int64 cast hardening for NaN/Inf/out-of-range coords (#28302).
Fixed session logger use-after-free during EP teardown under verbose logging (#28274).
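
The setattr allowlist change listed above (#28083) can be pictured with a generic sketch. This is not ONNX Runtime's implementation: the class `RestrictedOptions`, the function `apply_config`, and the option names are all hypothetical, chosen only to show how an allowlist rejects attribute names that a caller should not be able to set.

```python
class RestrictedOptions:
    """Hypothetical options holder; not an ONNX Runtime class."""

    # Hypothetical allowlist: only these attribute names may be assigned.
    _ALLOWED = frozenset({"num_threads", "log_level"})

    def __setattr__(self, name, value):
        # Reject anything outside the allowlist, including dunder names
        # such as "__class__" that unrestricted setattr would accept.
        if name not in self._ALLOWED:
            raise AttributeError(f"unsupported option: {name!r}")
        super().__setattr__(name, value)


def apply_config(target, config):
    """Apply a name -> value mapping via setattr; the target enforces the allowlist."""
    for name, value in config.items():
        setattr(target, name, value)
    return target


opts = apply_config(RestrictedOptions(), {"num_threads": 2})
print(opts.num_threads)

try:
    apply_config(RestrictedOptions(), {"__class__": object})
except AttributeError as exc:
    print("rejected:", exc)
```

Enforcing the check inside `__setattr__` means every assignment path goes through the same gate, rather than relying on each caller to validate names first.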

Filled CUDA opset/operator gaps and extended support:
  Transpose opset 23 -> 25 (#27740).
  QuantizeLinear/DequantizeLinear opset 25 (#28046).
  CUDA TopK INT8/INT16/UINT8 support (#27862).
  LabelEncoder CUDA support for numeric types (#28045).
Attention/GQA improvements:
  Fixed ONNX Attention min-bias alignment crash on SM<80 and masked-batch NaN behavior (#27831).
  Added FP32 QK accumulation path for unfused GQA attention (#28198).
  Added CUDART_VERSION reduction compatibility in GQA attention (#28296).
  Fixed CUDA 13 build error in GQA unfused attention (#28309).
  PagedAttention fallback for SM<80 fp16 (#28200).
MLAS updates:
  FP16 Gelu enablement (#26815).
  Arm64 BF16 fast-math conv kernels for NCHW/NCHWc paths (#27878).

CUDA plugin EP:
  Graph capture/replay support ported and expanded (#27958, #28002).
  Sync support for IOBinding (#27919).
  Profiling API implementation (#28216).
  Resource accounting integration (#28028).
WebGPU plugin EP:
  Pipeline updates and API init error handling fixes (#28121, #28211).
Other EP updates:
  CoreML: HardSigmoid and QuickGelu support; Pad reflect support/fixes (#28182, #28184, #28073, #28062).
  NvTensorRTRTX compatibility and diagnostics updates (#28263, #27577).
  QNN file-mapping guard improvements (#27871).

Source: README.md, updated 2026-05-04