Name Modified Size Info Downloads / Week
iree_base_runtime-3.9.0-cp313-cp313t-manylinux_2_28_x86_64.whl 2025-11-25 8.1 MB
iree_base_runtime-3.9.0-cp313-cp313t-manylinux_2_27_aarch64.manylinux_2_28_aarch64.whl 2025-11-25 8.2 MB
iree_base_runtime-3.9.0-cp313-cp313-win_amd64.whl 2025-11-25 5.7 MB
iree_base_runtime-3.9.0-cp313-cp313-manylinux_2_28_x86_64.whl 2025-11-25 8.1 MB
iree_base_runtime-3.9.0-cp313-cp313-manylinux_2_27_aarch64.manylinux_2_28_aarch64.whl 2025-11-25 8.2 MB
iree_base_runtime-3.9.0-cp313-cp313-macosx_13_0_universal2.whl 2025-11-25 3.9 MB
iree_base_runtime-3.9.0-cp312-cp312-win_amd64.whl 2025-11-25 5.7 MB
iree_base_runtime-3.9.0-cp312-cp312-manylinux_2_28_x86_64.whl 2025-11-25 8.1 MB
iree_base_runtime-3.9.0-cp312-cp312-macosx_13_0_universal2.whl 2025-11-25 3.9 MB
iree_base_runtime-3.9.0-cp312-cp312-manylinux_2_27_aarch64.manylinux_2_28_aarch64.whl 2025-11-25 8.2 MB
iree_base_runtime-3.9.0-cp311-cp311-win_amd64.whl 2025-11-25 5.7 MB
iree_base_runtime-3.9.0-cp311-cp311-manylinux_2_28_x86_64.whl 2025-11-25 8.1 MB
iree_base_runtime-3.9.0-cp311-cp311-manylinux_2_27_aarch64.manylinux_2_28_aarch64.whl 2025-11-25 8.2 MB
iree_base_runtime-3.9.0-cp311-cp311-macosx_13_0_universal2.whl 2025-11-25 3.9 MB
iree_base_runtime-3.9.0-cp310-cp310-manylinux_2_28_x86_64.whl 2025-11-25 8.1 MB
iree_base_runtime-3.9.0-cp310-cp310-manylinux_2_27_aarch64.manylinux_2_28_aarch64.whl 2025-11-25 8.2 MB
iree_base_runtime-3.9.0-cp39-cp39-manylinux_2_28_x86_64.whl 2025-11-25 8.1 MB
iree_base_runtime-3.9.0-cp39-cp39-manylinux_2_27_aarch64.manylinux_2_28_aarch64.whl 2025-11-25 8.2 MB
iree_base_compiler-3.9.0-cp313-cp313t-manylinux_2_27_aarch64.manylinux_2_28_aarch64.whl 2025-11-25 83.7 MB
iree_base_compiler-3.9.0-cp313-cp313-win_amd64.whl 2025-11-25 54.0 MB
iree_base_compiler-3.9.0-cp313-cp313-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl 2025-11-25 84.5 MB
iree_base_compiler-3.9.0-cp313-cp313-manylinux_2_27_aarch64.manylinux_2_28_aarch64.whl 2025-11-25 83.7 MB
iree_base_compiler-3.9.0-cp313-cp313-macosx_13_0_universal2.whl 2025-11-25 68.5 MB
iree_base_compiler-3.9.0-cp313-cp313t-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl 2025-11-25 84.5 MB
iree_base_compiler-3.9.0-cp312-cp312-win_amd64.whl 2025-11-25 54.0 MB
iree_base_compiler-3.9.0-cp312-cp312-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl 2025-11-25 84.5 MB
iree_base_compiler-3.9.0-cp312-cp312-manylinux_2_27_aarch64.manylinux_2_28_aarch64.whl 2025-11-25 83.7 MB
iree_base_compiler-3.9.0-cp312-cp312-macosx_13_0_universal2.whl 2025-11-25 68.5 MB
iree_base_compiler-3.9.0-cp311-cp311-win_amd64.whl 2025-11-25 54.0 MB
iree_base_compiler-3.9.0-cp311-cp311-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl 2025-11-25 84.5 MB
iree_base_compiler-3.9.0-cp311-cp311-manylinux_2_27_aarch64.manylinux_2_28_aarch64.whl 2025-11-25 83.7 MB
iree_base_compiler-3.9.0-cp311-cp311-macosx_13_0_universal2.whl 2025-11-25 68.5 MB
iree_base_compiler-3.9.0-cp310-cp310-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl 2025-11-25 84.5 MB
iree_base_compiler-3.9.0-cp310-cp310-manylinux_2_27_aarch64.manylinux_2_28_aarch64.whl 2025-11-25 83.7 MB
iree_base_compiler-3.9.0-cp39-cp39-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl 2025-11-25 84.5 MB
iree_base_compiler-3.9.0-cp39-cp39-manylinux_2_27_aarch64.manylinux_2_28_aarch64.whl 2025-11-25 83.7 MB
iree_tools_tflite-20251125.1456-py3-none-any.whl 2025-11-25 3.6 kB
iree_tools_tf-20251125.1456-py3-none-any.whl 2025-11-25 32.6 kB
README.md 2025-11-25 41.6 kB
Release v3.9.0 source code.tar.gz 2025-11-25 10.6 MB
Release v3.9.0 source code.zip 2025-11-25 14.2 MB
Totals: 41 Items   1.5 GB 0
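
The wheel filenames in the listing follow the standard `distribution-version-python-abi-platform.whl` naming convention, which is what an installer uses to choose a file. As a rough illustration only, the following standard-library sketch splits a filename into those parts. The helper names `parse_wheel_filename` and `is_free_threaded` are made up for this example and are not part of IREE or pip.

```python
from typing import NamedTuple, Tuple


class WheelTags(NamedTuple):
    distribution: str
    version: str
    python_tag: str
    abi_tag: str
    platform_tags: Tuple[str, ...]


def parse_wheel_filename(filename: str) -> WheelTags:
    """Split a wheel filename into its dash-separated components."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel: {filename}")
    parts = filename[: -len(".whl")].split("-")
    # An optional build tag would add a sixth part; none of the files
    # listed above carry one, so this sketch only accepts five parts.
    if len(parts) != 5:
        raise ValueError(f"unexpected wheel filename layout: {filename}")
    distribution, version, python_tag, abi_tag, platform = parts
    # Several platform tags may be joined with "." (a compressed tag set).
    return WheelTags(distribution, version, python_tag, abi_tag,
                     tuple(platform.split(".")))


def is_free_threaded(tags: WheelTags) -> bool:
    """CPython marks free-threaded builds with a trailing 't' in the ABI tag."""
    return tags.abi_tag.startswith("cp") and tags.abi_tag.endswith("t")


if __name__ == "__main__":
    name = "iree_base_runtime-3.9.0-cp313-cp313t-manylinux_2_28_x86_64.whl"
    tags = parse_wheel_filename(name)
    print(tags, is_free_threaded(tags))
```

Under this reading, the `cp313t` files are the free-threaded CPython 3.13 builds, and the `py3-none-any` tools wheels are pure-Python packages that do not depend on a specific interpreter ABI or platform.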

IREE Release v3.9.0

1. Compiler

1.1 Data Tiling & GEMM Improvements

  • iree-opt-data-tiling promoted to umbrella flag with suggested config. (#22295)
  • Default path switched to DispatchCreation phase; use --iree-global-opt-data-tiling for legacy behavior. See docs. (#21441)
  • Implemented subgroups_k in data-tiled MMA layouts. (#22519)
  • Added per-operand M/N/K interleaving control. (#22626)
  • Added layout transfer support in MaterializeEncoding. (#22582)
  • Strict inner_tiled verifier with distributed/opaque params. (#22369)
  • Unified encoding materialization passes. (#22472)
  • Encoding op fusion with multi-use producers at -O3. (#22444)
  • Intentional padding for non-K-major layouts (~2.7% GEMM improvement). (#22486)
  • Better heuristics for extremely large GEMMs. (#22636)
  • Refactored narrow matmul tile size selection. (#22177)
  • Split reduction for large-K GEMMs. (#22357)
  • Updated ukernel data layout. (#22350)
  • Fixed large f16 ukernel bounds. (#22481)
  • Added LLaMA 8B FP8 benchmark tests on gfx942. (#22387)
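
For readers new to the term, data tiling broadly means repacking matmul operands into small contiguous tiles, padding the edges where the shape does not divide evenly. The pure-Python sketch below shows only that general idea. The function name, the tile sizes, and the `[M/tile_m][K/tile_k][tile_m][tile_k]` ordering are assumptions chosen for illustration; the layouts IREE actually materializes are target-specific and are not reproduced here.

```python
def pack_into_tiles(matrix, tile_m, tile_k, pad_value=0):
    """Pack an M x K matrix (list of rows) into a 4-D tiled layout.

    Result shape: [ceil(M/tile_m)][ceil(K/tile_k)][tile_m][tile_k].
    Out-of-bounds positions are filled with pad_value.
    """
    m = len(matrix)
    k = len(matrix[0]) if m else 0
    tiles_m = -(-m // tile_m)  # ceiling division
    tiles_k = -(-k // tile_k)
    packed = []
    for tm in range(tiles_m):
        row_of_tiles = []
        for tk in range(tiles_k):
            tile = []
            for i in range(tile_m):
                row = []
                for j in range(tile_k):
                    r, c = tm * tile_m + i, tk * tile_k + j
                    in_bounds = r < m and c < k
                    row.append(matrix[r][c] if in_bounds else pad_value)
                tile.append(row)
            row_of_tiles.append(tile)
        packed.append(row_of_tiles)
    return packed


if __name__ == "__main__":
    a = [[1, 2, 3, 4, 5],
         [6, 7, 8, 9, 10],
         [11, 12, 13, 14, 15]]
    for tile_row in pack_into_tiles(a, tile_m=2, tile_k=4):
        print(tile_row)
```

Each inner tile is contiguous, which is the property a tiled matmul kernel can exploit; the padding is the price paid when the matrix shape is not a multiple of the tile shape.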

1.2 Dispatch Creation

  • Added split-reduction support for arg_compare, preventing shared-memory overflow and fixing LLaMA 8B FP16 compilation failures. (#22466)
  • Added aggressive multi-use fusion for encoding ops (enabled at -O3), significantly improving fusion patterns seen in SDXL. (#22444)
  • Enabled consumer fusion for GPUApplyTilingLevel on scf.forall loops, enhancing padding-level fusion. (#22522)

1.3 GPU Codegen

  • Added barrier insertion before first shared-memory write for AMD GPUs, fixing non-deterministic strided conv results (13% -> 0% failure rate). (#22669)
  • Rewrote loop prefetcher with a stage-based backward slicing model for better maintainability (no functional change). (#22605)
  • Implemented vector size inference for UKernelGenericOp, enabling downstream ops (e.g., unpack) to correctly vectorize instead of falling back to scalar code. (#22440)
  • Improved f16 medium ukernel bounds on ROCm for better matmul throughput. (#22393)
  • Added mmt4d ukernel support for RISC-V zvfh/zvfhmin, enabling f16xf16->f16/f32 kernels with runtime hardware probing. (#22231)
  • Generalized GPU lowering for linalg.reduce ops, converting illegal i1 reductions to generic form to unblock split-reduction pipelines. (#22490)

1.4 Others

  • Interfaces, Layouts & IR Improvements. (#22467, #22390, #22368)
  • Various correctness and quality improvements across codegen, layout propagation, and GPU lowering. (#22636, #22490, #22466, #22669, #22522, #22605, #22486, #22519, #22444, #22393, #22231, #22467, #22390, #22368, #22440, #22598)
  • Exposed C and Python bindings for IGEMM convolution details. (#22598)

2. Runtime

  • Implemented initial end-to-end support for external transients, enabling early but functional handling of control flow and cross-dispatch transient values.
    • Current limitations: no function calls and no data-dependent values; simple control flow is supported and aligns with future dispatch specialization work. (#22625)
  • Added timeline-aware async execution across module boundaries, introducing foundational interfaces for precise cross-module scheduling. (#22381)
  • Improved support for iree_codegen.extract_strided_metadata, ensuring information-preserving lowering:
    • Now normalizes into iree_codegen earlier, avoiding loss of stride/offset/alignment information that occurred when prematurely converting to memref. (#22606)
  • Added new Stream canonicalizations and improved RefineUsage to reduce unnecessary copies and fix correctness bugs. (#22610)
  • Added --gen-dialect-json to iree-tblgen, generating JSON databases of dialect definitions using tablegen metadata. (#22603)
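
To make the stride/offset information mentioned in the extract_strided_metadata item more concrete: a strided view of a buffer can be described by an offset, a size per dimension, and a stride per dimension, and taking a sub-region only changes those numbers. The sketch below models that in plain Python. `StridedLayout` and `subview` are hypothetical names for this illustration and do not correspond to IREE or MLIR APIs.

```python
from typing import NamedTuple, Tuple


class StridedLayout(NamedTuple):
    offset: int               # element offset of the view's first element
    sizes: Tuple[int, ...]    # extent of each dimension
    strides: Tuple[int, ...]  # element step between neighbors per dimension

    def linear_index(self, indices):
        """Map a multi-dimensional index to a position in the flat buffer."""
        if len(indices) != len(self.sizes):
            raise ValueError("rank mismatch")
        for i, size in zip(indices, self.sizes):
            if not 0 <= i < size:
                raise IndexError("index out of bounds")
        return self.offset + sum(i * s for i, s in zip(indices, self.strides))


def subview(layout, starts, sizes, steps):
    """Describe a sub-region without copying data.

    Only the metadata changes: the offset moves to the first selected
    element and each stride is scaled by the step along that dimension.
    """
    new_offset = layout.offset + sum(
        b * s for b, s in zip(starts, layout.strides))
    new_strides = tuple(st * s for st, s in zip(steps, layout.strides))
    return StridedLayout(new_offset, tuple(sizes), new_strides)


if __name__ == "__main__":
    base = StridedLayout(offset=0, sizes=(4, 6), strides=(6, 1))  # row-major 4x6
    view = subview(base, starts=(1, 2), sizes=(2, 2), steps=(1, 2))
    print(view, view.linear_index((1, 1)))
```

If a lowering discards this metadata too early, later passes can no longer recover facts such as the view's offset or stride from the description alone, which is the kind of loss the release note says the earlier normalization avoids.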

Change Log

Git History

## What's Changed

* [LinalgExt] Don't vectorize map_scatter in non-contiguous sub-byte access by @jtuyls in https://github.com/iree-org/iree/pull/22242
* [python] Set up binding for preprocessing transform ops by @bangtianliu in https://github.com/iree-org/iree/pull/22227
* Re-enable lds_barrier on RDNA4 by @krzysz00 in https://github.com/iree-org/iree/pull/21922
* [CI][iree-test-suites] Try to make torch_models benchmarks more stable by @Groverkss in https://github.com/iree-org/iree/pull/22271
* Reapply "[GPU] Allow multi result and indexing compute generic ops in TilleAndFuse pipeline" (#22205)" by @nirvedhmeshram in https://github.com/iree-org/iree/pull/22223
* Reapply "[Dispatch Creation] Rework dispatch formation logic (#21854)" by @IanWood1 in https://github.com/iree-org/iree/pull/22065
* [debugging][gpu] Add --iree-hip-emit-debug-info flag by @willghatch in https://github.com/iree-org/iree/pull/22216
* [Codegen] Update the td spec using the contraction matcher op by @bangtianliu in https://github.com/iree-org/iree/pull/22249
* [Codegen] Update the td spec using the attention matcher op by @bangtianliu in https://github.com/iree-org/iree/pull/22266
* Revert "Re-enable lds_barrier on RDNA4" by @kuhar in https://github.com/iree-org/iree/pull/22278
* Integrate llvm/llvm-project@b92483c by @newling in https://github.com/iree-org/iree/pull/22274
* Support skinny scaled matmul in kernel config by @jtuyls in https://github.com/iree-org/iree/pull/22042
* Use llvm wrappers for accumulate. NFC. by @kuhar in https://github.com/iree-org/iree/pull/22279
* [NFC][GPU] Move reduction configuration to gpu utilities by @Groverkss in https://github.com/iree-org/iree/pull/22286
* [GPU] Move convolution check out of unrelated function by @Groverkss in https://github.com/iree-org/iree/pull/22287
* [GPU] Support iree_tensor_ext.dispatch.tensor.store for broadcast producer by @nirvedhmeshram in https://github.com/iree-org/iree/pull/22291
* [Docs] Read from first line of `rocm_agent_enumerator` output by @sjain-stanford in https://github.com/iree-org/iree/pull/22283
* [Codegen] Adding an optional `dma_sizes` field in GPU attributes by @lialan in https://github.com/iree-org/iree/pull/22281
* Bump LLVM to llvm/llvm-project@5a636c6 by @MaheshRavishankar in https://github.com/iree-org/iree/pull/22290
* Let MLIR ukernels provide their matching and data-tiled-layout info. by @bjacob in https://github.com/iree-org/iree/pull/22254
* [LLVMCPU] Propagate target features and CPU name to individual LLVMFuncOp by @mshockwave in https://github.com/iree-org/iree/pull/22036
* [CI][TorchModels] Update flags used for LLaMa 8b f8/fp16. by @MaheshRavishankar in https://github.com/iree-org/iree/pull/22297
* Promote iree-opt-data-tiling to pipeline options. by @hanhanW in https://github.com/iree-org/iree/pull/22295
* Bump version to 3.9.0 after 3.8.0 release. by @sa-faizal in https://github.com/iree-org/iree/pull/22308
* [GPU] Enabling Gather-like ops to go through GPUTileAndFuse pipeline by @Abhishek-Varma in https://github.com/iree-org/iree/pull/22251
* [python] Set up python binding for matcher convolution and attention op by @bangtianliu in https://github.com/iree-org/iree/pull/22311
* [DT][NFC] Trim IRs in encoding materialization tests for GPU and RISCV backends. by @hanhanW in https://github.com/iree-org/iree/pull/22313
* [GPU] Update K Tile size picking for multiple K dims by @Muzammiluddin-Syed-ECE in https://github.com/iree-org/iree/pull/22310
* [codegen][gpu] Make transfer_write conditional when not fully distributed by @newling in https://github.com/iree-org/iree/pull/22198
* [Stream] Replicate globals per affinity before Stream conversion. by @hanhanW in https://github.com/iree-org/iree/pull/22117
* Fix non-deterministic hoisting by @IanWood1 in https://github.com/iree-org/iree/pull/22319
* Drop revert of [llvm/llvm-project#159083](https://github.com/llvm/llvm-project/issues/159083) by @MaheshRavishankar in https://github.com/iree-org/iree/pull/22298
* [Codegen] Allow pre-padding other dims of a conv except the input channel by @yzhang93 in https://github.com/iree-org/iree/pull/22296
* [CI][Torch] Update dispatch counts after non-determinism fix by @Groverkss in https://github.com/iree-org/iree/pull/22333
* [Codegen] Use llvm accumulate wrappers. NFC. by @kuhar in https://github.com/iree-org/iree/pull/22331
* [Codegen] Tile memref.copy when vectorizing for dynamic dims by @jtuyls in https://github.com/iree-org/iree/pull/22168
* Reapply "Re-enable lds_barrier on RDNA4" (#22278) by @krzysz00 in https://github.com/iree-org/iree/pull/22326
* [Codegen] Handle multiple dyn dims in tensor load pattern by @IanWood1 in https://github.com/iree-org/iree/pull/22328
* [DT][NFC] Add test files for materializing IREE ops with encodings. by @hanhanW in https://github.com/iree-org/iree/pull/22322
* [DT][NFC] Trim IRs for materialize_encoding_aarch64.mlir test. by @hanhanW in https://github.com/iree-org/iree/pull/22327
* [DT][NFC] Trim unnecessary IRs for materialize_encoding_vmvx.mlir test. by @hanhanW in https://github.com/iree-org/iree/pull/22330
* [DT][NFC] Trim unnecessary IRs for materialize_encoding_x86_64.mlir test. by @hanhanW in https://github.com/iree-org/iree/pull/22332
* [DispatchCreation] Add split reduction for weight backward convs by @yzhang93 in https://github.com/iree-org/iree/pull/22275
* [Integrate] Bump LLVM to llvm/llvm-project@893b1d4 by @MaheshRavishankar in https://github.com/iree-org/iree/pull/22334
* [DT][NFCI] Implement getOffsetsSizesStrides for GPU padding resolver. by @hanhanW in https://github.com/iree-org/iree/pull/22339
* Remove `moveCrossThreadOutermost` by @bjacob in https://github.com/iree-org/iree/pull/22284
* [Global Opt] Don't propagate edge reshapes by @IanWood1 in https://github.com/iree-org/iree/pull/22320
* [DT][NFC] Collapse MaterializeScaledContractionOp into generic pattern. by @hanhanW in https://github.com/iree-org/iree/pull/22340
* [Codegen][Tuner] Add root_op for matvec and reduction along VectorDistribute pipeline by @bangtianliu in https://github.com/iree-org/iree/pull/22348
* Catch MLIR ukernel parsing errors by @bjacob in https://github.com/iree-org/iree/pull/22353
* [ROCM][DT] Update ukernel data layout by @Yu-Zhewen in https://github.com/iree-org/iree/pull/22350
* [GlobalOpt] Fix transpose propagation for index-semantic ops by interchanging indexing maps by @ziliangzl in https://github.com/iree-org/iree/pull/22248
* [build flags] 2nd prep to enable more warnings in compile flags (#21996) by @schuermans-roofline in https://github.com/iree-org/iree/pull/22273
* [LinalgExt] Fix scatter unique_indices when dropping unit dims by @IanWood1 in https://github.com/iree-org/iree/pull/22362
* [DT][NFC] Refactor linalg.fill/generic op lowering to interface implementation. by @hanhanW in https://github.com/iree-org/iree/pull/22343
* [DT] Mark partial slices unsupported in padding encoding resolver. by @hanhanW in https://github.com/iree-org/iree/pull/22359
* [DT] Implement LayoutMaterializerAttr for identity resolver. by @hanhanW in https://github.com/iree-org/iree/pull/22337
* [Codegen] Canonicalize loops and subviews after copy vectorization by @jtuyls in https://github.com/iree-org/iree/pull/22344
* Bump LLVM to llvm/llvm-project@c8cf393 by @Muzammiluddin-Syed-ECE in https://github.com/iree-org/iree/pull/22354
* [DT] Support partial load/store for identity encoding resolver. by @hanhanW in https://github.com/iree-org/iree/pull/22360
* [Codegen] Remove batch size in target intrinsic checks by @jtuyls in https://github.com/iree-org/iree/pull/22289
* [NFC] Wrap directory structure within a block. by @hanhanW in https://github.com/iree-org/iree/pull/22373
* [DT] Support partial load/store for GPU padding encoding resolver. by @hanhanW in https://github.com/iree-org/iree/pull/22372
* [AMDGPU] Cache_swizzle stride for fat raw buffer loads should in bytes by @sebvince in https://github.com/iree-org/iree/pull/22314
* [LLVMCPU] Refactor multi lowering config propagation and setting by @Yu-Zhewen in https://github.com/iree-org/iree/pull/22126
* [build flags] enable more warnings in compile flags (#21996) by @schuermans-roofline in https://github.com/iree-org/iree/pull/22240
* Bump LLVM to llvm/llvm-project@683e2bf by @Muzammiluddin-Syed-ECE in https://github.com/iree-org/iree/pull/22366
* [NFC][ROCM] Simplify ukernel encoding materialization tests by @jtuyls in https://github.com/iree-org/iree/pull/22376
* [StableHLO] Fix reshape canonicalization for dense_resource constants. by @weidel-p in https://github.com/iree-org/iree/pull/22365
* [CI][TorchModels] Add SDXL int8 model to Torch Models CI. by @MaheshRavishankar in https://github.com/iree-org/iree/pull/22364
* [VectorDistribute] Fix transfer_write broadcasting guard by @Groverkss in https://github.com/iree-org/iree/pull/22352
* [NFC] Merge common type constraints by @krzysz00 in https://github.com/iree-org/iree/pull/22358
* [Encoding] fix dependency issues with @3815582bbd by @Muzammiluddin-Syed-ECE in https://github.com/iree-org/iree/pull/22384
* [Stream] Deduplicate the dispatch workloads by @jtuyls in https://github.com/iree-org/iree/pull/22187
* [DispatchCreation] Set split reduction size for GEMM with large k dim by @yzhang93 in https://github.com/iree-org/iree/pull/22357
* Adding markAllAnalysesPreserved to verification passes. by @benvanik in https://github.com/iree-org/iree/pull/22380
* Rewriting CombineInitializersPass to not make incorrect programs. by @benvanik in https://github.com/iree-org/iree/pull/22118
* Three reverts to undo transfer_write deduplication and return to previous state by @newling in https://github.com/iree-org/iree/pull/22392
* [CI][Torch] Add llama 8b fp16 quality tests by @Groverkss in https://github.com/iree-org/iree/pull/22379
* [Codegen] Implement value bounds interface for LoadFromBufferOp by @jtuyls in https://github.com/iree-org/iree/pull/22390
* [ROCM] Improve f16 medium ukernel bounds by @jtuyls in https://github.com/iree-org/iree/pull/22393
* Add mmt4d ukernel for riscv64's zvfhmin and zvfh feature, for types f16xf16->f16/f32 by @adeel10x in https://github.com/iree-org/iree/pull/22231
* [DispatchCreation] Add clean up pattern for fusing pad into split reduction dispatch by @yzhang93 in https://github.com/iree-org/iree/pull/22398
* Add Max191 to CODEOWNERS by @Max191 in https://github.com/iree-org/iree/pull/22411
* [NFC] Replace all uses of OpBuilder.create<OpTy> with OpTy::create by @Muzammiluddin-Syed-ECE in https://github.com/iree-org/iree/pull/22406
* [ROCM][Target] Add target for Strix Halo, and Phoenix by @raikonenfnu in https://github.com/iree-org/iree/pull/22410
* [Codegen] Cleanup VectorLayoutAnalysis testing by @Groverkss in https://github.com/iree-org/iree/pull/22417
* Add final dispatch name to AMDGPU Register spill warning by @sebvince in https://github.com/iree-org/iree/pull/22407
* [LinalgExt][NFC] Split the op definition between pure ops and LinalgExt ops by @sakupan102 in https://github.com/iree-org/iree/pull/22368
* Give `inner_tiled` a strict verifier and explicit semantics with boolean parameters `distributed` and `opaque` by @bjacob in https://github.com/iree-org/iree/pull/22369
* [LinalgExt][NFC] Move AttrSizedOperandSegments from base class to individual ops by @Copilot in https://github.com/iree-org/iree/pull/22430
* Rewrite SingleSubgroupLayout documentation by @bjacob in https://github.com/iree-org/iree/pull/22412
* [Codegen][Tuner] solve name conflicts for merging td specs by @bangtianliu in https://github.com/iree-org/iree/pull/22409
* [tools] Add bash autocomplete script for iree-opt/iree-compile by @Groverkss in https://github.com/iree-org/iree/pull/22424
* Bump LLVM to llvm/llvm-project@e903494 by @Yu-Zhewen in https://github.com/iree-org/iree/pull/22427
* [Global Opt] Raise tensor.extract to input by @IanWood1 in https://github.com/iree-org/iree/pull/22434
* [Global Opt] Add flag to control edge reshape propagation by @IanWood1 in https://github.com/iree-org/iree/pull/22438
* Adding HAL virtual memory APIs. by @benvanik in https://github.com/iree-org/iree/pull/22437
* Fix ReplicateGlobalsPerAffinity to maintain correct order of globals and initializers by @Copilot in https://github.com/iree-org/iree/pull/22401
* Update IanWood1 in CODEOWNERS by @IanWood1 in https://github.com/iree-org/iree/pull/22447
* [Codegen][ROCm] Don't branch on undef in `getPaddingConvSize` by @kuhar in https://github.com/iree-org/iree/pull/22449
* [CI][TorchModels] Update llama 8b fp16 golden time by @jtuyls in https://github.com/iree-org/iree/pull/22426
* [LLVMGPU] Fix coding standards / style issues in config utils by @kuhar in https://github.com/iree-org/iree/pull/22454
* [Codegen] Cleanup VectorLayoutAnalysis details by @Groverkss in https://github.com/iree-org/iree/pull/22418
* [Codegen] Rewrite VectorLayoutAnalysis to a simpler implementation by @Groverkss in https://github.com/iree-org/iree/pull/22420
* Bump LLVM to llvm/llvm-project@466c526 by @Yu-Zhewen in https://github.com/iree-org/iree/pull/22450
* [Codegen] Move GPUApplyPaddingLevel to an interface implementation by @Groverkss in https://github.com/iree-org/iree/pull/22422
* [ukernels] Add missing specializations on gfx942/gfx950 and associated e2e tests by @sebvince in https://github.com/iree-org/iree/pull/22446
* [Codegen] Fix more coding style / standards issues by @kuhar in https://github.com/iree-org/iree/pull/22459
* [Codegen] Add vector size inference for ukernel operations. by @Copilot in https://github.com/iree-org/iree/pull/22440
* Migrate custom LDBG macro to LLVM’s built-in debug logging by @Yu-Zhewen in https://github.com/iree-org/iree/pull/22456
* Adding sysfs topology detection logic and switching to it by default. by @benvanik in https://github.com/iree-org/iree/pull/22455
* Fix e2e matmul mxfp4 tests on gfx950 post #22446 by @bjacob in https://github.com/iree-org/iree/pull/22464
* Adding SILENCE_DEPRECATIONS option to LLVM external projects cmake. by @benvanik in https://github.com/iree-org/iree/pull/22463
* [DT][NFC] Fix coding style / standards issues for encoding materialization. by @hanhanW in https://github.com/iree-org/iree/pull/22471
* [DT][NFCI] Use no-rollback driver for MaterializeEncoding passes. by @hanhanW in https://github.com/iree-org/iree/pull/22474
* Add myself to .github CODEOWNERS by @Groverkss in https://github.com/iree-org/iree/pull/22477
* Adding iree-link tool. by @benvanik in https://github.com/iree-org/iree/pull/22419
* [ci] Remove gh installation for mi325 ci by @Groverkss in https://github.com/iree-org/iree/pull/22476
* [DT] Implement MaterializeInterfaceBindingEncoding with interface methods. by @hanhanW in https://github.com/iree-org/iree/pull/22467
* [CPU] Switch IREE::CPU::TilingLevel to enum class by @Copilot in https://github.com/iree-org/iree/pull/22433
* Bump the github-actions group with 2 updates by @dependabot[bot] in https://github.com/iree-org/iree/pull/22436
* CMake: When `rocminfo` is present, ask users to explicitly enable or disable ROCm testing. by @bjacob in https://github.com/iree-org/iree/pull/22478
* [Integrate] Cherry-pick llvm/llvm-project@41f6566 by @Yu-Zhewen in https://github.com/iree-org/iree/pull/22470
* Harmonize `*ScaledMMAAttr` operand order and drop `MMAFragment` by @bjacob in https://github.com/iree-org/iree/pull/22465
* Revert "[LLVMCPU] Propagate target features and CPU name to individual LLVMFuncOp" by @hanhanW in https://github.com/iree-org/iree/pull/22488
* Bump LLVM to llvm/llvm-project@03e66ae by @Yu-Zhewen in https://github.com/iree-org/iree/pull/22487
* [GPU] Add serial tiling level by @Groverkss in https://github.com/iree-org/iree/pull/22479
* Add Cursor files to gitignore by @Max191 in https://github.com/iree-org/iree/pull/22469
* [compiler][nfc] Remove using-declarations pollution from headers. by @hanhanW in https://github.com/iree-org/iree/pull/22501
* [DT] Collapse MaterializeEncodingIntoPaddingPass into the generic pass. by @hanhanW in https://github.com/iree-org/iree/pull/22472
* Bump LLVM to llvm/llvm-project@09318c6 by @Yu-Zhewen in https://github.com/iree-org/iree/pull/22494
* [CI] Run w7900 tests on any runner with two w7900 gpus by @kuhar in https://github.com/iree-org/iree/pull/22511
* [CPU][NFC] Style fixes and address post-commit comments. by @hanhanW in https://github.com/iree-org/iree/pull/22505
* [CI] Fix typo in reserved trailers by @kuhar in https://github.com/iree-org/iree/pull/22514
* [CI] Make rdna3 runner requirements more fine-grained by @kuhar in https://github.com/iree-org/iree/pull/22513
* Bump LLVM to llvm/llvm-project@04f87c693c7e by @hanhanW in https://github.com/iree-org/iree/pull/22515
* [LinalgExt] Decompose sub-byte map_scatter to extract/store by @jtuyls in https://github.com/iree-org/iree/pull/22315
* [ROCM] Update bounds for large f16 data-tiling ukernel by @jtuyls in https://github.com/iree-org/iree/pull/22481
* Remove value bounds interface for ExpandShapeOp by @jtuyls in https://github.com/iree-org/iree/pull/22460
* Revert "Three reverts to undo transfer_write deduplication and return… by @Groverkss in https://github.com/iree-org/iree/pull/22521
* [CPU][NFC] Trim IRs for lowering_config tests. (2/N) by @hanhanW in https://github.com/iree-org/iree/pull/22512
* Implement `subgroups_k` in data-tiled MMA layouts by @bjacob in https://github.com/iree-org/iree/pull/22519
* [Codegen][ROCm] Add WMMA intrinsics for gfx1250 by @kuhar in https://github.com/iree-org/iree/pull/22516
* Bump LLVM to llvm-project@6a275de13f6c by @hanhanW in https://github.com/iree-org/iree/pull/22524
* [Torch] Disable deprecation declaration warnings when building torch-mlir-dialects by @hanhanW in https://github.com/iree-org/iree/pull/22526
* [LinalgExt] Don't force MxK layout for im2col output by @Max191 in https://github.com/iree-org/iree/pull/22396
* [GPU] Clean up misc issues in IREEGPUAttrs. NFC. by @kuhar in https://github.com/iree-org/iree/pull/22531
* [Codegen][GPU] Allow intentional padding for non-K-major matmul layouts by @jerryyin in https://github.com/iree-org/iree/pull/22486
* [DispatchCreation] Enable splitting multiple reduction dimensions for weight backward convs by @yzhang93 in https://github.com/iree-org/iree/pull/22491
* [Integrate] Drop the revert of affine canonicalization commit (8c05b5cc) by @hanhanW in https://github.com/iree-org/iree/pull/22530
* [GPU] Add consumer fusion for GPUApplyTilingLevel by @Groverkss in https://github.com/iree-org/iree/pull/22522
* [CPU][NFC] Trim unnecessary IRs for CPU tests. by @hanhanW in https://github.com/iree-org/iree/pull/22546
* [DispatchCreation] Enable fusion of encoding ops with multi-use producers by @Abhishek-Varma in https://github.com/iree-org/iree/pull/22444
* [LinalgExt] Decompose map_scatter with strided rank-reducing subviews by @Max191 in https://github.com/iree-org/iree/pull/22504
* [Global Opt] Move strided contraction pass after transpose prop by @IanWood1 in https://github.com/iree-org/iree/pull/22534
* Bump LLVM to llvm/llvm-project@0ce03c2be4c4 by @hanhanW in https://github.com/iree-org/iree/pull/22550
* [Input] Add RecomposeComplexOps pass in Torch/InputConversion/Passes by @raayandhar in https://github.com/iree-org/iree/pull/22276
* Using our own tablegen with depfile support. by @benvanik in https://github.com/iree-org/iree/pull/22554
* [LinalgExt] Added TilingInterface support for ExpReductionOp by @hhkit in https://github.com/iree-org/iree/pull/22316
* Fix `iree.build` source directory being gitignore'd by @rkayaith in https://github.com/iree-org/iree/pull/22391
* [Dispatch Creation] Drop unit dims from tensor.extract ops by @IanWood1 in https://github.com/iree-org/iree/pull/22503
* [Dispatch Creation] Don't add unfusable consumers to fusion group by @IanWood1 in https://github.com/iree-org/iree/pull/22461
* Integrate torch-mlir at llvm/torch-mlir@288cd5e8adb by @IanWood1 in https://github.com/iree-org/iree/pull/22508
* [GPU][DT] Refactor tile size selection for narrow matmul by @Yu-Zhewen in https://github.com/iree-org/iree/pull/22177
* [CI] Change numprocesses to 1 for amdgpu_vulkan_O0 by @hanhanW in https://github.com/iree-org/iree/pull/22567
* Fix BYO LLVM build: handle MLIRTargetLLVMIRImport as non-object library by @hanhanW in https://github.com/iree-org/iree/pull/22553
* Bump LLVM to llvm/llvm-project@f60e69315e9e by @hanhanW in https://github.com/iree-org/iree/pull/22565
* [CodeGen][Tuner] Add bindings to query SIMDs and CUs info by @RattataKing in https://github.com/iree-org/iree/pull/22527
* Bump spirv-cross submodule by @kuhar in https://github.com/iree-org/iree/pull/22556
* [runtime] Require aligned memory accesses by default by @kuhar in https://github.com/iree-org/iree/pull/22557
* [runtime] Simplify unaligned load/store impl for u64/f64. NFC. by @kuhar in https://github.com/iree-org/iree/pull/22570
* Update Lit test checks caused by upstream fcf79e5 by @lialan in https://github.com/iree-org/iree/pull/22480
* [Codegen] Allow iree_codegen.swizzle_hint to operate on tensors by @krzysz00 in https://github.com/iree-org/iree/pull/22552
* Bump LLVM to llvm/llvm-project@6fce53af846c by @hanhanW in https://github.com/iree-org/iree/pull/22573
* [CI] Force amdgpu_vulkan runner be shark10-ci by @hanhanW in https://github.com/iree-org/iree/pull/22580
* Example of using HalModuleDebugSink to find numerical divergence by @newling in https://github.com/iree-org/iree/pull/22535
* Enable CI for torch ops by @amd-eochoalo in https://github.com/iree-org/iree/pull/22548
* [CI][torch_ops] Force amdgpu_vulkan runner be shark10-ci by @amd-eochoalo in https://github.com/iree-org/iree/pull/22588
* [NFC] Refresh golden values for benchmarks. by @hanhanW in https://github.com/iree-org/iree/pull/22583
* [CI] Relax golden values for torch_models. by @hanhanW in https://github.com/iree-org/iree/pull/22592
* [CI] Relax golden values for torch_models more. by @hanhanW in https://github.com/iree-org/iree/pull/22593
* Fix LLD support in BYO LLVM builds by @hanhanW in https://github.com/iree-org/iree/pull/22594
* Bump LLVM to llvm/llvm-project@37403685298bd3a7 by @hanhanW in https://github.com/iree-org/iree/pull/22591
* Increase acceptable error in punet by @newling in https://github.com/iree-org/iree/pull/22169
* [CI] Refresh golden values for failing benchmarks: min(val*1.1, val+5ms) by @hanhanW in https://github.com/iree-org/iree/pull/22595
* [Codegen][GPU] Update heuristic to consider distribution from split reduction by @yzhang93 in https://github.com/iree-org/iree/pull/22575
* [CI] Force CPU torch benchmarks to use Threadripper. by @hanhanW in https://github.com/iree-org/iree/pull/22600
* Adding new .td metadata classes and making our defs consistent. by @benvanik in https://github.com/iree-org/iree/pull/22569
* [Codegen][GPU] Introduce scf::pipelineForLoop function from upstream for prefetchSharedMemory pass by @jerryyin in https://github.com/iree-org/iree/pull/22523
* Adding iree_hal_executable_cache_infer_format. by @benvanik in https://github.com/iree-org/iree/pull/21763
* Adding timeline-aware async execution across module boundaries. by @benvanik in https://github.com/iree-org/iree/pull/22381
* [NFC] Renaming `stream.parameter.*` to `stream.cmd.parameter.*`. by @benvanik in https://github.com/iree-org/iree/pull/22607
* Adding --gen-dialect-json to iree-tblgen. by @benvanik in https://github.com/iree-org/iree/pull/22603
* Integrate llvm 2025-11-10 by @nirvedhmeshram in https://github.com/iree-org/iree/pull/22608
* [CI] Update clip benchmark by @nirvedhmeshram in https://github.com/iree-org/iree/pull/22612
* [Codegen][Tuner] Extend ireeGPUTargetInfo constructor with new added attributes by @RattataKing in https://github.com/iree-org/iree/pull/22597
* [TensorExt] Add barrier ops and roundtrip tests 1/2 by @IanWood1 in https://github.com/iree-org/iree/pull/22577
* Improving support for iree_codegen.extract_strided_metadata. by @benvanik in https://github.com/iree-org/iree/pull/22606
* Integrates/llvm 2025-11-10 (part 2) by @nirvedhmeshram in https://github.com/iree-org/iree/pull/22613
* [PJRT] Update rocm pjrt by @castigli in https://github.com/iree-org/iree/pull/22317
* Update split reduction cutoff conditions by @yzhang93 in https://github.com/iree-org/iree/pull/22596
* Bump the github-actions group with 2 updates by @dependabot[bot] in https://github.com/iree-org/iree/pull/22614
* Integrates/llvm 20251112 by @nirvedhmeshram in https://github.com/iree-org/iree/pull/22624
* [Stream] Fixing update order and improving the cache for ReplicateGlobalsPerAffinity pass. by @hanhanW in https://github.com/iree-org/iree/pull/22499
* Add passes to insert and remove barriers 2/2 by @IanWood1 in https://github.com/iree-org/iree/pull/22566
* [TensorExt] Rename barrier to compute_barrier by @IanWood1 in https://github.com/iree-org/iree/pull/22627
* [DT] Add support for layout transfer in MaterializeEncoding pass. by @hanhanW in https://github.com/iree-org/iree/pull/22582
* [e2e] Use remarks to verify ukernel match by @Yu-Zhewen in https://github.com/iree-org/iree/pull/22620
* [runtime] Add explicit casts to char* to silence ubsan warnings by @kuhar in https://github.com/iree-org/iree/pull/22628
* [docs] Fix a typo in LinalgExtOps.td by @sakupan102 in https://github.com/iree-org/iree/pull/22633
* Fix mixed precision operands in splitReduction pass by @FlintWangacc in https://github.com/iree-org/iree/pull/22138
* [TensorExt] Add folder for barrier ops by @IanWood1 in https://github.com/iree-org/iree/pull/22616
* [Codegen][Tuner] expose python binding for getIGEMMGenericConvDetails by @bangtianliu in https://github.com/iree-org/iree/pull/22598
* [runtime] Fix incorrect alignment assumptions by @kuhar in https://github.com/iree-org/iree/pull/22571
* [LLVMCPU] Support tile-and-fuse anchoring on producer ops by @hanhanW in https://github.com/iree-org/iree/pull/22632
* Silence remaining UBSan warnings across runtime and spirv-cross by @kuhar in https://github.com/iree-org/iree/pull/22638
* Bump torch-mlir to llvm/torch-mlir@8d563af0b68 by @hanhanW in https://github.com/iree-org/iree/pull/22637
* [Codegen][GPU] Replace prefetchLoop with stage-based backward slicing by @jerryyin in https://github.com/iree-org/iree/pull/22605
* [CI] Optimize and clean up asan and tsan build scripts by @kuhar in https://github.com/iree-org/iree/pull/22639
* [VMVX][NFC] Trim unnecessary IRs from select_lowering_strategy.mlir by @hanhanW in https://github.com/iree-org/iree/pull/22641
* [DT] Allow to enable/disable interleaving separately for M/N/K dimensions, for each
operand by @bjacob in https://github.com/iree-org/iree/pull/22626 * [DataTiling] Switch default to start from the DispatchCreation phase. by @hanhanW in https://github.com/iree-org/iree/pull/21441 * [Flow] Move ReplicateGlobalsPerAffinity pass to Flow by @sommerlukas in https://github.com/iree-org/iree/pull/22634 * [SPIRV][NFC] Simplify lowering strategy tests by removing unnecessary IRs by @hanhanW in https://github.com/iree-org/iree/pull/22648 * Bump llvm to llvm/llvm-project@7b7a422 by @nirvedhmeshram in https://github.com/iree-org/iree/pull/22635 * Use llvm cast function objects. NFC. by @kuhar in https://github.com/iree-org/iree/pull/22652 * Drop unnecessary namespaces from cast functions in plugins. NFC. 1/10 by @kuhar in https://github.com/iree-org/iree/pull/22653 * Drop unnecessary namespaces from cast functions in bindings/dispatch/external. NFC. 2/10 by @kuhar in https://github.com/iree-org/iree/pull/22654 * Drop unnecessary namespaces from cast functions in codegen common. NFC. 3/10 by @kuhar in https://github.com/iree-org/iree/pull/22655 * Drop unnecessary namespaces from cast functions in codegen backends. NFC. 4/10 by @kuhar in https://github.com/iree-org/iree/pull/22656 * Drop unnecessary namespaces from cast functions in dialect flow *ext. NFC. 7/10 by @kuhar in https://github.com/iree-org/iree/pull/22659 * Drop unnecessary namespaces from cast functions in dialect util. NFC. 8/10 by @kuhar in https://github.com/iree-org/iree/pull/22660 * Drop unnecessary namespaces from cast functions in dialect stream. NFC. 9/10 by @kuhar in https://github.com/iree-org/iree/pull/22661 * Drop unnecessary namespaces from cast functions in dialect vm vmvx etc. NFC. 10/10 by @kuhar in https://github.com/iree-org/iree/pull/22662 * Drop unnecessary namespaces from cast functions in dialect hal encoding. NFC. 6/10 by @kuhar in https://github.com/iree-org/iree/pull/22658 * Drop unnecessary namespaces from cast functions in codegen dialect utils. NFC. 
5/10 by @kuhar in https://github.com/iree-org/iree/pull/22657 * [LLVMGPU][NFC] Simplify lowering_config tests. 1/N by @hanhanW in https://github.com/iree-org/iree/pull/22665 * Partial Revert "[e2e] Use remarks to verify ukernel match" by @Yu-Zhewen in https://github.com/iree-org/iree/pull/22647 * [CI] Optimize cmake flags for debug info builds by @kuhar in https://github.com/iree-org/iree/pull/22651 * [CI] Add ubsan build and test script. Run ubsan tests in CI. by @kuhar in https://github.com/iree-org/iree/pull/22650 * [AMD][GPU] Insert barrier in prologue before first shared memory write by @jerryyin in https://github.com/iree-org/iree/pull/22669 * [NFC] Switch to dyn_cast_if_present for consistency. by @hanhanW in https://github.com/iree-org/iree/pull/22670 * Update split reduction heuristic for extreme large GEMMs by @yzhang93 in https://github.com/iree-org/iree/pull/22636 * [Integrate] Bump torch-mlir to llvm/torch-mlir@a2bcca0f025bf0 by @hanhanW in https://github.com/iree-org/iree/pull/22680 * Suppress ROCm lsan errors in HIP driver tests by @qedawkins in https://github.com/iree-org/iree/pull/22675 * Update`coalesced_gather_dma` definitions by @lialan in https://github.com/iree-org/iree/pull/22294 * [Codegen][GPU] Add configurable num-stages option to prefetch pass by @jerryyin in https://github.com/iree-org/iree/pull/22673 * RHS type should be used by @NoumanAmir657 in https://github.com/iree-org/iree/pull/22686 * Drop prefetches in AVX512 ukernels by @bjacob in https://github.com/iree-org/iree/pull/22668 * Bump actions/checkout from 5.0.0 to 5.0.1 in the github-actions group by @dependabot[bot] in https://github.com/iree-org/iree/pull/22677 * [tuner][docs] update sharktuner readme by @bangtianliu in https://github.com/iree-org/iree/pull/22683 * Revert "[LDS] Lower to `coalesced_gather_dma` (#22294)" by @lialan in https://github.com/iree-org/iree/pull/22691 * Relax assert in task_worker_deinitialize in case thread creation failed by @qedawkins in 
https://github.com/iree-org/iree/pull/22689 * [tuner][docs] update the example td spec in sharktuner readme by @bangtianliu in https://github.com/iree-org/iree/pull/22692 * Integrate LLVM at 21e0b56d7afc by @lialan in https://github.com/iree-org/iree/pull/22667 * Revert "[PJRT] Update rocm pjrt (#22317)" by @lialan in https://github.com/iree-org/iree/pull/22678 * [CI] Reduce ctest parallelism in the clang job by @kuhar in https://github.com/iree-org/iree/pull/22704 * [RISCV] Clean up toolchain CMake configuration by @HanKuanChen in https://github.com/iree-org/iree/pull/22663 * Integrate LLVM at c2b4e481a050 by @lialan in https://github.com/iree-org/iree/pull/22701 * Implementing initial end-to-end support for external transients. by @benvanik in https://github.com/iree-org/iree/pull/22625 * [Preprocessing] Add compute_barrier in ConvertConvFilterToChannelsLast pass by @yzhang93 in https://github.com/iree-org/iree/pull/22679 * [Codegen][GPU] Generalize linalg.reduce operations by @bangtianliu in https://github.com/iree-org/iree/pull/22490 * [CI] Update iree-org/iree-test-suites@17a391dc38 by @IanWood1 in https://github.com/iree-org/iree/pull/22698 * [Dispatch Creation] Add pass to fold reshapes into barriers by @IanWood1 in https://github.com/iree-org/iree/pull/22642 * Integrate llvm @ aa3f930931e6 by @lialan in https://github.com/iree-org/iree/pull/22713 * [Dispatch Creation] Don't fuse uses from above by @IanWood1 in https://github.com/iree-org/iree/pull/22708 * [DispatchCreation] Move RemoveTensorBarriers to end of pipeline by @IanWood1 in https://github.com/iree-org/iree/pull/22703 * [docs] Clarify code review process by @kuhar in https://github.com/iree-org/iree/pull/22714 * [docs] Fix a typo in code review process by @kuhar in https://github.com/iree-org/iree/pull/22716 * [DispatchCreation] Set split reduction size for ArgCompare by @bangtianliu in https://github.com/iree-org/iree/pull/22466 * [CI][TorchModels] Update SDXL int8 model CI (1/2) by @raayandhar in 
https://github.com/iree-org/iree/pull/22621 * [CI][TorchModels] Add data-tiling for Llama 8B Fp8 on gfx942 by @Abhishek-Varma in https://github.com/iree-org/iree/pull/22387 * [Build] Optionally use hip headers from system Hip package by @AaronStGeorge in https://github.com/iree-org/iree/pull/22715 * [Flow] Transfer globals per affinity instead of replicating by @sommerlukas in https://github.com/iree-org/iree/pull/22623 * Adding some Stream canonicalizations and RefineUsage improvements. by @benvanik in https://github.com/iree-org/iree/pull/22610 * [LDS] Reland "Lower to `coalesced_gather_dma` (#22294)" by @lialan in https://github.com/iree-org/iree/pull/22696 * [Codegen] Fold bitcast into bufferized tensor load by @Yu-Zhewen in https://github.com/iree-org/iree/pull/22672 * [DispatchCreation][NFC] Refactor split reduction helper methods to static functions by @bangtianliu in https://github.com/iree-org/iree/pull/22727 * [spirv] Handle 0d vectors during unrolling by @kuhar in https://github.com/iree-org/iree/pull/22730 * [LLVMGPU][Codegen] Emit packed chain FMA from select multi_reductions and contracts by @efric in https://github.com/iree-org/iree/pull/21855 * [Encoding] Add SerializableAttr interface to packed_storage by @sommerlukas in https://github.com/iree-org/iree/pull/22688 * Revert "[LLVMGPU][Codegen] Emit packed chain FMA from select multi_reductions and contracts" by @hanhanW in https://github.com/iree-org/iree/pull/22736 * [Codegen][GPU]Fixing barrier placement for 3+ stages pipelining by @jerryyin in https://github.com/iree-org/iree/pull/22725 * [Dispatch Creation] Add aggressive reshape movement flag by @IanWood1 in https://github.com/iree-org/iree/pull/22707 * Update CODEOWNERS to add more reviewers for GPU codegen pieces by @MaheshRavishankar in https://github.com/iree-org/iree/pull/22721 * [CI][TorchModels] Update flags for CLIP test. 
by @MaheshRavishankar in https://github.com/iree-org/iree/pull/22413 * [TensorExt] Add Operations/Attributes/Interfaces for specifying ragged tensors. by @MaheshRavishankar in https://github.com/iree-org/iree/pull/22267 * Bump actions/checkout from 5.0.1 to 6.0.0 in the github-actions group by @dependabot[bot] in https://github.com/iree-org/iree/pull/22742 * Fix incompatible pointer types for macOS build. by @hanhanW in https://github.com/iree-org/iree/pull/22738 * Integrate llvm/llvm-project@778e104d by @yzhang93 in https://github.com/iree-org/iree/pull/22741 * [Codegen] Test Cleanup 1/8: Common CPU tests by @qedawkins in https://github.com/iree-org/iree/pull/22744 * [CI] Bump golden value to 165*1.1=181.5 for prefill benchmark on mi325 by @hanhanW in https://github.com/iree-org/iree/pull/22752 * [Codegen] Test Cleanup 8/8: VMVX tests by @qedawkins in https://github.com/iree-org/iree/pull/22751 * [Codegen] Test Cleanup 4/8: Dialect tests by @qedawkins in https://github.com/iree-org/iree/pull/22747

New Contributors

Full Changelog: https://github.com/iree-org/iree/compare/v3.8.0...v3.9.0

Source: README.md, updated 2025-11-25