IREE Release v3.6.0
Compiler
- FissionTransferOpsInControlFlow pass for shared memory prefetching, improving padded convolution performance. (https://github.com/iree-org/iree/pull/21018)
- Refactored iree_gpu.multi_mma to iree_codegen.inner_tiled, enabling arbitrarily many operands and centralizing methods. (https://github.com/iree-org/iree/pull/21000, https://github.com/iree-org/iree/pull/21062)
- Support for distributing vector.constant_mask ops, aligning with existing mask behavior. (https://github.com/iree-org/iree/pull/20708)
- Added scaled MMA layout descriptor attribute for supporting scale operands in MMA ops. (https://github.com/iree-org/iree/pull/21141)
- Early bufferization ops support (store_to_buffer, load_from_buffer) in destination passing style conversion. (https://github.com/iree-org/iree/pull/21136)
- Intrinsic sorting for GPU MMA by K alignment and size preferences. (https://github.com/iree-org/iree/pull/21128)
- GPU intrinsic management simplified using new GPUIntrinsicType field. (https://github.com/iree-org/iree/pull/21103)
- New #iree_gpu.promote_with_cache_swizzle attribute to control operand promotion behavior. (https://github.com/iree-org/iree/pull/21105)
- New GPUApplyPaddingLevel pass and vectorization masking to reduce shared memory roundtrips. (https://github.com/iree-org/iree/pull/21074)
- Support for bubbling expand_shape through tensor.concat, enabling fusion with attention ops. (https://github.com/iree-org/iree/pull/21158)
- ROCM ping-pong matmul support for BF16 (large/medium, expanded). (https://github.com/iree-org/iree/pull/21267)
- Bug Fixes and Robustness Updates (https://github.com/iree-org/iree/pull/21036, https://github.com/iree-org/iree/pull/21037, https://github.com/iree-org/iree/pull/20108, https://github.com/iree-org/iree/pull/21063, https://github.com/iree-org/iree/pull/21069, https://github.com/iree-org/iree/pull/21047, https://github.com/iree-org/iree/pull/21166, https://github.com/iree-org/iree/pull/21160, https://github.com/iree-org/iree/pull/21121, https://github.com/iree-org/iree/pull/21113, https://github.com/iree-org/iree/pull/21132, https://github.com/iree-org/iree/pull/21118, https://github.com/iree-org/iree/pull/21151, https://github.com/iree-org/iree/pull/21190, https://github.com/iree-org/iree/pull/21244, https://github.com/iree-org/iree/pull/21237, https://github.com/iree-org/iree/pull/21355, https://github.com/iree-org/iree/pull/21345, https://github.com/iree-org/iree/pull/21315, https://github.com/iree-org/iree/pull/21270, https://github.com/iree-org/iree/pull/21245, https://github.com/iree-org/iree/pull/21126, https://github.com/iree-org/iree/pull/20977, https://github.com/iree-org/iree/pull/21137, https://github.com/iree-org/iree/pull/21241, https://github.com/iree-org/iree/pull/21337, https://github.com/iree-org/iree/pull/21353, https://github.com/iree-org/iree/pull/21351, https://github.com/iree-org/iree/pull/21295, https://github.com/iree-org/iree/pull/21281, https://github.com/iree-org/iree/pull/21324)
- Linalg Extension Improvements (https://github.com/iree-org/iree/pull/21021, https://github.com/iree-org/iree/pull/21138, https://github.com/iree-org/iree/pull/21090, https://github.com/iree-org/iree/pull/21106, https://github.com/iree-org/iree/pull/20263, https://github.com/iree-org/iree/pull/21220, https://github.com/iree-org/iree/pull/21217, https://github.com/iree-org/iree/pull/21116, https://github.com/iree-org/iree/pull/21338, https://github.com/iree-org/iree/pull/21316, https://github.com/iree-org/iree/pull/21309, https://github.com/iree-org/iree/pull/21189, https://github.com/iree-org/iree/pull/21249)
- Enhanced Testing, Debugging and Documentation (https://github.com/iree-org/iree/pull/21145, https://github.com/iree-org/iree/pull/21143, https://github.com/iree-org/iree/pull/21242, https://github.com/iree-org/iree/pull/21273, https://github.com/iree-org/iree/pull/21229, https://github.com/iree-org/iree/pull/21374, https://github.com/iree-org/iree/pull/21368, https://github.com/iree-org/iree/pull/21335, https://github.com/iree-org/iree/pull/21280, https://github.com/iree-org/iree/pull/21324)
Runtime
- Added AMDGPU executable implementation with no-op cache, supporting verified, topology-wide loading and optimized kernel argument management for dispatches. (https://github.com/iree-org/iree/pull/21040)
- Enabled auto torch input conversion triggered by function argument and result types to streamline input handling. (https://github.com/iree-org/iree/pull/21067)
- Added rematerialize parallel ops support in the vector distribute pipeline to improve elementwise operation fusion. (https://github.com/iree-org/iree/pull/21073)
- Introduced skeleton AMDGPU buffer handle and handle pool with external and transient buffer types supporting async allocations and device pointer resolution. (https://github.com/iree-org/iree/pull/21044)
- Added support for group_any in iree_thread_affinity_t to assign threads to processor groups (e.g., NUMA nodes) instead of specific CPUs, aiding loosely coordinated thread pools. (https://github.com/iree-org/iree/pull/21089)
- Added _base variants for all string view integer parsing functions, aligning with standard C APIs, and cleaned up HIP driver integer parsing code. (https://github.com/iree-org/iree/pull/21086)
- Added iree_hal_amdgpu_system_t to manage shared HSA/topology/pools resources across physical devices in a logical device. (https://github.com/iree-org/iree/pull/21043)
- Added device-side AMDGPU signal and queue utility headers derived from HSA spec and ROCR implementation. (https://github.com/iree-org/iree/pull/21042)
- Implemented AMDGPU command buffer host-side and device-side, supporting recording, execution, and segmented command buffers with conditional branch groundwork. (https://github.com/iree-org/iree/pull/21123)
- Added a skeleton device->host service worker that mimics HSA/AQL queue semantics for device-to-host communication, enabling future tooling compatibility. (https://github.com/iree-org/iree/pull/21094)
- Added blit kernels and device-side enqueue support as initial implementations for copy operations, allowing CTS tests to pass. (https://github.com/iree-org/iree/pull/21057)
- Added device-side tracing macros and ringbuffer trace buffer, laying groundwork for on-device tracing interoperable with host tooling like Tracy. (https://github.com/iree-org/iree/pull/21046)
- Added AMDGPU semaphore allocation and pooling with host-side HAL support; device-side semaphore implementation and external semaphore imports are forthcoming. (https://github.com/iree-org/iree/pull/21201)
- Enhanced loop fission pass (FissionTransferOpsInControlFlow) to support loops containing multiple transfer_read/write pairs, improving IR simplification with additional pattern application. (https://github.com/iree-org/iree/pull/21213)
- Introduced IREE_ENABLE_RUNTIME_COVERAGE CMake mode to enable LLVM coverage for runtime libraries, test binaries, and tools, along with scripts to generate LCOV reports and IDE integration. (https://github.com/iree-org/iree/pull/21191)
- Added iree-hal-drivers-amdgpu-tests target to enable building all AMDGPU HAL tests together easily via IDE actions. (https://github.com/iree-org/iree/pull/21389)
- Implemented AMDGPU logical and physical devices with skeleton queue support, allowing multiple virtual queues per logical device and preparing for host- and device-side queue operations. (https://github.com/iree-org/iree/pull/21251)
- Fixes and Stability Enhancements: (https://github.com/iree-org/iree/pull/21056, https://github.com/iree-org/iree/pull/21060, https://github.com/iree-org/iree/pull/21061, https://github.com/iree-org/iree/pull/21153, https://github.com/iree-org/iree/pull/21200)
- Testing, Debuggability and Tooling: (https://github.com/iree-org/iree/pull/21046, https://github.com/iree-org/iree/pull/21191, https://github.com/iree-org/iree/pull/21389, https://github.com/iree-org/iree/pull/21094)
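
The `_base` parsing variants listed above are described as aligning with standard C APIs. As a rough illustration of what an explicit base parameter means in standard C, here is a minimal sketch built on `strtol`. The helper name `parse_int32_base` is hypothetical and is not an IREE function; the actual IREE signatures are in the linked pull request.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

// Hypothetical helper for illustration only (not an IREE API).
// Parses the whole string |text| as a signed 32-bit integer in |base| using
// the standard C strtol(). A base of 0 lets strtol() auto-detect a "0x"
// (hexadecimal) or leading "0" (octal) prefix; otherwise decimal is assumed.
// Returns 1 and stores the result in |out_value| on success, 0 on failure.
static int parse_int32_base(const char* text, int base, int32_t* out_value) {
  char* end = NULL;
  errno = 0;
  long value = strtol(text, &end, base);
  // Reject empty input and input with trailing non-digit characters.
  if (end == text || *end != '\0') return 0;
  // Reject values that overflow long or do not fit in 32 bits.
  if (errno == ERANGE || value < INT32_MIN || value > INT32_MAX) return 0;
  *out_value = (int32_t)value;
  return 1;
}
```

With this sketch, `parse_int32_base("ff", 16, &v)` succeeds and stores 255, `parse_int32_base("0x1F", 0, &v)` stores 31 because base 0 detects the prefix, and `parse_int32_base("12abc", 10, &v)` fails because of the trailing characters.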
Change Log
Git History
## What's Changed
* [Codegen][GPU] Creating FissionTransferOpsInControlFlow to assist convolution prefetching by @jerryyin in https://github.com/iree-org/iree/pull/21018
* [LLVMGPU] Add lowering strategy selection for map_scatter by @Max191 in https://github.com/iree-org/iree/pull/21034
* [LinalgExt] Add argmax op with rountrip and invalid mlir test by @bangtianliu in https://github.com/iree-org/iree/pull/21021
* Expose creation of FileHandles from FDs to python. by @AWoloszyn in https://github.com/iree-org/iree/pull/21016
* [Codegen] Generalize MultiMmaInterfaceAttr to InnerTileDescAttrInterface by @krzysz00 in https://github.com/iree-org/iree/pull/21000
* [Codegen] Fix FoldCollapseShapeIntoInterfaceTensorStoreFullSlice by @IanWood1 in https://github.com/iree-org/iree/pull/21036
* Adding `iree_hal_amdgpu_executable_t` implementation + no-op cache. by @benvanik in https://github.com/iree-org/iree/pull/21040
* [GPU] Handle transient private values in control-flow when prefetching by @nirvedhmeshram in https://github.com/iree-org/iree/pull/21037
* Ensuring unique names for outlined `hal.dispatch.extern` ops. by @benvanik in https://github.com/iree-org/iree/pull/21055
* [NFC] simplify check in scf.if stage selection when prefetching by @nirvedhmeshram in https://github.com/iree-org/iree/pull/21054
* Only consider executables with no external variants for linking. by @benvanik in https://github.com/iree-org/iree/pull/21056
* Translate flat operand index into segment relative index. by @benvanik in https://github.com/iree-org/iree/pull/21060
* [Dispatch] Only bubble reshapes when possibly blocking fusion by @IanWood1 in https://github.com/iree-org/iree/pull/20108
* [mlir][GPU] Make small reductions go down tile and fuse pipeline. by @MaheshRavishankar in https://github.com/iree-org/iree/pull/21063
* Bump llvm to llvm/llvm-project@80ea5f46df3e by @pashu123 in https://github.com/iree-org/iree/pull/21065
* Triggering auto torch input conversion based on func arg/result types. by @benvanik in https://github.com/iree-org/iree/pull/21067
* [Codegen] Add reshape map_scatter folding to BlockDynamicDimensions by @Max191 in https://github.com/iree-org/iree/pull/21047
* Add default option to only do loop fission for unit trip loops by @nirvedhmeshram in https://github.com/iree-org/iree/pull/21069
* [GPU] Add rematerialize parallel ops in the vector distribute pipeline by @pashu123 in https://github.com/iree-org/iree/pull/21073
* Bump version to 3.6.0 after 3.5.0 release. by @ScottTodd in https://github.com/iree-org/iree/pull/21078
* Guard HIP macro against redefinition by @erieaton-amd in https://github.com/iree-org/iree/pull/21061
* Bump llvm to llvm/llvm-project@0a6463039da89914c7a0f99622fb7a0 by @pashu123 in https://github.com/iree-org/iree/pull/21072
* Bump llvm to llvm/llvm-project@bc7ea63e9c885fbe71dec29581a206 by @pashu123 in https://github.com/iree-org/iree/pull/21083
* [build] Fix Bazel dependency in EncodingUtils for shared library builds by @AGindinson in https://github.com/iree-org/iree/pull/21085
* [LLVMGPU] Enable hip e2e tests for map_scatter by @Max191 in https://github.com/iree-org/iree/pull/21079
* Adding dummy AMDGPU channel/event. by @benvanik in https://github.com/iree-org/iree/pull/21041
* Adding device-side AMDGPU signal/queue utils. by @benvanik in https://github.com/iree-org/iree/pull/21042
* Adding iree_hal_amdgpu_system_t to manage HSA/topology/pools. by @benvanik in https://github.com/iree-org/iree/pull/21043
* Adding _base variants of all string view int parsing and clean up hip options. by @benvanik in https://github.com/iree-org/iree/pull/21086
* Adding support for group_any in iree_thread_affinity_t. by @benvanik in https://github.com/iree-org/iree/pull/21089
* Adding skeleton AMDGPU buffer handle and handle pool. by @benvanik in https://github.com/iree-org/iree/pull/21044
* Adding skeleton AMDGPU allocator. by @benvanik in https://github.com/iree-org/iree/pull/21093
* [LLVMGPU] Delete LLVMGPUPadAndVectorDistribute by @Groverkss in https://github.com/iree-org/iree/pull/21095
* [VectorDistribution] Add support for distributing vector.constant_mask by @Groverkss in https://github.com/iree-org/iree/pull/20708
* [Codegen] Generalize iree_gpu.multi_mma to iree_codegen.inner_tiled by @krzysz00 in https://github.com/iree-org/iree/pull/21062
* Don't erase the target executable in the loop using it. by @benvanik in https://github.com/iree-org/iree/pull/21097
* Implement PartitionableLoopsInterface for tensor.concat by @IanWood1 in https://github.com/iree-org/iree/pull/21082
* [LLVMGPU] Add relayout combination behind a flag by @Max191 in https://github.com/iree-org/iree/pull/21076
* [Encoding][LLVMGPU] Add encoding fusion e2e test by @Max191 in https://github.com/iree-org/iree/pull/21088
* Enable the linalg.mmt4d operation and add mmt4d microkernels for the riscv64 by @adeel10x in https://github.com/iree-org/iree/pull/20263
* [HAL] Refactor memory property attributes by @ziereis in https://github.com/iree-org/iree/pull/21005
* [runtime] Add riscv `pause` instruction for spinning by @NoumanAmir657 in https://github.com/iree-org/iree/pull/21075
* Reland "[Codegen][ROCDL] Drop nominal support for dynamic shared mem (#21020)" by @MaheshRavishankar in https://github.com/iree-org/iree/pull/21102
* Switch experimental to false on windows release packages by @zeeshanhaque21 in https://github.com/iree-org/iree/pull/21104
* [Codegen] Port AMDGPU device lib implementations to MLIR rewrites by @keshavvinayak01 in https://github.com/iree-org/iree/pull/20598
* Bump dawidd6/action-download-artifact from 10 to 11 in the github-actions group by @dependabot[bot] in https://github.com/iree-org/iree/pull/21109
* Adding device-side tracing macros and a device-side trace buffer. by @benvanik in https://github.com/iree-org/iree/pull/21046
* Adding blit kernels and device-side enqueuing. by @benvanik in https://github.com/iree-org/iree/pull/21057
* Adding skeleton device->host service worker. by @benvanik in https://github.com/iree-org/iree/pull/21094
* Add padding, masking and fold vector.transfer_write -> vector.transfer_read to avoid memory roundtrips by @nicolasvasilache in https://github.com/iree-org/iree/pull/21074
* [Codegen][GPU] Move operand promotion control to attribute interface by @qedawkins in https://github.com/iree-org/iree/pull/21098
* [HIP] Emit error for non-zero dynamic shared memory by @qedawkins in https://github.com/iree-org/iree/pull/21118
* [Codegen][GPU] Add promotion attribute for setting cache swizzling by @qedawkins in https://github.com/iree-org/iree/pull/21105
* Force install python version 3.13.5 for windows by @jitesh-gupta in https://github.com/iree-org/iree/pull/21120
* Add tiling interface to `tensor.concat` by @IanWood1 in https://github.com/iree-org/iree/pull/21081
* [Codegen][GPU] Sort intrinsic according to k alignment - Step 1 of 2- Track MmaInterfaceAttr via field instead of index by @jerryyin in https://github.com/iree-org/iree/pull/21103
* [NFC] Extract common reshape patterns to dedicated file by @jtuyls in https://github.com/iree-org/iree/pull/21111
* Extract reshape into interface folding tests into dedicated file by @jtuyls in https://github.com/iree-org/iree/pull/21112
* [Flow] Fix crash when flow.return has no operands by @IanWood1 in https://github.com/iree-org/iree/pull/21132
* [GlobalOpt] Don't modify concat in dispatch by @IanWood1 in https://github.com/iree-org/iree/pull/21129
* Change linux arm64 runners to newly available github hosted runners by @jitesh-gupta in https://github.com/iree-org/iree/pull/21131
* [compiler] remove uses of memref::ExpandOps pass by @ftynse in https://github.com/iree-org/iree/pull/21113
* [Dispatch Creation] Improve extract_slice expand_shape bubbling by @IanWood1 in https://github.com/iree-org/iree/pull/21121
* [LinalgExt] fix arg_compare op with region and start index by @bangtianliu in https://github.com/iree-org/iree/pull/21106
* Integrate LLVM at 029f8892 by @bjacob in https://github.com/iree-org/iree/pull/21140
* [Flow] Improve reduction dispatch names by @IanWood1 in https://github.com/iree-org/iree/pull/21139
* [doc] Add tips of reading input from a file by @jinchen62 in https://github.com/iree-org/iree/pull/21143
* [doc] Fix tip render to mkdocs style by @jinchen62 in https://github.com/iree-org/iree/pull/21145
* Integrate LLVM at 836201f by @bjacob in https://github.com/iree-org/iree/pull/21148
* [Codegen][GPU] Sort intrinsic according to k alignment - Step 2 of 2 - Creating intrinsic sort routine by @jerryyin in https://github.com/iree-org/iree/pull/21128
* Integrate LLVM at 227f759644 by @bjacob in https://github.com/iree-org/iree/pull/21156
* Adding AMDGPU command buffer implementation. by @benvanik in https://github.com/iree-org/iree/pull/21123
* [VectorDistribute] Implement layout analysis for transfer_gather by @Groverkss in https://github.com/iree-org/iree/pull/21164
* Don't use dl_tensor.byte_offset when exporting capsules. by @AWoloszyn in https://github.com/iree-org/iree/pull/21153
* [LinalgExt] Add simple vectorization for map_scatter by @Max191 in https://github.com/iree-org/iree/pull/21090
* [Codegen] Simplify tensor load/store padding materialization by @jtuyls in https://github.com/iree-org/iree/pull/21160
* [Util] Add folder for assumes of X / C * C by @qedawkins in https://github.com/iree-org/iree/pull/21168
* Integrate LLVM at c5b256a0e480 by @lialan in https://github.com/iree-org/iree/pull/21162
* CMake: catch some recurring problems with LLVM configuration. by @bjacob in https://github.com/iree-org/iree/pull/21174
* [Encoding] Rename testing purpose encodings to follow the convention. by @hanhanW in https://github.com/iree-org/iree/pull/21144
* [VectorDistribution] Add pattern to distribute transfer_gather ops by @Groverkss in https://github.com/iree-org/iree/pull/20764
* [Codegen] Support early bufferization ops in ConvertToDPS by @Max191 in https://github.com/iree-org/iree/pull/21136
* [Codegen][GPU] Prevent vector transfer fission from applying on loops with side-effecting ops by @rkayaith in https://github.com/iree-org/iree/pull/21166
* [Encoding] Refresh practical encodings to follow the naming convention. by @hanhanW in https://github.com/iree-org/iree/pull/21146
* [ROCMTarget] Add pass for applying builtin specialization patterns by @qedawkins in https://github.com/iree-org/iree/pull/21001
* [Encoding][NFC] Improve the docs for Encoding dialect. by @hanhanW in https://github.com/iree-org/iree/pull/21147
* Expand all affine applies before and during Flow by @qedawkins in https://github.com/iree-org/iree/pull/21169
* [LinalgExt] support converting argcompare to loops. by @bangtianliu in https://github.com/iree-org/iree/pull/21138
* [CPU] Add option to `LLVMCPUTileRootAndFuseProducerConsumer` to tiling with `scf.forall` by @AaronStGeorge in https://github.com/iree-org/iree/pull/21009
* [Codegen][GPU] Add a inner tiled op descriptor for scaled MMA by @krzysz00 in https://github.com/iree-org/iree/pull/21141
* [Codegen][GPU] Generalize ConcretizeMmaShapes to arbitrary inner tiles by @krzysz00 in https://github.com/iree-org/iree/pull/21142
* Re-enable e2e pack.mlir tests for RISC-V targets. by @hanhanW in https://github.com/iree-org/iree/pull/21179
* [Codegen][Tuner] expose python binding for attention op details by @bangtianliu in https://github.com/iree-org/iree/pull/21170
* [LLVMGPU] Re-run alloc hoisting after SCFToControlFlow by @rkayaith in https://github.com/iree-org/iree/pull/21193
* Expand on commit access policies. by @ScottTodd in https://github.com/iree-org/iree/pull/21205
* [DispatchCreation] Don't pad on attention in producer dispatch by @jtuyls in https://github.com/iree-org/iree/pull/21134
* [CPU] Use option to tile with `scf.forall` in TileRootAndFuseProducerConsumer pass by @AaronStGeorge in https://github.com/iree-org/iree/pull/21198
* Pinning ninja on the Windows CI to 1.12.1 due to a 1.13.0 bug. by @benvanik in https://github.com/iree-org/iree/pull/21208
* [LinalgExt] add TilingInterface support for ArgCompareOp by @bangtianliu in https://github.com/iree-org/iree/pull/21077
* Fixing typo in [#21142] that was causing failures on MSVC. by @benvanik in https://github.com/iree-org/iree/pull/21211
* Fixing bool->iree_status_t cast error. by @benvanik in https://github.com/iree-org/iree/pull/21212
* Bump LLVM to [4ac472] by @nicolasvasilache in https://github.com/iree-org/iree/pull/21175
* [Codegen] Fix undefined behavior in InnerTileOp expansion by @jtuyls in https://github.com/iree-org/iree/pull/21218
* [Codegen][GPU] Move LLVMGPUPrefetching pass to be invoked from only amdgpu backend by @jerryyin in https://github.com/iree-org/iree/pull/21190
* [NFC] remove redundant checks in the TilingInterface by @bangtianliu in https://github.com/iree-org/iree/pull/21225
* Integrate LLVM @ c73e5e3e209c by @lialan in https://github.com/iree-org/iree/pull/21224
* [LinalgExt] add e2e tests for argcompare op by @bangtianliu in https://github.com/iree-org/iree/pull/21217
* Revert "Force install python version 3.13.5 for windows" by @saienduri in https://github.com/iree-org/iree/pull/21215
* [Codegen] Fix multiple function support in materialize user configs by @qedawkins in https://github.com/iree-org/iree/pull/21227
* Fix missing return in `DeviceOptimalAttr::joinOR` by @rkayaith in https://github.com/iree-org/iree/pull/21228
* [LinalgExt] Implement Unit Dim folding for slice dimensions by @Groverkss in https://github.com/iree-org/iree/pull/21220
* [Codegen][GPU] Adding scheduling barrier between compute and write stage in prefetcher by @jerryyin in https://github.com/iree-org/iree/pull/21151
* Adding IREE_ENABLE_RUNTIME_COVERAGE cmake mode. by @benvanik in https://github.com/iree-org/iree/pull/21191
* [Codegen][GPU] Support fission of loops with multiple transfer_reads/writes by @rkayaith in https://github.com/iree-org/iree/pull/21213
* Extending AMDGPU tests, fixing issues, and cleaning up comments. by @benvanik in https://github.com/iree-org/iree/pull/21200
* [Integrate] Drop revert for vectorization API change by @Max191 in https://github.com/iree-org/iree/pull/21239
* [Codegen][Tuner] expose python binding isa_attention_op by @bangtianliu in https://github.com/iree-org/iree/pull/21216
* [Codegen] Fix TileLargeTensors handling of dynamic reduction dims by @qedawkins in https://github.com/iree-org/iree/pull/21244
* [DispatchCreation] Add pass to hoist scalar ops out of dispatch regions by @qedawkins in https://github.com/iree-org/iree/pull/21210
* Adding AMDGPU semaphore (WIP) and semaphore pool. by @benvanik in https://github.com/iree-org/iree/pull/21201
* Integrate LLVM to llvm/llvm-project@a99fee69 by @yzhang93 in https://github.com/iree-org/iree/pull/21242
* [Dispatch Creation] Add concat expand_shape bubbling by @dan-garvey in https://github.com/iree-org/iree/pull/21158
* [ROCm] Fix typo in R9700 SKU definition. NFC. by @kuhar in https://github.com/iree-org/iree/pull/21247
* [ROCMTarget] Make all pingpong arithmetic nsw and nuw by @qedawkins in https://github.com/iree-org/iree/pull/21248
* [Integrate] Drop the revert of unknown type conversion in bufferization. by @hanhanW in https://github.com/iree-org/iree/pull/21243
* [Codegen] Add pass to propagate constant offsets towards accesses by @qedawkins in https://github.com/iree-org/iree/pull/21236
* [CodeGen] Re-enable memref::AssumeAlignmentOp for SPIRV pipelines. by @hanhanW in https://github.com/iree-org/iree/pull/21133
* [Integrate] Update bufferization related codes for upstream custom types support. by @hanhanW in https://github.com/iree-org/iree/pull/21250
* [Codegen] Change swizzle hint offset logic to use arith by @qedawkins in https://github.com/iree-org/iree/pull/21237
* [NFC] Make internal LLVMGPU APIs for vector_distribute available by @nicolasvasilache in https://github.com/iree-org/iree/pull/21161
* [VectorExt] Implement BufferizationInterface for transfer_gather by @Groverkss in https://github.com/iree-org/iree/pull/21219
* [VectorExt] Implement masked vectorization for iree_linalg_ext.gather by @Groverkss in https://github.com/iree-org/iree/pull/21189
* [CodeGen] Fix gather fusion on vector distribute path by @pashu123 in https://github.com/iree-org/iree/pull/21117
* [StableHLO] Fix ArrayRef(std::nullopt) deprecation warnings by @qedawkins in https://github.com/iree-org/iree/pull/21257
* [mlir][DispatchCreation] Avoid SSA violation due to consumer fusion while forming dispatches by @MaheshRavishankar in https://github.com/iree-org/iree/pull/21186
* [CPU] Use scf.forall for TileRootAndFuseProducerConsumer by default. by @hanhanW in https://github.com/iree-org/iree/pull/21260
* Bump ncipollo/release-action from 1.16.0 to 1.18.0 in the github-actions group by @dependabot[bot] in https://github.com/iree-org/iree/pull/21254
* [Encoding] Add new identity encoding attribute by @jtuyls in https://github.com/iree-org/iree/pull/21258
* [mlir][Codegen] Remove workaround for handling consumer fusion along multiple operands. by @MaheshRavishankar in https://github.com/iree-org/iree/pull/21171
* [DT] fixup(MaterializeEncodingPatterns) remove legacy type conversions by @egebeysel in https://github.com/iree-org/iree/pull/21262
* Integrate LLVM to llvm/llvm-project@5ed852f7 by @yzhang93 in https://github.com/iree-org/iree/pull/21263
* Add link to the LLVM Social Bangalore talk by @pashu123 in https://github.com/iree-org/iree/pull/21265
* [Codegen] Fix specialize exports never applies check by @qedawkins in https://github.com/iree-org/iree/pull/21270
* [integrate|compiler] Drop carried LLVM reverts and use `ub.poison` in some transfer reads by @fabianmcg in https://github.com/iree-org/iree/pull/21259
* [ROCM] Ping pong matmul Bf16 matcher by @sebvince in https://github.com/iree-org/iree/pull/21267
* Integrate LLVM to llvm/llvm-project@e3edc1bd by @yzhang93 in https://github.com/iree-org/iree/pull/21272
* [Util] Fix assume.int operand deduplication canonicalizer by @qedawkins in https://github.com/iree-org/iree/pull/21273
* [Codegen] Fix lhs/rhs batch offsets size in vector contract distribution by @jtuyls in https://github.com/iree-org/iree/pull/21238
* Pad OnlineAttention by @nicolasvasilache in https://github.com/iree-org/iree/pull/21152
* [Codegen] Generalize ukernel strided_outer_dims and fix GPU ukernel bug by @Max191 in https://github.com/iree-org/iree/pull/21249
* [Codegen] Support inner_tiled and load_from_buffer in ConvertAccGEMMToGEMMPass by @Max191 in https://github.com/iree-org/iree/pull/21245
* [LinalgExt] Improve scatter unit dim folding by @IanWood1 in https://github.com/iree-org/iree/pull/21271
* [Codegen] Support dynamic dimensions in collapse_shape into interface store folding by @jtuyls in https://github.com/iree-org/iree/pull/21126
* [Codegen] Don't fold workgroup loops during workgroup tiling by @Max191 in https://github.com/iree-org/iree/pull/21137
* [DispatchCreation] Fold collapse(expand) unit dims by @IanWood1 in https://github.com/iree-org/iree/pull/21274
* [VectorDistribute][NFC] Refactor subgroup reduction distribution by @Groverkss in https://github.com/iree-org/iree/pull/21305
* [DT] add control function to FoldIntoPackUnpackPatterns by @egebeysel in https://github.com/iree-org/iree/pull/21276
* [CPU] Disable lowering_config propagation for Mmt4dTilingExpert pipeline by @hanhanW in https://github.com/iree-org/iree/pull/21298
* [Preprocessing][NFC] Drop dependency workaround from TransposeMatmulPass. by @hanhanW in https://github.com/iree-org/iree/pull/21296
* [CodeGen] Add a pass that patches func ops for debugging purpose. by @hanhanW in https://github.com/iree-org/iree/pull/21229
* [Integrate] Integrate llvm-project @0f391d6f51217de5cb6735b17f359eb078bbe94e by @Max191 in https://github.com/iree-org/iree/pull/21302
* [DataTiling] Enable layout transformation combination in GPU DT e2e tests by @Max191 in https://github.com/iree-org/iree/pull/21163
* [DT] Improve encoding hoisting pass. by @hanhanW in https://github.com/iree-org/iree/pull/21275
* [LinalgExt] Add an `DeviceMappingInterfaceAttribute` to tag split-k loops by @MaheshRavishankar in https://github.com/iree-org/iree/pull/21309
* [CodeGen][NFC] Delete DeadMemAlloc patterns. by @hanhanW in https://github.com/iree-org/iree/pull/21310
* [Codegen][AMDGPU] Allow vector distribute configuration selection to handle `scf.forall` from split-reduction. by @MaheshRavishankar in https://github.com/iree-org/iree/pull/21281
* [Codegen][Common] Teach `VerifyWorkgroupDistribution` to allow `scf.forall` generated by split reduction. by @MaheshRavishankar in https://github.com/iree-org/iree/pull/21282
* Revert "[Codegen] Don't fold workgroup loops during workgroup tiling (#21137)" by @Max191 in https://github.com/iree-org/iree/pull/21318
* [Codegen] Bubble up/down reshape operations before blocking dynamic dimensions by @jtuyls in https://github.com/iree-org/iree/pull/21241
* [docs] Add 2025 AsiaLLVM talk about data-tiling from Hanhan. by @hanhanW in https://github.com/iree-org/iree/pull/21308
* [CPU] Introduce dictionary-based lowering_config attribute. by @hanhanW in https://github.com/iree-org/iree/pull/21312
* [TensorExt] Add new operation that is a placeholder for modifying number of workgroups for split reduction. by @MaheshRavishankar in https://github.com/iree-org/iree/pull/21314
* [docs] Correct default values in optimization options. by @hanhanW in https://github.com/iree-org/iree/pull/21321
* [CodeGen][NFC] Retire native_vector_sizes from LoweringConfigAttr. by @hanhanW in https://github.com/iree-org/iree/pull/21322
* [Codegen] Add StoreToBufferOp to vector distribution dispatch check by @jtuyls in https://github.com/iree-org/iree/pull/21294
* [CodeGen][NFC] Simplify logging with LDBG for Utils.cpp. by @hanhanW in https://github.com/iree-org/iree/pull/21328
* [LinalgExt] Improve Attention partial tiling new batch dimension insertion by @Groverkss in https://github.com/iree-org/iree/pull/21316
* [CPU] Implement OpAsmDialectInterface for IREE::CPU::LoweringConfigAttr. by @hanhanW in https://github.com/iree-org/iree/pull/21325
* [CodeGen] Make TilingConfig compatible with LoweringConfigAttrInterface. by @hanhanW in https://github.com/iree-org/iree/pull/21323
* [CPU] Teach TilingConfig about IREE::CPU::LoweringConfigAttr. by @hanhanW in https://github.com/iree-org/iree/pull/21327
* [GPU] Remove ROCm llvm plugin by @efric in https://github.com/iree-org/iree/pull/21311
* [CPU] Switch mmt4d pipeline to use IREE::CPU::LoweringConfigAttr. by @hanhanW in https://github.com/iree-org/iree/pull/21326
* [Dispatch Creation] Make producer fusable via interchange by @IanWood1 in https://github.com/iree-org/iree/pull/20977
* [Codegen] Add AMDGPU specific narrow type emulation pass for AMDGPUDialect by @qedawkins in https://github.com/iree-org/iree/pull/21333
* [TensorExt] Mirror a tensor_ext version of flow.tensor.bitcast by @qedawkins in https://github.com/iree-org/iree/pull/21277
* Move all producing uses of flow.bitcast to tensor_ext by @qedawkins in https://github.com/iree-org/iree/pull/21279
* Cleanup uses of --verify-diagnostics in tests by @qedawkins in https://github.com/iree-org/iree/pull/21340
* [Integrate] Bump LLVM to [77914c] by @Groverkss in https://github.com/iree-org/iree/pull/21341
* [VectorDistribute] Fix buffer reduction during subgroup reduction by @Groverkss in https://github.com/iree-org/iree/pull/21315
* [CPU] Use TilingConfig for lowering_config propagation. by @hanhanW in https://github.com/iree-org/iree/pull/21336
* [CPU] Implement lowering_config propagation for IREE::CPU::LoweringConfigAttr. by @hanhanW in https://github.com/iree-org/iree/pull/21337
* [Codegen][Tuner] remove decomposition attr for attention op by @bangtianliu in https://github.com/iree-org/iree/pull/21345
* [CPU] Switch all LinalgExt dispatches to root-based tiling pipeline. by @hanhanW in https://github.com/iree-org/iree/pull/21338
* [GlobalOpt] Delete experimental FuseSiluHorizontalMatmul pass. by @hanhanW in https://github.com/iree-org/iree/pull/21350
* [Codegen] Resolve scf.forall operations created during split-reduction by @MaheshRavishankar in https://github.com/iree-org/iree/pull/21324
* [Codegen][GPU] Fuse nested warp and lane foralls by @Max191 in https://github.com/iree-org/iree/pull/21295
* [LinalgExt] Add decomposition for vector map_scatter by @Max191 in https://github.com/iree-org/iree/pull/21116
* [DispatchCreation] Add pass to cast away unsupported element types by @qedawkins in https://github.com/iree-org/iree/pull/21339
* [CPU] Improve TileRootAndFuseProducerConsumer like TileAndFuse pass. by @hanhanW in https://github.com/iree-org/iree/pull/21351
* [CPU] Refactor logic to LoweringConfigGenerator. by @hanhanW in https://github.com/iree-org/iree/pull/21352
* [LLVMGPU] Add canonicalization for select(pred, true, false) -> broadcast(pred) by @Groverkss in https://github.com/iree-org/iree/pull/21342
* [VectorExt] Use ub.poison for padding in vector_ext vectorization by @Groverkss in https://github.com/iree-org/iree/pull/21362
* [Codegen] Fix 1x1 Conv2D to Matmul pass ordering by @HalfBloodPrince010 in https://github.com/iree-org/iree/pull/21355
* Fix unsupported bitcasting of complex operands by @qedawkins in https://github.com/iree-org/iree/pull/21367
* [CPU][NFCI] Update tile sizes selection workaround if it has dynamic shape. by @hanhanW in https://github.com/iree-org/iree/pull/21353
* [Codegen] Combine layout transformation after GPUFuseAndHoistParallelLoops by @YashDeshpande25 in https://github.com/iree-org/iree/pull/21206
* [Stream] New ElideAsyncTransfersPass by @ziereis in https://github.com/iree-org/iree/pull/21029
* Fix failure in Windows build. by @MaheshRavishankar in https://github.com/iree-org/iree/pull/21369
* [DispatchCreation] Add a pass to split long running reduction loops. by @MaheshRavishankar in https://github.com/iree-org/iree/pull/21280
* Integrate LLVM to llvm/llvm-project@3ed3a33 by @bangtianliu in https://github.com/iree-org/iree/pull/21364
* [NFC] Fix smallvector size in kernel configattrs by @efric in https://github.com/iree-org/iree/pull/21348
* Integrate LLVM to llvm/llvm-project@bda5602 by @bangtianliu in https://github.com/iree-org/iree/pull/21377
* [CPU] Switch convolution pipelines to IREE::CPU::LoweringConfigAttr. by @hanhanW in https://github.com/iree-org/iree/pull/21347
* [CPU] Use IREE::CPU::TilingLevel in TileRootAndFuseProducerConsumer pass by @hanhanW in https://github.com/iree-org/iree/pull/21370
* [Stream] Fix dominance error for multi-result dispatches by @IanWood1 in https://github.com/iree-org/iree/pull/21368
* [codegen][gpu] Add the `iree-rocdl-use-buffer-instructions` pass by @fabianmcg in https://github.com/iree-org/iree/pull/21335
* [DispatchCreation] Avoid hoisting set encodings on scalar tensors by @jtuyls in https://github.com/iree-org/iree/pull/21376
* [Codegen] Distribute workgroups along X dim by @Max191 in https://github.com/iree-org/iree/pull/21334
* Add e2e test for split reduction using tiling. by @MaheshRavishankar in https://github.com/iree-org/iree/pull/21374
* [Dispatch Creation] Remove assert and handle null map by @IanWood1 in https://github.com/iree-org/iree/pull/21380
* [e2e] Adding default tuning specs tests. by @lialan in https://github.com/iree-org/iree/pull/21383
* Use `ShapedType::isStatic`. NFC. by @kuhar in https://github.com/iree-org/iree/pull/21385
* Reapply "[Codegen] Don't fold workgroup loops during workgroup tiling (#21137)" by @Max191 in https://github.com/iree-org/iree/pull/21382
* Implementing AMDGPU logical/physical devices and skeleton queues. by @benvanik in https://github.com/iree-org/iree/pull/21251
* Integrate LLVM to llvm/llvm-project@d9190f8 by @bangtianliu in https://github.com/iree-org/iree/pull/21388
* Adding `iree-hal-drivers-amdgpu-tests` target.
by @benvanik in https://github.com/iree-org/iree/pull/21389 * [CPU] Switch pack/unpack disptaches to use IREE::CPU::LoweringConfigAttr. by @hanhanW in https://github.com/iree-org/iree/pull/21392 * [CPU] Teach SplitReduction about IREE::CPU::LoweringConfigAttr. by @hanhanW in https://github.com/iree-org/iree/pull/21391 * Add linalg.softmax e2e tests. by @hanhanW in https://github.com/iree-org/iree/pull/21396 * [Codegen][GPU] Canonicalize to remove the empty extract slice in combineLayoutTransformation pass by @jerryyin in https://github.com/iree-org/iree/pull/21395 * [Attention] Use multiple subgroups for memory bound attention by @Groverkss in https://github.com/iree-org/iree/pull/21363 * [CPU] Teach TilingConfig::getVectorTileSizes about CPU lowering config. by @hanhanW in https://github.com/iree-org/iree/pull/21397 * [CPU] Get rootOp based on lowering config in TileRootAndFuseProducerConsumer pass. by @hanhanW in https://github.com/iree-org/iree/pull/21394 * Integrate LLVM to llvm/llvm-project@e0cce5c by @bangtianliu in https://github.com/iree-org/iree/pull/21398 * [WIP] Expose multi-use fusion flag to pipeline options. by @IanWood1 in https://github.com/iree-org/iree/pull/21400 * [CPU][NFC] Switch infusible pack tests to use imperfect tiling case. by @hanhanW in https://github.com/iree-org/iree/pull/21404
New Contributors
- @adeel10x made their first contribution in https://github.com/iree-org/iree/pull/20263
- @NoumanAmir657 made their first contribution in https://github.com/iree-org/iree/pull/21075
- @zeeshanhaque21 made their first contribution in https://github.com/iree-org/iree/pull/21104
- @keshavvinayak01 made their first contribution in https://github.com/iree-org/iree/pull/20598
- @jitesh-gupta made their first contribution in https://github.com/iree-org/iree/pull/21120
- @sebvince made their first contribution in https://github.com/iree-org/iree/pull/21267
- @efric made their first contribution in https://github.com/iree-org/iree/pull/21311
- @HalfBloodPrince010 made their first contribution in https://github.com/iree-org/iree/pull/21355
Full Changelog: https://github.com/iree-org/iree/compare/v3.5.0...v3.6.0