Name | Modified | Size | Downloads / Week |
---|---|---|---|
Halide-21.0.0-x86-64-windows-b629c80de18f1534ec71fddd8b567aa7027a0876.zip | 2025-09-17 | 70.9 MB | |
Halide-21.0.0-x86-64-osx-b629c80de18f1534ec71fddd8b567aa7027a0876.tar.gz | 2025-09-17 | 166.6 MB | |
Halide-21.0.0-x86-64-linux-b629c80de18f1534ec71fddd8b567aa7027a0876.tar.gz | 2025-09-17 | 183.4 MB | |
Halide-21.0.0-x86-32-windows-b629c80de18f1534ec71fddd8b567aa7027a0876.zip | 2025-09-17 | 65.3 MB | |
Halide-21.0.0-x86-32-linux-b629c80de18f1534ec71fddd8b567aa7027a0876.tar.gz | 2025-09-17 | 194.5 MB | |
Halide-21.0.0-arm-64-osx-b629c80de18f1534ec71fddd8b567aa7027a0876.tar.gz | 2025-09-17 | 168.6 MB | |
Halide-21.0.0-arm-64-linux-b629c80de18f1534ec71fddd8b567aa7027a0876.tar.gz | 2025-09-17 | 183.6 MB | |
Halide-21.0.0-arm-32-linux-b629c80de18f1534ec71fddd8b567aa7027a0876.tar.gz | 2025-09-17 | 182.2 MB | |
README.md | 2025-09-16 | 16.7 kB | |
v21.0.0 source code.tar.gz | 2025-09-16 | 33.3 MB | |
v21.0.0 source code.zip | 2025-09-16 | 34.3 MB | |
Totals: 11 Items | | 1.3 GB | 2 |
Release highlights
We have deliberately skipped version 20.0.0 to align with the LLVM version we are now using. Note that LLVM 21.1.1 or higher is required as LLVM 21.1.0 has a major bug in the NVPTX backend.
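The version constraint can be checked mechanically before a build. The following is a minimal sketch in plain Python, not part of Halide's build system. It assumes the LLVM version is available as a dotted string (for example, the output of `llvm-config --version`), and it reads the requirement together with the deprecation note in this release: LLVM 19 and below are rejected, and 21.1.0 is rejected because of the NVPTX bug. The names `parse_llvm_version`, `llvm_version_ok`, `MIN_SUPPORTED_MAJOR`, and `KNOWN_BAD` are hypothetical.

```python
# Assumption: LLVM 19 and below are unsupported, so 20 is the lowest accepted major.
MIN_SUPPORTED_MAJOR = 20
# LLVM 21.1.0 is excluded because of the NVPTX backend bug noted above.
KNOWN_BAD = {(21, 1, 0)}


def parse_llvm_version(text):
    """Parse a dotted version such as '21.1.1' or '21.1.1git' into a 3-tuple of ints."""
    numbers = []
    for piece in text.strip().split(".")[:3]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break  # stop at suffixes such as 'git' or '-rc1'
            digits += ch
        numbers.append(int(digits) if digits else 0)
    while len(numbers) < 3:
        numbers.append(0)  # pad short versions like '21.1'
    return tuple(numbers)


def llvm_version_ok(text):
    """Return True if the version is new enough and not a known-bad release."""
    version = parse_llvm_version(text)
    return version[0] >= MIN_SUPPORTED_MAJOR and version not in KNOWN_BAD


if __name__ == "__main__":
    for candidate in ["19.1.7", "20.1.8", "21.1.0", "21.1.1"]:
        print(candidate, llvm_version_ok(candidate))
```

Comparing tuples of integers rather than raw strings avoids ordering mistakes such as `"21.10.0" < "21.9.0"`.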
Major changes
- The `rfactor` scheduling directive was rewritten and enhanced. It is now compatible with autoschedulers.
- The Mullapudi2016 autoscheduler now supports experimental GPU scheduling.
- The Python bindings have been substantially improved, with many missing bindings filled in.
- `HL_DEBUG_CODEGEN` gained a new filtering mode. Debug levels can now be set on a per-file/per-function basis.
- Support was added for AMD Zen5 and the iOS Simulator.
- The `strict_float` feature has been reimplemented and should be much more reliable.
- Lots of bugfixes, performance improvements, and build system improvements. We spent a lot of time fixing issues with our testing infrastructure and are looking forward to implementing a more stable contribution experience going forward.
Deprecations
- LLVM 19 and below are no longer supported, in keeping with our support policy.
- `Halide_BUNDLE_STATIC` will be removed in the next release. If you are using it, please migrate to the shared library instead.
- Support for Python 3.8 has been dropped.
Changelog
Scheduling
- The `rfactor` scheduling directive was rewritten and enhanced.
- Rewrite the rfactor scheduling directive by @alexreinking in https://github.com/halide/Halide/pull/8490
- Dequalify names when constructing RVars in rfactor by @alexreinking in https://github.com/halide/Halide/pull/8560
- Add promise_clamped in rfactor by @alexreinking in https://github.com/halide/Halide/pull/8608
- Add rfactor patterns for NaN-propagating min/max by @alexreinking in https://github.com/halide/Halide/pull/8587
- The Mullapudi2016 autoscheduler now supports experimental GPU scheduling.
- GPU autoscheduling with Mullapudi2016: the reference implementation by @antonysigma in https://github.com/halide/Halide/pull/7787
- Mullapudi2016-GPU: Reorder to avoid for-loops to be sandwiched between `gpu_blocks`. by @antonysigma in https://github.com/halide/Halide/pull/8647
- Enable experimental Mullapudi2016 GPU scheduler for test-bench by @antonysigma in https://github.com/halide/Halide/pull/8650
- Highlight Metal GPU code in stmt_html by @antonysigma in https://github.com/halide/Halide/pull/8659
- Always ensure gpu_threads count >= warp size of 32 by @antonysigma in https://github.com/halide/Halide/pull/8656
- Fix incorrect natural vector size on Zen4 by @abadams in https://github.com/halide/Halide/pull/8570
- Make it an error to use a device extern stage without target support by @abadams in https://github.com/halide/Halide/pull/8794
- Add support for adding tuple outputs in the configure() method by @abadams in https://github.com/halide/Halide/pull/8649
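Conceptually, `rfactor` splits an associative reduction into an intermediate stage of independent partial results and a final stage that merges them. The sketch below shows that idea in plain Python on a small integer sum. It is not Halide code and does not use the Halide API; the function names `direct_sum` and `factored_sum` are made up for illustration.

```python
def direct_sum(image):
    """Serial reduction over the whole 2-D domain with a single accumulator."""
    total = 0
    for row in image:
        for value in row:
            total += value
    return total


def factored_sum(image):
    """The same reduction, factored into two stages.

    Stage 1 computes one partial sum per row. The rows do not depend on
    each other, so this stage could be parallelized or vectorized.
    Stage 2 merges the partial sums into the final result.
    """
    partial = [sum(row) for row in image]  # intermediate stage
    return sum(partial)                    # merge stage


if __name__ == "__main__":
    image = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
    print(direct_sum(image), factored_sum(image))
```

The two functions agree because integer addition is associative. Floating-point addition is not exactly associative, so a factored floating-point reduction can differ from the serial one in the last bits.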
Python
- Fix argument order in rpow by @alexreinking in https://github.com/halide/Halide/pull/8677
- Drop support for Python 3.8 by @alexreinking in https://github.com/halide/Halide/pull/8678
- Fix segfault in RDom's operator<< by @alexreinking in https://github.com/halide/Halide/pull/8679
- Use ruff to format and lint Python code by @alexreinking in https://github.com/halide/Halide/pull/8684
- Get raw Runtime::Buffer from Buffer in Python rather than use PyBuffer by @alexreinking in https://github.com/halide/Halide/pull/8682
- Bind in-place update operators (e.g. +=) in Python by @alexreinking in https://github.com/halide/Halide/pull/8683
- Clean up Python dependencies; document uv usage by @alexreinking in https://github.com/halide/Halide/pull/8694
- Fix several printing segfaults. by @alexreinking in https://github.com/halide/Halide/pull/8700
- Add Python bindings for serialization by @alexreinking in https://github.com/halide/Halide/pull/8718
- Add all remaining IROperator ops to Python bindings by @alexreinking in https://github.com/halide/Halide/pull/8771
- Fix up memoize; bind to Python by @alexreinking in https://github.com/halide/Halide/pull/8778
- Fix invalid Python type annotation and return types (#8772) by @rtzam in https://github.com/halide/Halide/pull/8773
- Expose `Runtime::Buffer::cropped` to C++ and Python `Buffer` by @rtzam in https://github.com/halide/Halide/pull/8787
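The `rpow` entry in the list above concerns Python's reflected power operator, where the operand order is easy to get backwards. The sketch below illustrates the protocol with a hypothetical `Scalar` wrapper class. It is not Halide's binding code, and the exact change made in the pull request is not reproduced here.

```python
class Scalar:
    """Hypothetical numeric wrapper, used only to illustrate reflected operators."""

    def __init__(self, value):
        self.value = value

    def __pow__(self, exponent):
        # Handles `self ** exponent`: self is the base.
        return Scalar(self.value ** _unwrap(exponent))

    def __rpow__(self, base):
        # Handles `base ** self` when `base` cannot handle Scalar itself.
        # Here self is the EXPONENT, so the operands must be swapped
        # relative to __pow__.
        return Scalar(_unwrap(base) ** self.value)


def _unwrap(x):
    """Return the plain number inside a Scalar, or x itself."""
    return x.value if isinstance(x, Scalar) else x


if __name__ == "__main__":
    print((Scalar(3) ** 2).value)  # 3 squared
    print((2 ** Scalar(3)).value)  # 2 cubed
```

Because power is not commutative, an `__rpow__` that reuses the `__pow__` operand order silently computes the wrong value instead of raising an error.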
Debugging
- New feature flag to allow for stack backtrace/unwind by @mcourteaux in https://github.com/halide/Halide/pull/8703
- Add filtering capabilities to HL_DEBUG_CODEGEN by @alexreinking in https://github.com/halide/Halide/pull/8627
- Adding worker_thread_idle() for more informative profiling by @slomp in https://github.com/halide/Halide/pull/8719
- Color IR output in cout and cerr. by @mcourteaux in https://github.com/halide/Halide/pull/8635
- Improve output format for lowering passes timing. by @mcourteaux in https://github.com/halide/Halide/pull/8749
- fix(stmt-html): Fix embedded Buffer processing performance issue. by @mcourteaux in https://github.com/halide/Halide/pull/8748
- Use AArch64 assembly syntax on macOS with LLVM<22 by @alexreinking in https://github.com/halide/Halide/pull/8710
CodeGen
- Mark our PTX kernels as kernels, to stop them from being stripped by @abadams in https://github.com/halide/Halide/pull/8571
- Math functions renaming table for GPU backends to support vectorized evaluation of math functions. by @mcourteaux in https://github.com/halide/Halide/pull/8595
- Apply version constraints to iOS objects by @alexreinking in https://github.com/halide/Halide/pull/8546
- Redirect bitwise ops to logical ops in case the arguments are bool. by @mcourteaux in https://github.com/halide/Halide/pull/8597
- scalarize select condition for LLVM where possible by @abadams in https://github.com/halide/Halide/pull/8575
- Add missing addition simplifier rules by @abadams in https://github.com/halide/Halide/pull/8630
- Bounds and alignment analysis through bitwise ops by @abadams in https://github.com/halide/Halide/pull/8574
- Make the vld2 pattern more obviously profitable by @abadams in https://github.com/halide/Halide/pull/8765
- Fix vector shuffle for Vulkan CodeGen by @derek-gerstmann in https://github.com/halide/Halide/pull/8621
- Suppress warning on Windows for duplicate constant symbols. by @mcourteaux in https://github.com/halide/Halide/pull/8555
- Use lossless_cast for saturating casts from unsigned to signed on x86 by @abadams in https://github.com/halide/Halide/pull/8527
- AMD Zen5 support by @changhoon-sung in https://github.com/halide/Halide/pull/8612
Compiler
- Rework strict_float to use individual op intrinsics instead by @abadams in https://github.com/halide/Halide/pull/8641
- Don't cache mutations of Exprs that have only one reference to them by @abadams in https://github.com/halide/Halide/pull/8518
- Only use the nodes-visited set for nodes with multiple refs by @abadams in https://github.com/halide/Halide/pull/8547
- In graph_equal(), call the correct implementation for comparing equalities between statements and expressions by @BachiLi in https://github.com/halide/Halide/pull/8611
Runtime
- Support copying the overlapping region from one buffer to another. by @mcourteaux in https://github.com/halide/Halide/pull/8463
- Add (iOS) simulator target feature. by @alexreinking in https://github.com/halide/Halide/pull/8623
- Opt out of JIT exceptions by @abadams in https://github.com/halide/Halide/pull/8615
- Experimental: support removing unused runtime functions via `HL_RUNTIME_DROP_FUNCS` environment variable.
- PoC feature: drop functions from the runtime by @mcourteaux in https://github.com/halide/Halide/pull/8653
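The first Runtime entry above mentions copying the overlapping region from one buffer to another. The idea can be sketched in one dimension in plain Python, assuming each buffer is described by a minimum coordinate and an extent, as Halide buffer dimensions are. This is only an illustration of the interval arithmetic, not the runtime's implementation; `overlap` and `copy_overlap` are hypothetical names.

```python
def overlap(min_a, extent_a, min_b, extent_b):
    """Return (min, extent) of the intersection of two 1-D intervals."""
    lo = max(min_a, min_b)
    hi = min(min_a + extent_a, min_b + extent_b)  # exclusive upper bound
    return lo, max(0, hi - lo)                    # extent is 0 when disjoint


def copy_overlap(src, src_min, dst, dst_min):
    """Copy only the coordinates present in both buffers; return how many were copied."""
    lo, extent = overlap(src_min, len(src), dst_min, len(dst))
    for x in range(lo, lo + extent):
        # Translate the shared coordinate x into each buffer's own index space.
        dst[x - dst_min] = src[x - src_min]
    return extent


if __name__ == "__main__":
    src = [10, 11, 12, 13]  # covers coordinates 2..5
    dst = [0, 0, 0, 0, 0]   # covers coordinates 4..8
    copied = copy_overlap(src, 2, dst, 4)
    print(copied, dst)
```

In more than one dimension the same computation is applied to each dimension independently, and the copied region is the product of the per-dimension intersections.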
Apps
- The onnx app now builds with CMake:
- Add CMake for onnx app by @vawale in https://github.com/halide/Halide/pull/8707
- Fix halide_as_onnx_backend_test by @alexreinking in https://github.com/halide/Halide/pull/8784
Documentation
- Add note about relative paths to readme by @abadams in https://github.com/halide/Halide/pull/8613
Bugfixes
- Fix [#8534] [Buffer serialization does not match deserialization] by @abadams in https://github.com/halide/Halide/pull/8535
- Fix CUDA HTML code printing bug. by @mcourteaux in https://github.com/halide/Halide/pull/8558
- Fix halide_get_cpu_features() linkage to avoid name mangling issues by @derek-gerstmann in https://github.com/halide/Halide/pull/8573
- Fix for [#8578] by @mcourteaux in https://github.com/halide/Halide/pull/8579
- Fix shuffle bug in CodeGen C. by @mcourteaux in https://github.com/halide/Halide/pull/8567
- Check if expression is defined before trying to compute its constant_integer_bounds by @vksnk in https://github.com/halide/Halide/pull/8599
- Drop invalid "in-bounds" GEP for constant offsets by @alexreinking in https://github.com/halide/Halide/pull/8768
- Record trace_loads directly on ImageParam. by @alexreinking in https://github.com/halide/Halide/pull/8803
- RewriteLoadsAs32Bit should use the mutated index by @rootjalex in https://github.com/halide/Halide/pull/8581
- Set any_strict_float for wrapper module if target has strict_flag feature by @vksnk in https://github.com/halide/Halide/pull/8681
- Fix wrong type of the bound by @vksnk in https://github.com/halide/Halide/pull/8781
- Fix UB-introducing rewrite in FindIntrinsics by @abadams in https://github.com/halide/Halide/pull/8539
- Fix rewrite that doesn't preserve type by @abadams in https://github.com/halide/Halide/pull/8674
- Fix nested select handling in remove_undef by @abadams in https://github.com/halide/Halide/pull/8669
- Add an underlying type to the halide_buffer_flags to prevent UB in C++ by @mcourteaux in https://github.com/halide/Halide/pull/8690
Testing / CI
- Limit depth more strictly in CSE fuzz test by @abadams in https://github.com/halide/Halide/pull/8512
- Skip fast exp/log/pow/sin/cosine tests without sse 4.1 by @abadams in https://github.com/halide/Halide/pull/8541
- Hopefully fix flaky mullapudi reorder test by @abadams in https://github.com/halide/Halide/pull/8542
- Skip test when code could be using x87 by @abadams in https://github.com/halide/Halide/pull/8537
- Fix stale GPU lifetime management tests for Vulkan. by @derek-gerstmann in https://github.com/halide/Halide/pull/8601
- Upgrade runner for cmake_cmake_file_lists job by @alexreinking in https://github.com/halide/Halide/pull/8609
- Buildbot fixes by @alexreinking in https://github.com/halide/Halide/pull/8706
- Fix the pip packaging workflow by @alexreinking in https://github.com/halide/Halide/pull/8708
- Fix complexity of bounds of nested pure intrinsics by @abadams in https://github.com/halide/Halide/pull/8689
- Skip two sub-tests on llvm 21.1 by @abadams in https://github.com/halide/Halide/pull/8782
- Speed up simd_op_check_wasm by @abadams in https://github.com/halide/Halide/pull/8780
- Reduce the beam size in the adams2019 apps test to avoid timeouts by @abadams in https://github.com/halide/Halide/pull/8786
- Workaround llvm slow compile time bug in Mullapudi overlap test by @abadams in https://github.com/halide/Halide/pull/8793
- Restore concurrent behavior to gpu_allocation_cache test by @abadams in https://github.com/halide/Halide/pull/8792
- Revert "Skip two sub-tests on llvm 21.1" by @abadams in https://github.com/halide/Halide/pull/8806
- Fix WASM splat op check test. by @mcourteaux in https://github.com/halide/Halide/pull/8705
Build
- Fix workflow for next release by @alexreinking in https://github.com/halide/Halide/pull/8514
- Fix Debian packaging by @alexreinking in https://github.com/halide/Halide/pull/8524
- Remove llvm version check from Makefile by @abadams in https://github.com/halide/Halide/pull/8533
- Drop deprecated / unsupported setups for Halide 20 by @alexreinking in https://github.com/halide/Halide/pull/8508
- Fix check for Windows never having aligned_alloc available. by @mcourteaux in https://github.com/halide/Halide/pull/8551
- Don't include CMAKE_INSTALL_PREFIX when LIBDIR is absolute by @alexreinking in https://github.com/halide/Halide/pull/8552
- Add target-nvptx to target-all in vcpkg.json by @alexreinking in https://github.com/halide/Halide/pull/8562
- Fix top of LLVM, and remove upper limit of LLVM version from CMakeLists. by @mcourteaux in https://github.com/halide/Halide/pull/8568
- build_halide_h asserts that every header it slurps in is one of the args by @abadams in https://github.com/halide/Halide/pull/8559
- Upgrade pybind11 to 2.11.1 by @alexreinking in https://github.com/halide/Halide/pull/8616
- Drop check for LLVM_LIBCXX in FindHalide_LLVM.cmake by @alexreinking in https://github.com/halide/Halide/pull/8617
- Fix finding LLD on Homebrew when multiple versions are installed. by @alexreinking in https://github.com/halide/Halide/pull/8619
- Fix build on GCC 15 (Comes with Fedora 42). by @mcourteaux in https://github.com/halide/Halide/pull/8626
- Constrain Clang and LLD searches to LLVM version by @alexreinking in https://github.com/halide/Halide/pull/8634
- Disallow empty CMAKE_BUILD_TYPE on single-config generators by @alexreinking in https://github.com/halide/Halide/pull/8651
- Add missing outputs to add_halide_library; fix advice in Lesson 21. by @alexreinking in https://github.com/halide/Halide/pull/8660
- Bump the LLVM version in the pip package to 20.1.8 by @alexreinking in https://github.com/halide/Halide/pull/8698
- Prefer to build against libjpeg-turbo and document this. by @alexreinking in https://github.com/halide/Halide/pull/8775
- Add C++17 requirement to RunGenMain CMake target by @alexreinking in https://github.com/halide/Halide/pull/8795
- Allow llvm-ar in BundleStatic.cmake by @alexreinking in https://github.com/halide/Halide/pull/8799
- Fix dubious find_package logic in test/generator by @alexreinking in https://github.com/halide/Halide/pull/8804
- Warning when extra-output is requested w/o filename by @FabianSchuetze in https://github.com/halide/Halide/pull/8671
- Makefile linker flag fixes and cleanups by @abadams in https://github.com/halide/Halide/pull/8764
Ongoing maintenance
- Fix clang-tidy-19 errors by @steven-johnson in https://github.com/halide/Halide/pull/8509
- Remove unused function in HexagonOptimize by @steven-johnson in https://github.com/halide/Halide/pull/8511
- Fix two non-idiomatic uses of node_type by @abadams in https://github.com/halide/Halide/pull/8520
- Handle some misc TODOs by @abadams in https://github.com/halide/Halide/pull/8528
- Use a consistent idiom for visit_let by @abadams in https://github.com/halide/Halide/pull/8540
- Upgrade to clang-format 19 by @alexreinking in https://github.com/halide/Halide/pull/8543
- Suppress clang-tidy warning for make_with_shape_of() by @steven-johnson in https://github.com/halide/Halide/pull/8545
- Remove debugging print left in by @abadams in https://github.com/halide/Halide/pull/8572
- Fixes for llvm trunk by @abadams in https://github.com/halide/Halide/pull/8590
- Another fix for llvm trunk by @abadams in https://github.com/halide/Halide/pull/8591
- Our internal error macros were redesigned:
- Accurately annotate Error system with [[noreturn]] by @alexreinking in https://github.com/halide/Halide/pull/8564
- Teach compilers that internal_error does not return. by @alexreinking in https://github.com/halide/Halide/pull/8807
- Use a new macro trick to avoid throwing in destructors. by @alexreinking in https://github.com/halide/Halide/pull/8774
- Move to opaque llvm pointers by @abadams in https://github.com/halide/Halide/pull/8614
- Avoid throwing from a destructor in PartitionLoops.cpp by @alexreinking in https://github.com/halide/Halide/pull/8767
- Bump version to 21.0.0 by @alexreinking in https://github.com/halide/Halide/pull/8810
- remove superfluous overload that causes compile errors by @ongjunjie in https://github.com/halide/Halide/pull/8654
- Remove obsolete WasmExecutor specific debug macro. by @zvookin in https://github.com/halide/Halide/pull/8670
- Add missing header by @vksnk in https://github.com/halide/Halide/pull/8680
- Attempted fix for LLVM change by @abadams in https://github.com/halide/Halide/pull/8642
- Fix top LLVM: renamed NVPTX barrier intrinsics. by @mcourteaux in https://github.com/halide/Halide/pull/8631
New Contributors
- @changhoon-sung made their first contribution in https://github.com/halide/Halide/pull/8612
- @vawale made their first contribution in https://github.com/halide/Halide/pull/8707
- @rtzam made their first contribution in https://github.com/halide/Halide/pull/8773
Full Changelog: https://github.com/halide/Halide/compare/v19.0.0...v21.0.0