CubeCL v0.6.0

Summary

CubeCL 0.6.0 introduces significant enhancements to performance, functionality, and compatibility across backends. Key features include n-dimensional convolution, multi-stage matrix multiplication (matmul), and dynamic shared memory support for CUDA. Performance optimizations, such as a reworked into_contiguous and improved double buffering, boost efficiency. New functionality, including random number generation, fp8/fp6 support, and recursive profiling, extends the library's capabilities. Bug fixes address issues in the Metal, HIP, Vulkan, and WASM backends, as well as memory alignment problems and deadlocks.
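
The features below all build on CubeCL's #[cube] kernel model, where GPU kernels are written as annotated Rust functions and compiled per backend. For reference, here is a minimal element-wise kernel adapted from the project README; the exact literal-conversion and launch syntax can vary slightly between releases.

```rust
use cubecl::prelude::*;

// Element-wise GELU kernel, adapted from the CubeCL README.
// ABSOLUTE_POS is the global invocation index; the bounds check keeps
// the last cube from reading past the end of the array.
#[cube(launch_unchecked)]
fn gelu_array<F: Float>(input: &Array<F>, output: &mut Array<F>) {
    if ABSOLUTE_POS < input.len() {
        output[ABSOLUTE_POS] = gelu_scalar::<F>(input[ABSOLUTE_POS]);
    }
}

// Scalar helper, callable from other #[cube] functions.
#[cube]
fn gelu_scalar<F: Float>(x: F) -> F {
    x * (F::erf(x / F::sqrt(2.0.into())) + 1.0) / 2.0
}
```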

What's New

Features

  • N-Dimensional Convolution: Added support for n-dimensional convolution operations (@wingertge, #649).
  • Multi-Stage Convolution: Implemented multi-stage convolution for enhanced processing (@wingertge, #602).
  • Matrix Multiplication Enhancements:
      • Added double-stage matmul with k > 1 (@louisfd, #653).
      • Generalized tilewise loading for multiple tiles (@louisfd, #655).
      • Introduced ordered double buffering (@louisfd, #680).
      • Added specialized configs, event listener refactoring, and selection improvements (@louisfd, @nathanielsimard, #710, #711, #719, #722, #749, #751).
      • Unit matmul with plane matmul merging and double buffering (@louisfd, #686, #697).
  • Random Number Generation: Added random number generation with vectorized kernels and improved tests (@Cielbird, #673, #677, #679, #681, #682).
  • Low-Precision Support: Added fp8, fp6, and theoretical fp4 support (@wingertge, #675); a simplified fp8 (E4M3) encoder sketch follows this list.
  • Dynamic Shared Memory on CUDA: Enabled dynamic shared memory allocation for CUDA (@wingertge, #620).
  • Intrinsic Macro: Introduced intrinsic macro support for enhanced flexibility (@wingertge, #639).
  • Recursive Profiling: Added recursive profiling capabilities (@nathanielsimard, #674).
  • Sync Plane Instruction: Added sync_plane instruction for synchronization (@louisfd, #676).
  • CubeCL Configuration: Introduced configuration options for CubeCL (@nathanielsimard, #665).
  • Multi-Tensor Allocation: Added support for multi-tensor allocation to handle quantization (@wingertge, #661).
  • Autotune Enhancements:
      • Made autotune optional (@nathanielsimard, #685).
      • Added basic error handling for autotune (@nathanielsimard, #738).
      • Improved matmul selection and fixed a tuner deadlock (@nathanielsimard, #771, #782).
  • f16 Support for WGSL: Added f16 support to the WGSL backend (@wingertge, #658).
  • GFX10 (RDNA2) Support: Added support for GFX10 architecture (@VirxEC, #662).
  • Graphviz Output for SPIR-V: Added Graphviz output to spirv-dump for better visualization (@wingertge, #664).
  • PTX WMMA for CUDA: Added PTX WMMA support for CUDA (@syl20bnr, #668).
  • Tunable Priority: Introduced tunable priority for improved control (@nathanielsimard, #768).
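
To make the low-precision item above concrete: fp8 formats such as E4M3 pack a sign bit, a 4-bit exponent (bias 7), and a 3-bit mantissa into one byte. The sketch below is a simplified CPU-side encoder shown only to illustrate the format (round-to-nearest, subnormals flushed to zero, out-of-range values clamped to the largest finite value); it is not CubeCL's implementation.

```rust
/// Simplified f32 -> fp8 E4M3 encoder, for illustration only: no
/// subnormals, NaN maps to the E4M3 NaN encoding, and values beyond
/// the representable range clamp to the largest finite value (448).
fn f32_to_e4m3(x: f32) -> u8 {
    let sign = if x.is_sign_negative() { 0x80u8 } else { 0 };
    let a = x.abs();
    if a.is_nan() {
        return sign | 0x7F; // exponent and mantissa all ones
    }
    let a = a.min(448.0); // clamp to the largest finite E4M3 value
    if a < 2.0f32.powi(-6) {
        return sign; // flush subnormals and tiny values to (signed) zero
    }
    let exp = a.log2().floor() as i32; // unbiased exponent, in [-6, 8]
    let mant = a / 2.0f32.powi(exp) - 1.0; // fractional part in [0, 1)
    let mant_bits = (mant * 8.0).round() as u32; // 3 mantissa bits
    // Rounding the mantissa up to 8 carries into the exponent.
    let (exp, mant_bits) = if mant_bits == 8 { (exp + 1, 0) } else { (exp, mant_bits) };
    let biased = (exp + 7) as u32; // E4M3 exponent bias is 7
    sign | ((biased << 3) as u8) | (mant_bits as u8)
}

fn main() {
    assert_eq!(f32_to_e4m3(1.0), 0x38);   // 0 | 0111 | 000
    assert_eq!(f32_to_e4m3(448.0), 0x7E); // largest finite value
    assert_eq!(f32_to_e4m3(-1.0), 0xB8);  // sign bit set
}
```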

Performance Improvements

  • Reworked into_contiguous for better performance (@wingertge, #621).
  • Optimized double buffering event cleanup (@nathanielsimard, #663).
  • Reduced mixed precision overhead (@nathanielsimard, #619).
  • Improved compilation times (@nathanielsimard, #669).
  • Sped up SPIR-V compilation and softened the matmul autotune key (@nathanielsimard, #740); the sketch after this list illustrates key softening.
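
One way to read the softened autotune key change above: if the tuning cache is keyed on exact problem shapes, every new shape triggers a fresh benchmarking pass, so softening the key means bucketing shapes so that similar problems reuse the same tuned entry. The sketch below illustrates that idea with a hypothetical key type; it is not CubeCL's actual autotune API.

```rust
/// Hypothetical "softened" matmul autotune key: each dimension is
/// bucketed to the next power of two, so nearby shapes share a key.
#[derive(Clone, Copy, PartialEq, Eq, Hash, Debug)]
struct MatmulKey {
    m: u32,
    n: u32,
    k: u32,
}

fn softened_key(m: u32, n: u32, k: u32) -> MatmulKey {
    MatmulKey {
        m: m.next_power_of_two(),
        n: n.next_power_of_two(),
        k: k.next_power_of_two(),
    }
}

fn main() {
    // A 1000^3 and a 1024^3 matmul map to the same autotune entry,
    // so only one of them pays the tuning cost.
    assert_eq!(softened_key(1000, 1000, 1000), softened_key(1024, 1024, 1024));
}
```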

Bug Fixes

  • Fixed cluster issues caused by merges (@wingertge, #648).
  • Corrected edge case in calculate_cube_count_elemwise (@wingertge, #646).
  • Fixed Metal and HIP slice offset issues (@louisfd, #651).
  • Resolved inner mutability and register mutability issues (@nathanielsimard, #652, #656).
  • Fixed deadlock by avoiding lock captures (@ArthurBrussee, #657).
  • Corrected buffer offset alignment and size calculation (@wingertge, #684); see the alignment sketch after this list.
  • Fixed WASM by using cfg(std_io) (@ArthurBrussee, #670).
  • Addressed Vulkan atomics issues (@nathanielsimard, #704).
  • Fixed configuration environment parsing (@nathanielsimard, #678).
  • Corrected random interval and logger profile issues (@laggui, @nathanielsimard, #744, #683).
  • Fixed Metal backend tests and removed unused warnings (@louisfd, #762, #763).
  • Addressed SPIR-V issues, including CMMA offset and compilation (@marcantoinem, @nathanielsimard, #752, #764).
  • Fixed matmul cube count overflow (@louisfd, #760).
  • Resolved tuner deadlock (@nathanielsimard, #782).
  • Fixed benchmark API for dead code elimination and memory alignment (@nathanielsimard, #712).
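
Several of the fixes above (buffer offset alignment in #684, memory alignment in the benchmark API fix in #712) come down to keeping buffer offsets aligned to the granularity a backend requires. The helper below is a generic illustration of that invariant, not code from CubeCL.

```rust
/// Round `offset` up to the next multiple of `align`, where `align`
/// must be a power of two. Purely illustrative.
fn align_up(offset: usize, align: usize) -> usize {
    debug_assert!(align.is_power_of_two());
    (offset + align - 1) & !(align - 1)
}

fn main() {
    assert_eq!(align_up(13, 8), 16);  // rounds up into the next aligned slot
    assert_eq!(align_up(16, 8), 16);  // already-aligned offsets are unchanged
    assert_eq!(align_up(0, 256), 0);
}
```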

Refactorings

  • Unified slice implementation across backends (@nathanielsimard, #644).
  • Refactored init to IntoMut (@nathanielsimard, #659).
  • Split cubecl-linalg into cubecl-matmul and cubecl-convolution (@louisfd, #708).
  • Moved SPIR-V extension methods to rspirv-ext crate (@wingertge, #596).
  • Refactored the matmul tiling scheme, setup, and compute resource dependency (@louisfd, #707, #709, #716); the tiling sketch after this list shows the loop structure.
  • Moved profile logging to ComputeClient and made it async (@ArthurBrussee, #692).
  • Improved the unit selector and refactored HIP device handling (@nathanielsimard, #758, #761).
  • Cleaned up SPIR-V backend code (@marcantoinem, #769).
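
For context on the tiling-scheme refactor above: a tiled matmul partitions the output into fixed-size tiles and accumulates each tile over the K dimension in steps, and the GPU kernels then map those loops onto cubes, planes, and units. The CPU sketch below shows only that loop structure, with made-up tile sizes; it is not CubeCL's implementation.

```rust
// Illustrative tile sizes; real kernels pick these per hardware/config.
const TILE_M: usize = 4;
const TILE_N: usize = 4;
const TILE_K: usize = 4;

/// Naive tiled matmul on row-major slices: `a` is m x k, `b` is k x n,
/// and `c` (m x n) must be zero-initialized, since tiles accumulate into it.
fn tiled_matmul(a: &[f32], b: &[f32], c: &mut [f32], m: usize, n: usize, k: usize) {
    for m0 in (0..m).step_by(TILE_M) {
        for n0 in (0..n).step_by(TILE_N) {
            for k0 in (0..k).step_by(TILE_K) {
                for i in m0..(m0 + TILE_M).min(m) {
                    for j in n0..(n0 + TILE_N).min(n) {
                        let mut acc = 0.0;
                        for p in k0..(k0 + TILE_K).min(k) {
                            acc += a[i * k + p] * b[p * n + j];
                        }
                        c[i * n + j] += acc;
                    }
                }
            }
        }
    }
}

fn main() {
    // Multiplying by the 2x2 identity reproduces the input matrix.
    let a = [1.0, 0.0, 0.0, 1.0];
    let b = [1.0, 2.0, 3.0, 4.0];
    let mut c = [0.0f32; 4];
    tiled_matmul(&a, &b, &mut c, 2, 2, 2);
    assert_eq!(c, b);
}
```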

Documentation & Testing

  • Fixed typo in CubeCL book (@marcantoinem, #666).
  • Improved documentation with additional CubeCL book pages (@marcantoinem, #733, #774).
  • Enhanced and refactored the matmul documentation (@louisfd, #772, #775).
  • Improved debug information (@nathanielsimard, #689).
  • Added finer-grained feature flags for matmul tests (@louisfd, #734).
  • Updated matmul benchmarks (@nathanielsimard, #781).

Dependencies & Maintenance

  • Bumped version to 0.6.0 (@syl20bnr, #643).
  • Updated cudarc dependency (@wingertge, #637).
  • Updated cubecl-hip-sys to version 6.4.4348201 (@syl20bnr, #743).
  • Bumped major versions of dependencies (@ArthurBrussee, #776).
  • Silenced MAPPABLE_PRIMARY_BUFFERS warning (@ArthurBrussee, #688).

Thank you to all contributors for making CubeCL 0.6.0 possible!

Source: README.md, updated 2025-07-18