Name                              Modified    Size
CUTLASS 4.4.2 source code.tar.gz  2026-03-17  39.3 MB
CUTLASS 4.4.2 source code.zip     2026-03-17  48.3 MB
README.md                         2026-03-17  687 Bytes
Totals: 3 items                               87.6 MB

CuTe DSL

  • New features
      • CuTe DSL now supports Python 3.14 on both x86_64 and aarch64.
      • Runtime Pointer/Tensor/FakeTensor now support cache_key, a stable, hashable representation that simplifies and improves caching of compiled functions.
  • Bug fixes and improvements
      • Fixed a Hopper FMHA causal attention performance regression on CUDA toolkit 13.1 by optimizing mbarrier synchronization to avoid unnecessary convergence barriers.
      • Fixed a kernel-loading race condition in JAX when multiple GPUs are present in the same process.
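
To illustrate why a stable, hashable key helps with compiled function caching, here is a minimal sketch in plain Python. It does not use the CuTe DSL API: `TensorSpec`, its `cache_key` property, and `get_compiled` are hypothetical stand-ins, and the "compile" step is a placeholder. The assumption is that a key built only from compile-relevant metadata (dtype, shape, stride) lets two distinct runtime objects with the same layout share one compiled function.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple


@dataclass(frozen=True)
class TensorSpec:
    """Hypothetical tensor descriptor; NOT the CuTe DSL runtime Tensor."""
    dtype: str
    shape: Tuple[int, ...]
    stride: Tuple[int, ...]

    @property
    def cache_key(self) -> Tuple:
        # Only metadata that affects code generation goes into the key.
        # Data pointers and object identity are deliberately left out.
        return (self.dtype, self.shape, self.stride)


_compiled: Dict[Tuple, Callable] = {}
compile_count = 0


def get_compiled(kernel_name: str, *specs: TensorSpec) -> Callable:
    """Return a cached 'compiled' function, compiling only on a cache miss."""
    global compile_count
    key = (kernel_name,) + tuple(s.cache_key for s in specs)
    fn = _compiled.get(key)
    if fn is None:
        compile_count += 1  # stands in for an expensive compilation

        def fn(_key=key):
            return _key

        _compiled[key] = fn
    return fn


a = TensorSpec("float16", (128, 64), (64, 1))
b = TensorSpec("float16", (128, 64), (64, 1))  # different object, same layout
c = TensorSpec("float16", (256, 64), (64, 1))  # different shape

f1 = get_compiled("gemm", a)
f2 = get_compiled("gemm", b)  # cache hit: reuses f1
f3 = get_compiled("gemm", c)  # cache miss: compiles again
```

In this sketch `a` and `b` map to the same cache entry, so only two compilations happen for three calls. Which fields the real cache_key covers is not stated in these notes.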

CUTLASS C++

  • Enabled Blackwell SM120f compilation of examples and exposed NVFP4/MX Grouped GEMM in the CUTLASS Profiler.
Source: README.md, updated 2026-03-17