| Name | Modified | Size | Downloads / Week |
|---|---|---|---|
| CUTLASS 4.4.2 source code.tar.gz | 2026-03-17 | 39.3 MB | |
| CUTLASS 4.4.2 source code.zip | 2026-03-17 | 48.3 MB | |
| README.md | 2026-03-17 | 687 Bytes | |
| Totals: 3 Items | | 87.6 MB | 0 |
CuTe DSL
- New features
  - CuTe DSL now supports Python 3.14 on both x86_64 and aarch64.
  - Runtime Pointer/Tensor/FakeTensor now support cache_key, a stable, hashable representation that simplifies and improves compiled function caching.
- Bug fixes and improvements
  - Fixed a Hopper FMHA causal attention performance regression on CUDA toolkit 13.1 by optimizing mbarrier synchronization to avoid unnecessary convergence barriers.
  - Fixed a kernel loading race condition in JAX when multiple GPUs are present in the same process.
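
To illustrate why a stable, hashable cache_key helps compiled function caching, here is a minimal sketch in plain Python. It does not use the CuTe DSL API: `TensorSpec`, `fake_compile`, `get_kernel`, and `stats` are hypothetical names invented for this example, and the choice of dtype, shape, and stride as key fields is an assumption about what such a key might contain.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Hashable, Tuple


@dataclass(frozen=True)
class TensorSpec:
    """Hypothetical stand-in for a runtime tensor description (not a CuTe DSL class)."""
    dtype: str
    shape: Tuple[int, ...]
    stride: Tuple[int, ...]

    @property
    def cache_key(self) -> Hashable:
        # Assumption: only metadata that would affect code generation is
        # part of the key, so two tensors with the same layout share a key.
        return (self.dtype, self.shape, self.stride)


# Cache of "compiled" callables, indexed by cache_key.
_compiled: Dict[Hashable, Callable[[], str]] = {}
# Counts how many times the (simulated) compiler actually ran.
stats = {"compiles": 0}


def fake_compile(spec: TensorSpec) -> Callable[[], str]:
    """Simulate an expensive compilation step."""
    stats["compiles"] += 1

    def kernel() -> str:
        return f"kernel<{spec.dtype}, {spec.shape}, {spec.stride}>"

    return kernel


def get_kernel(spec: TensorSpec) -> Callable[[], str]:
    """Return a cached kernel when the key matches, otherwise compile one."""
    key = spec.cache_key
    if key not in _compiled:
        _compiled[key] = fake_compile(spec)
    return _compiled[key]


a = TensorSpec("float16", (128, 64), (64, 1))
b = TensorSpec("float16", (128, 64), (64, 1))  # same layout, separate object
c = TensorSpec("float16", (256, 64), (64, 1))  # different shape

k_a = get_kernel(a)
k_b = get_kernel(b)  # cache hit: no second compilation
k_c = get_kernel(c)  # cache miss: compiles again
```

Because the key is derived from the description rather than from object identity, `a` and `b` resolve to the same compiled callable, and only the differently shaped `c` triggers a second compilation.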
CUTLASS C++
- Enabled Blackwell SM120f compilation of examples and exposed NVFP4/MX Grouped GEMM in the CUTLASS Profiler.