Name | Modified | Size | Downloads / Week |
---|---|---|---|
CUTLASS 4.0.0 source code.tar.gz | 2025-06-27 | 32.9 MB | |
CUTLASS 4.0.0 source code.zip | 2025-06-27 | 41.5 MB | |
README.md | 2025-06-27 | 1.9 kB | |
Totals: 3 Items | | 74.5 MB | 0 |
CuTe DSL
CuTe DSL is a Python DSL centered around CuTe's abstractions.
- Enables authoring kernels in Python to reach peak performance on NVIDIA GPUs
- Core DSL implementation files
- DSL quick start
- DSL overview
- Educational notebooks for getting started with CuTe DSL
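
As a rough illustration of the central CuTe abstraction, a layout can be thought of as a shape paired with strides, where a coordinate maps to a linear offset through an inner product with the strides. The sketch below is plain Python and does not use the CuTe DSL API; the helper names `layout_offset` and `idx_to_coord` are hypothetical and exist only for this example.

```python
# Illustrative only: plain Python, NOT the CuTe DSL API.
# Helper names are hypothetical and chosen for this example.

def layout_offset(coord, stride):
    """Map a coordinate to a linear offset: inner product of coord and stride."""
    return sum(c * s for c, s in zip(coord, stride))


def idx_to_coord(idx, shape):
    """Convert a 1-D index to a coordinate, leftmost mode varying fastest."""
    coord = []
    for extent in shape:
        coord.append(idx % extent)
        idx //= extent
    return tuple(coord)


shape = (4, 2)       # 4 rows, 2 columns
col_major = (1, 4)   # stride 1 along rows, 4 along columns
row_major = (2, 1)   # stride 2 along rows, 1 along columns

# Walk all 8 logical elements and record where each one lands in memory.
offsets_col = [layout_offset(idx_to_coord(i, shape), col_major) for i in range(8)]
offsets_row = [layout_offset(idx_to_coord(i, shape), row_major) for i in range(8)]

print(offsets_col)
print(offsets_row)
```

The same shape with different strides yields a different memory ordering, which is why describing data through layouts lets one kernel body serve several storage formats.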
CUTLASS C++
- Support family-specific architecture features, which were introduced in CUDA 12.9
- Further improve Blockwise and Groupwise GEMMs on Hopper and Blackwell
- Enhance Blackwell SM100 Attention kernels in example 77
- Add Blackwell SM100 implicit GEMM conv fprop/dgrad/wgrad unit tests
- New Hopper SM90 FMHA example, similar in design to the existing Blackwell FMHA
- CuTe enhancements: CuTe C++ reduce op
- Other functional and performance enhancements
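
For readers unfamiliar with the blockwise GEMMs mentioned in the list above, the general idea is that each tile of an operand carries its own scale factor, which is applied to the partial product of that tile during accumulation. The following is a minimal pure-Python reference sketch of that idea. It is not CUTLASS code: the function name `blockwise_scaled_gemm`, the tile sizes, and the scale-tensor layout are all assumptions made for illustration.

```python
# Concept sketch only: NOT CUTLASS's API or its scale-factor layout.
# Assumption: one scale per (bm x bk) tile of A and per (bk x bn) tile of B.

def blockwise_scaled_gemm(A, B, scale_a, scale_b, bm, bn, bk):
    """Reference D = sum over K-tiles of scale_a * scale_b * (A_tile @ B_tile).

    A is M x K and B is K x N, both given as lists of lists.
    scale_a[i][k] scales tile (i, k) of A; scale_b[k][j] scales tile (k, j) of B.
    """
    M, K, N = len(A), len(B), len(B[0])
    D = [[0.0] * N for _ in range(M)]
    for m in range(M):
        for n in range(N):
            total = 0.0
            for kb in range(0, K, bk):
                # Unscaled partial dot product over one K-tile.
                partial = sum(A[m][k] * B[k][n] for k in range(kb, min(kb + bk, K)))
                # Apply the scales of the tiles this partial product came from.
                total += scale_a[m // bm][kb // bk] * scale_b[kb // bk][n // bn] * partial
            D[m][n] = total
    return D


A = [[1, 2, 3, 4],
     [5, 6, 7, 8]]
B = [[1, 0],
     [0, 1],
     [1, 0],
     [0, 1]]
scale_a = [[0.5, 2.0]]      # one row-tile, two K-tiles
scale_b = [[1.0], [0.25]]   # two K-tiles, one column-tile

D = blockwise_scaled_gemm(A, B, scale_a, scale_b, bm=2, bn=2, bk=2)
print(D)
```

Applying scales per K-tile, rather than once at the end, is what distinguishes this from an ordinary GEMM followed by a single rescale. Production kernels fuse this step into the main loop instead of looping element by element as done here.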