BlockSparse
Efficient GPU kernels for block-sparse matrix multiplication and blockwise convolution
...The idea is to exploit block-level sparsity: treat a matrix or weight tensor as a grid of fixed-size blocks, many of which are zero or unused, and skip those blocks entirely to save compute and memory whenever the sparsity pattern is structured. This is particularly useful in models such as Sparse Transformers, where attention matrices or intermediate layers adopt block-sparse patterns to scale to longer sequences. The repo implements both block-sparse matmul and blockwise convolution/transpose-convolution primitives, with support for preparing, executing, and verifying those ops on NVIDIA GPUs. ...
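To make the semantics concrete, below is a minimal CPU reference sketch of a block-sparse matmul, assuming a 0/1 block layout mask and a packed array of nonzero weight blocks. The names here (`block_sparse_matmul`, `layout`, `blocks`) are illustrative, not this repo's API; the final assert shows the kind of dense-reference check that op verification implies.

```python
# Hypothetical reference sketch -- not this repo's API.
import numpy as np

def block_sparse_matmul(x, blocks, layout, block_size):
    """Reference semantics of y = x @ W where W is block-sparse.

    x:      (batch, n_in) dense input
    layout: (n_in // block_size, n_out // block_size) 0/1 mask;
            layout[i, j] == 1 means block (i, j) of W is stored
    blocks: (nnz_blocks, block_size, block_size) packed nonzero blocks,
            in row-major order of the 1-entries of `layout`
    """
    n_in_blocks, n_out_blocks = layout.shape
    y = np.zeros((x.shape[0], n_out_blocks * block_size), dtype=x.dtype)
    k = 0
    for i in range(n_in_blocks):       # block-row of W / block-slice of x
        for j in range(n_out_blocks):  # block-column of W / block-slice of y
            if layout[i, j]:
                xi = x[:, i*block_size:(i+1)*block_size]
                y[:, j*block_size:(j+1)*block_size] += xi @ blocks[k]
                k += 1                 # zero blocks are never stored or touched
    return y

# Usage: verify against a dense matmul with the zero blocks materialized.
rng = np.random.default_rng(0)
block_size, n_in, n_out, batch = 32, 128, 128, 64
layout = rng.integers(0, 2, size=(n_in // block_size, n_out // block_size))
blocks = rng.standard_normal((int(layout.sum()), block_size, block_size)).astype(np.float32)
x = rng.standard_normal((batch, n_in)).astype(np.float32)

W = np.zeros((n_in, n_out), dtype=np.float32)
k = 0
for i in range(layout.shape[0]):
    for j in range(layout.shape[1]):
        if layout[i, j]:
            W[i*block_size:(i+1)*block_size, j*block_size:(j+1)*block_size] = blocks[k]
            k += 1

assert np.allclose(block_sparse_matmul(x, blocks, layout, block_size), x @ W, atol=1e-4)
```

Storing only the nonzero blocks is where the memory savings come from; on the GPU side, the analogous speedup comes from launching work only for blocks where the layout mask is 1 instead of iterating over the full dense grid.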