The blocksparse repository provides efficient GPU kernels, packaged as TensorFlow custom ops, for block-sparse matrix multiplication and convolution. The core idea is to exploit block-level sparsity: matrices or weight tensors are treated as grids of fixed-size blocks, many of which may be zero or unused, so compute and memory can be skipped wherever the sparsity pattern is structured. This is particularly useful in models such as Sparse Transformers, where attention matrices or intermediate layers adopt block-sparse patterns to scale to longer sequences.

The repo implements block-sparse matrix multiplication as well as blockwise convolution and transpose-convolution primitives, with support for preparing, executing, and verifying these ops on NVIDIA GPUs. Beyond the low-level kernels, it includes wrapper code for integrating with TensorFlow, example scripts (e.g. a transformer trained on the enwik8 dataset), transformer logic built on blocksparse operations, and debugging helpers.
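As a minimal sketch of how the matmul op is wired into a TF 1.x graph, assuming the `BlocksparseMatMul` wrapper exposed by `blocksparse.matmul` (the random sparsity layout and the session boilerplate here are purely illustrative):

```python
import numpy as np
import tensorflow as tf
from blocksparse.matmul import BlocksparseMatMul

hidden_size = 4096
block_size  = 32   # kernels operate on fixed-size blocks

# Block-level sparsity layout: a 0/1 mask over the
# (hidden/block, hidden/block) grid of blocks.
# A random mask is used here purely for illustration.
layout = np.random.randint(2, size=(hidden_size // block_size,
                                    hidden_size // block_size))

# Compile the kernel for this fixed layout.
bsmm = BlocksparseMatMul(layout, block_size=block_size)

x = tf.placeholder(tf.float32, shape=[None, hidden_size])

# Weights are stored densely per nonzero block;
# bsmm.w_shape gives the packed shape.
w = tf.get_variable("w", shape=bsmm.w_shape, dtype=tf.float32)

y = bsmm(x, w)  # block-sparse matmul, skipping zero blocks

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    out = sess.run(y, feed_dict={x: np.ones((64, hidden_size),
                                            dtype=np.float32)})
```

Because the layout is fixed at construction time, the op can specialize its GPU kernel to the nonzero blocks rather than branching on sparsity at runtime.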
Features
- Blocksparse matrix multiplication kernels tuned for GPU execution
- Blocksparse convolution and transpose-convolution (deconvolution) primitives
- TensorFlow custom ops / wrappers for easy integration into TF graphs
- Example transformer model using block-sparse ops (e.g. the enwik8 script); a sketch of building such a block-attention layout appears after this list
- Support utilities: shape helpers, edge bias, normalization, and bias operations
- Verification, testing, and debugging support to validate correctness of sparse execution
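The layouts consumed by these ops are ordinary NumPy 0/1 masks over blocks, so the patterns used in sparse attention can be built by hand. Below is a hedged sketch, in the spirit of the Sparse Transformer "local + strided" pattern; the helper name and the `local`/`stride` parameters are illustrative choices, not part of the repo's API:

```python
import numpy as np

def local_plus_strided_layout(n_blocks, local=2, stride=8):
    """Illustrative block-level attention layout (not a repo API):
    each query block attends to its `local` most recent blocks and
    to every `stride`-th earlier block, causally masked."""
    layout = np.zeros((n_blocks, n_blocks), dtype=np.int32)
    for q in range(n_blocks):
        for k in range(q + 1):  # causal: only keys at or before q
            if q - k < local or k % stride == stride - 1:
                layout[q, k] = 1
    return layout

# A layout like this can be handed to BlocksparseMatMul (see the
# sketch above) so the kernels only touch the nonzero blocks.
layout = local_plus_strided_layout(n_blocks=16)
```

Since cost scales with the number of nonzero blocks rather than the full matrix, a pattern like this reduces attention compute from quadratic toward roughly linear in sequence length.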