C++ library for high-performance inference on NVIDIA GPUs
FlashMLA: Efficient Multi-head Latent Attention Kernels
Thin, unified, C++-flavored wrappers for the CUDA APIs
scanf for modern C++
The quantities and units library for C++
oneAPI Deep Neural Network Library (oneDNN)
GPU DataFrame Library
Lightning-fast C++/CUDA neural network framework
The regex-centric, fast lexical analyzer generator for C++
The C++ parallel algorithms library
Java wrapper for 7z archiver engine
YOLO ROS: Real-Time Object Detection for ROS
Polyhedral compiler for expressing fast and portable data algorithms
A fast open framework for deep learning
CUDA-enabled machine learning library for recurrent neural networks