C++ library for high-performance inference on NVIDIA GPUs
FlashMLA: Efficient Multi-head Latent Attention Kernels
A NumPy-compatible array library accelerated by CUDA
Thin, unified, C++-flavored wrappers for the CUDA APIs
Lightning-fast C++/CUDA neural network framework
GPU DataFrame Library
AWS Libfabric
oneAPI Deep Neural Network Library (oneDNN)
Transformers4Rec is a flexible and efficient library
A library for deep learning end-to-end dialog systems and chatbots
Build and run Docker containers leveraging NVIDIA GPUs
The C++ parallel algorithms library
GUI for training neural network models for GuitarML Proteus
Facebook AI Research Sequence-to-Sequence Toolkit written in Python
YOLO ROS: Real-Time Object Detection for ROS
Polyhedral compiler for expressing fast and portable data algorithms
A fast open framework for deep learning
OpenCV Pre-built CUDA binaries
CUDA-enabled machine learning library for recurrent neural networks