A library for accelerating Transformer models on NVIDIA GPUs
C++ library for high-performance inference on NVIDIA GPUs
ONNX Runtime: cross-platform, high-performance ML inferencing
A set of Docker images for training and serving models in TensorFlow
A GPU-accelerated library containing highly optimized building blocks
Open platform for training, serving, and evaluating language models
Toolbox of models, callbacks, and datasets for AI/ML researchers
Guide to deploying deep-learning inference networks
Toolkit for inference and serving with MXNet on Amazon SageMaker
CPU/GPU inference server for Hugging Face transformer models