C++ library for high-performance inference on NVIDIA GPUs
Serve, optimize and scale PyTorch models in production
The Triton Inference Server provides an optimized cloud and edge inferencing solution
Low-latency REST API for serving text embeddings
A computer vision framework to create and deploy apps in minutes
Guide to deploying deep-learning inference networks
CPU/GPU inference server for Hugging Face transformer models