C++ library for high-performance inference on NVIDIA GPUs
High-performance ML model serving framework with dynamic batching
Port of OpenAI's Whisper model in C/C++
High-performance neural network inference framework for mobile
Sparsity-aware deep learning inference runtime for CPUs
Lightweight, standalone C++ inference engine for Google's Gemma models
Standardized Serverless ML Inference Platform on Kubernetes
Low-latency REST API for serving text embeddings
Easy-to-use deep learning framework with 3 key features
High-quality, fast, modular reference implementation of SSD in PyTorch
A computer vision framework to create and deploy apps in minutes
Fast and user-friendly runtime for transformer inference