The Triton Inference Server provides an optimized cloud
A scalable inference server for models optimized with OpenVINO
Easiest and laziest way to build multi-agent LLM applications
Standardized Serverless ML Inference Platform on Kubernetes
Deep Learning API and Server in C++14 with support for Caffe, PyTorch
Open Source and Lightweight Local LLM Platform
Deploy an ML inference service on a budget in 10 lines of code