The Triton Inference Server provides an optimized cloud
Multi-LoRA inference server that scales to 1000s of fine-tuned LLMs
Deep Learning API and Server in C++14 with support for Caffe, PyTorch
OpenAI-style API for open large language models
Standardized Serverless ML Inference Platform on Kubernetes
Serve machine learning models within a Docker container