The Triton Inference Server provides an optimized cloud
Multi-LoRA inference server that scales to 1000s of fine-tuned LLMs
OpenAI-style API for open large language models
Standardized Serverless ML Inference Platform on Kubernetes
Serve machine learning models within a Docker container