Operating LLMs in production
Powering Amazon's custom machine learning chips
Sparsity-aware deep learning inference runtime for CPUs
An MLOps framework to package, deploy, monitor, and manage models
OpenMMLab Model Deployment Framework
A computer vision framework to create and deploy apps in minutes
Serve machine learning models within a Docker container
CPU/GPU inference server for Hugging Face transformer models