Sparsity-aware deep learning inference runtime for CPUs
Operating LLMs in production
Powering Amazon's custom machine learning chips
An MLOps framework to package, deploy, monitor, and manage models
OpenMMLab Model Deployment Framework
A computer vision framework to create and deploy apps in minutes
Serve machine learning models within a Docker container
CPU/GPU inference server for Hugging Face transformer models