ONNX Runtime: cross-platform, high-performance ML inferencing (see the sketch after this list)
Port of Facebook's LLaMA model in C/C++
Port of OpenAI's Whisper model in C/C++
OpenVINO™ Toolkit repository
Operating LLMs in production
Fast inference engine for Transformer models (see the sketch after this list)
On-device AI for PyTorch across mobile, embedded, and edge devices
Sparsity-aware deep learning inference runtime for CPUs
Powering Amazon's custom machine learning chips
An MLOps framework to package, deploy, monitor and manage models
Set of comprehensive computer vision & machine intelligence libraries
OpenMMLab Model Deployment Framework
A computer vision framework to create and deploy apps in minutes
Serve machine learning models within a Docker container
High-level deep learning framework written in Kotlin
CPU/GPU inference server for Hugging Face transformer models
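For the ONNX Runtime entry above, a minimal Python sketch of loading an exported model and running one forward pass; the file name `model.onnx` and the input shape are assumptions, not part of any specific project.

```python
import numpy as np
import onnxruntime as ort

# Load an exported model on CPU (file name and shape below are placeholders).
session = ort.InferenceSession("model.onnx", providers=["CPUExecutionProvider"])
input_name = session.get_inputs()[0].name

# Run a single inference on random data shaped like a typical image batch.
dummy = np.random.rand(1, 3, 224, 224).astype(np.float32)
outputs = session.run(None, {input_name: dummy})
print(outputs[0].shape)
```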
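For the Transformer inference engine entry, a rough sketch of batch translation with a converted model, assuming the model directory `ende_ctranslate2/` already exists and the input has been pre-tokenized (e.g. into SentencePiece pieces); both are illustrative assumptions.

```python
import ctranslate2

# Open a converted model directory on CPU (path is a placeholder).
translator = ctranslate2.Translator("ende_ctranslate2/", device="cpu")

# translate_batch expects pre-tokenized source sentences.
results = translator.translate_batch([["▁Hello", "▁world", "!"]])
print(results[0].hypotheses[0])
```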