ONNX Runtime: cross-platform, high-performance ML inferencing (see the first sketch after this list)
whisper.cpp: port of OpenAI's Whisper model in C/C++
OpenVINO™: open-source toolkit for optimizing and deploying AI inference
llama.cpp: port of Meta's LLaMA model in C/C++
TensorRT: C++ library for high-performance inference on NVIDIA GPUs
ExecuTorch: on-device AI across mobile, embedded and edge devices for PyTorch
OpenLLM: operating LLMs in production
DeepSparse: sparsity-aware deep learning inference runtime for CPUs (see the second sketch after this list)
AWS Neuron: SDK powering Amazon's custom machine learning chips (Inferentia and Trainium)
CTranslate2: fast inference engine for Transformer models (see the third sketch after this list)
Seldon Core: an MLOps framework to package, deploy, monitor and manage production machine learning models
Comprehensive set of computer vision & machine intelligence libraries
MMDeploy: OpenMMLab model deployment framework
Pipeless: a computer vision framework to create and deploy apps in minutes
Serve machine learning models within a Docker container
KotlinDL: high-level deep learning framework written in Kotlin
transformer-deploy: CPU/GPU inference server for Hugging Face transformer models
Deep learning inference framework optimized for mobile platforms
TurboTransformers: fast and user-friendly runtime for transformer inference
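
A minimal sketch of driving ONNX Runtime from its Python API, assuming a hypothetical `model.onnx` with a single image-shaped input; the file name and input shape are placeholders, not taken from any entry above:

```python
# Load a (hypothetical) ONNX model and run one forward pass on the CPU.
import numpy as np
import onnxruntime as ort

session = ort.InferenceSession("model.onnx", providers=["CPUExecutionProvider"])
input_name = session.get_inputs()[0].name               # name of the first graph input
x = np.random.rand(1, 3, 224, 224).astype(np.float32)  # placeholder input tensor
outputs = session.run(None, {input_name: x})            # None = fetch every graph output
print(outputs[0].shape)
```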
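DeepSparse exposes a Hugging Face-style Pipeline API; this sketch assumes the default SparseZoo model for the task is acceptable (network access is needed to fetch it):

```python
# Create a sentiment-analysis pipeline backed by the DeepSparse CPU runtime.
# With no model_path given, a default sparse model is pulled from SparseZoo.
from deepsparse import Pipeline

sentiment = Pipeline.create(task="sentiment-analysis")
print(sentiment("The inference runtime was surprisingly fast"))
```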
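CTranslate2 operates on pre-tokenized text; in this sketch the `ende_ct2/` directory stands for a model already converted with one of the project's converters (e.g. `ct2-transformers-converter`), and the SentencePiece-style tokens are placeholders:

```python
# Translate one pre-tokenized sentence with a converted CTranslate2 model.
import ctranslate2

translator = ctranslate2.Translator("ende_ct2/", device="cpu")  # hypothetical model dir
results = translator.translate_batch([["▁Hello", "▁world", "!"]])
print(results[0].hypotheses[0])  # best hypothesis, as a list of target tokens
```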