A toolkit to optimize Keras & TensorFlow ML models for deployment
Serve, optimize and scale PyTorch models in production
C++ library for high-performance inference on NVIDIA GPUs
LMDeploy is a toolkit for compressing, deploying, and serving LLMs
ONNX Runtime: cross-platform, high-performance ML inferencing
Build your chatbot within minutes on your favorite device
Bolt is a deep learning library with high performance
Easy-to-use deep learning framework with 3 key features
Trainable models and NN optimization tools
Framework that is dedicated to making neural data processing
CPU/GPU inference server for Hugging Face transformer models