ONNX Runtime: cross-platform, high performance ML inferencing
A retargetable MLIR-based machine learning compiler and runtime toolkit
Port of OpenAI's Whisper model in C/C++
LiteRT is the new name for TensorFlow Lite (TFLite)
PyTorch/TorchScript/FX compiler for NVIDIA GPUs using TensorRT
MLX: An array framework for Apple silicon
Port of Facebook's LLaMA model in C/C++
OpenVINO™ Toolkit repository
Clean and efficient FP8 GEMM kernels with fine-grained scaling
On-device AI across mobile, embedded and edge for PyTorch
Emscripten: An LLVM-to-WebAssembly Compiler
C++ library for high performance inference on NVIDIA GPUs
Fast inference engine for Transformer models
OneFlow is a deep learning framework designed to be user-friendly
oneAPI Deep Neural Network Library (oneDNN)
Set of comprehensive computer vision & machine intelligence libraries
Runtime extension of Proximus enabling deployment on AMD Ryzen™ AI
Plan execution language and executive
A High Performance Library for Sequence Processing and Generation
Deep learning inference framework optimized for mobile platforms
Real-time multi-person keypoint detection library for body, face, and more
Fast and user-friendly runtime for transformer inference
Evolutionary Computation Framework in C++