Lightweight, standalone C++ inference engine for Google's Gemma models
A GPU-accelerated library containing highly optimized building blocks
Fast inference engine for Transformer models
Lightweight inference library for ONNX files, written in C++
Deep learning inference framework optimized for mobile platforms