An RWKV management and startup tool; fully automated, only 8 MB
Run serverless GPU workloads with fast cold starts on bare-metal
Port of Facebook's LLaMA model in C/C++
On-device Speech Recognition for Apple Silicon
Lightweight, standalone C++ inference engine for Google's Gemma models
A general-purpose probabilistic programming system
Run local LLMs such as LLaMA, DeepSeek, and Kokoro inside your browser
Fast inference engine for Transformer models
Unified Model Serving Framework
MNN is a blazing-fast, lightweight deep learning framework
High quality, fast, modular reference implementation of SSD in PyTorch
The deep learning toolkit for speech-to-text
Fast and user-friendly runtime for transformer inference