An RWKV management and startup tool: fully automated, only 8 MB
Run serverless GPU workloads with fast cold starts on bare-metal
Port of Facebook's LLaMA model in C/C++
On-device Speech Recognition for Apple Silicon
Lightweight, standalone C++ inference engine for Google's Gemma models
A general-purpose probabilistic programming system
Run local LLMs like Llama, DeepSeek, Kokoro, etc. inside your browser
Fast inference engine for Transformer models
Unified Model Serving Framework
High quality, fast, modular reference implementation of SSD in PyTorch
The deep learning toolkit for speech-to-text