Port of Facebook's LLaMA model in C/C++
An RWKV management and startup tool: fully automated, only 8 MB
Run serverless GPU workloads with fast cold starts on bare-metal
Lightweight, standalone C++ inference engine for Google's Gemma models
A general-purpose probabilistic programming system
Run local LLMs such as Llama, DeepSeek, and Kokoro inside your browser
Fast inference engine for Transformer models
Unified Model Serving Framework
High-quality, fast, modular reference implementation of SSD in PyTorch
The deep learning toolkit for speech-to-text
Fast and user-friendly runtime for transformer inference