Please do not feed the models
Fast ML inference & training for ONNX models in Rust
Open deep learning compiler stack for cpu, gpu, etc.
Lightweight, standalone C++ inference engine for Google's Gemma models
High-Resolution Image Synthesis with Latent Diffusion Models
AI video generator optimized for low VRAM and older GPUs
Official inference framework for 1-bit LLMs
Simplifies the local serving of AI models from any source
Style-Bert-VITS2: Bert-VITS2 with more controllable voice styles
ArrayFire, a general purpose GPU library
Python-free Rust inference server
QVAC Fabric: cross-platform LLM inference and fine-tuning
Wan2.1: Open and Advanced Large-Scale Video Generative Model
Training neural networks on Apple Neural Engine via APIs
FlashMLA: Efficient Multi-head Latent Attention Kernels
A Python package for extending the official PyTorch
Text and image to video generation: CogVideoX and CogVideo
950-line, minimal, extensible LLM inference engine built from scratch
Bailing is a voice dialogue robot similar to GPT-4o
Lemonade helps users run local LLMs with the highest performance
Clone a voice in 5 seconds to generate arbitrary speech in real-time
Supercharge Your Model Training
TensorRT-LLM provides users with an easy-to-use Python API
WebAssembly binding for llama.cpp - Enabling on-browser LLM inference
OpenVINO™ Toolkit repository