Run local LLMs on any device. Open-source
TTS with Kokoro and ONNX Runtime
AirLLM: 70B inference with a single 4GB GPU
A 950-line, minimal, extensible LLM inference engine built from scratch
Clippy, now with some AI
A high-performance inference engine for AI models
A course to get into Large Language Models (LLMs)
Large-Scale Agentic RL for High-Performance CUDA Kernel Generation
A simple, performant, and scalable JAX LLM
Explore large language models in 512MB of RAM