Running large language models on a single GPU
Real-time NVIDIA GPU dashboard
Python-free Rust inference server
A high-quality rapid TTS voice cloning model
AirLLM 70B inference with a single 4GB GPU
Building an Intelligent Agent from Scratch
A TTS that fits in your CPU (and pocket)
MemU is an open-source memory framework for AI companions
Official inference framework for 1-bit LLMs
Unified KV Cache Compression Methods for Auto-Regressive Models
Neural network architecture based on ideas from the original LSTM
Run a 1-billion parameter LLM on a $10 board with 256MB RAM
Demo of a customer service use case implemented with the OpenAI Agents
Open-source large language model family from Tencent Hunyuan
Redundancy-aware KV Cache Compression for Reasoning Models
AI Agent Source Code Deep Research Report
Memory-efficient and performant finetuning of Mistral's models
157 models, 30 providers, one command to find what runs on your hardware
Persistent context and multi-instance coordination
The repository provides code for running inference with SAM 2
Self-evolving autonomous agent framework
A step-by-step guide to build your own AI agent
A lightweight text-to-speech model with zero-shot voice cloning
BitNet: Scaling 1-bit Transformers for Large Language Models
Official plugin for OpenClaw that exports agent traces to Opik