AI Runner is an offline inference engine for running a range of AI workloads on your own machine, including image generation for art, real-time voice conversations, LLM-powered chatbots and automated workflows. It is implemented as a desktop-oriented Python application and emphasizes privacy and self-hosting: text-to-speech, speech-to-text, text-to-image and multimodal models all run locally, without sending data to external services.

At the core of its LLM stack is a mode-based architecture with specialized “modes” such as Author, Code, Research, QA and General, plus a workflow manager that automatically routes each user request to the agent best suited to the task. The project also has a strong focus on developer ergonomics, with thorough development guidelines, environment configuration using .env variables, and a clear structure for tests, tools and agents.
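The routing idea can be sketched roughly as follows. The mode names come from the description above, but the keyword heuristic, the `Mode` enum and the `route` function are illustrative assumptions, not AI Runner's actual API; a real workflow manager might classify requests with an LLM rather than keywords:

```python
from enum import Enum

class Mode(Enum):
    # Mode names taken from the project description above.
    AUTHOR = "author"
    CODE = "code"
    RESEARCH = "research"
    QA = "qa"
    GENERAL = "general"

# Hypothetical keyword heuristics per mode; a real router could instead
# ask a small classifier model which agent should handle the request.
MODE_KEYWORDS = {
    Mode.CODE: ("function", "bug", "refactor", "python"),
    Mode.AUTHOR: ("story", "chapter", "essay", "poem"),
    Mode.RESEARCH: ("sources", "summarize", "compare", "investigate"),
    Mode.QA: ("what", "why", "how", "when"),
}

def route(request: str) -> Mode:
    """Pick the mode whose keywords best match the request; fall back to GENERAL."""
    words = [w.strip("?.,!") for w in request.lower().split()]
    best, best_hits = Mode.GENERAL, 0
    for mode, keywords in MODE_KEYWORDS.items():
        hits = sum(1 for w in words if w in keywords)
        if hits > best_hits:
            best, best_hits = mode, hits
    return best
```

A request like "please refactor this python function" would land in the Code mode under this heuristic, while anything with no keyword matches falls back to General.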
Features
- Self-hosted offline inference engine for LLMs, TTS, STT and image generation
- Mode-based LLM architecture with specialized agents for writing, coding, research and QA
- Desktop-focused Python app using frameworks like PySide and Pygame for UI and interaction
- Strong developer tooling with environment-variable configuration, Black formatting and pytest-based tests
- Extensible tool and agent system for adding new capabilities and integrating external models
- Privacy-focused design that keeps prompts, conversations and generated media on local hardware
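To illustrate the .env-based configuration mentioned above, here is a minimal loader sketch. The variable name `AIRUNNER_MODEL_PATH` and the `load_dotenv` helper are hypothetical examples, not AI Runner's actual code; in practice a project would more likely use the python-dotenv package:

```python
import os
from pathlib import Path

def load_dotenv(path: str = ".env") -> dict:
    """Parse KEY=VALUE lines from a .env file into os.environ (without
    overwriting variables that are already set) and return them as a dict."""
    values = {}
    env_file = Path(path)
    if not env_file.exists():
        return values
    for line in env_file.read_text().splitlines():
        line = line.strip()
        # Skip blanks, comments and malformed lines.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        key, value = key.strip(), value.strip().strip('"')
        values[key] = value
        os.environ.setdefault(key, value)
    return values
```

Keeping settings in a .env file like this lets users point the app at local model directories and tweak behavior without editing source code, which fits the self-hosted, offline design.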