LiteRT-LM is Google’s open-source inference framework for deploying large language models on edge devices. It is built for production-oriented local LLM execution across Android, iOS, desktop, web, embedded, and IoT environments. The framework focuses on performance, hardware acceleration, and efficient model serving close to the user instead of relying only on remote cloud inference. It supports CPU execution across major platforms and adds GPU or NPU acceleration where available. LiteRT-LM is especially relevant for developers building private, low-latency AI features on phones, laptops, Raspberry Pi-style devices, and other edge hardware. Its goal is to make modern language models usable in local applications with a consistent deployment stack.
Features
- Edge LLM inference
- Android, iOS, desktop, web, and IoT support
- CPU, GPU, and NPU acceleration
- Prebuilt binaries and mobile demos
- Production-ready deployment focus
- Local low-latency AI execution