LiteRT is Google's on-device machine learning framework and the successor to TensorFlow Lite, built for deploying AI and generative AI models on edge devices. It handles model conversion, optimization, and runtime execution, and uses hardware acceleration on CPUs, GPUs, and NPUs. Supported platforms include Android, iOS, Linux, macOS, Windows, web environments, and IoT devices. The framework simplifies on-device development through automated accelerator selection, asynchronous execution, and optimized memory handling. Large language models and other generative AI workloads are supported through LiteRT-LM and related tooling. Together, these capabilities let developers build fast, efficient AI applications that run directly on user devices.
Features
- Cross-Platform AI Deployment – Supports Android, iOS, Linux, macOS, Windows, web, and IoT environments from a unified framework.
- Advanced Hardware Acceleration – Accelerates inference on CPUs, GPUs, and NPUs, including chipsets from Google (Tensor), Qualcomm, MediaTek, and Intel.
- Compiled Model API – Automates accelerator selection, enables asynchronous execution, and improves I/O buffer management for streamlined development.
- Generative AI Optimization – Provides dedicated tools and runtimes for deploying large language models, diffusion models, and other GenAI workloads on-device.
- Model Conversion & Quantization – Converts models from PyTorch and other machine learning frameworks and quantizes them for efficient edge deployment with reduced resource usage.
- High-Performance Runtime Engine – Delivers low-latency inference with zero-copy buffer interoperability, advanced GPU acceleration, and efficient model execution.
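
To make the quantization item in the list above more concrete, here is a minimal sketch of 8-bit affine quantization in plain Python. It is not LiteRT's API or its converter: the function names (quantization_params, quantize, dequantize) and the sample weights are hypothetical, and the scheme shown (one scale and zero point per tensor, int8 range) is a common general approach assumed here for illustration only.

```python
# Illustrative only: a generic per-tensor int8 affine quantization scheme.
# None of these helpers are part of LiteRT; all names are hypothetical.

def quantization_params(values, qmin=-128, qmax=127):
    """Derive a scale and zero point that map the value range onto [qmin, qmax]."""
    # Extend the range to include 0.0 so that zero is exactly representable.
    lo = min(min(values), 0.0)
    hi = max(max(values), 0.0)
    if hi == lo:
        return 1.0, 0  # all values are zero; any scale works
    scale = (hi - lo) / (qmax - qmin)
    zero_point = round(qmin - lo / scale)
    zero_point = max(qmin, min(qmax, zero_point))
    return scale, zero_point


def quantize(values, scale, zero_point, qmin=-128, qmax=127):
    """Map floats to integers, clamping to the representable range."""
    return [max(qmin, min(qmax, round(v / scale) + zero_point)) for v in values]


def dequantize(q_values, scale, zero_point):
    """Recover approximate floats from the quantized integers."""
    return [(q - zero_point) * scale for q in q_values]


# Hypothetical weights standing in for one tensor of a trained model.
weights = [-1.0, -0.5, 0.0, 1.5, 3.0]

scale, zero_point = quantization_params(weights)
q_weights = quantize(weights, scale, zero_point)
restored = dequantize(q_weights, scale, zero_point)

# The round trip loses at most half a quantization step per value.
max_error = max(abs(a - b) for a, b in zip(weights, restored))
print(q_weights, max_error)
```

Each weight is stored in one byte instead of four, which is where the reduced resource usage comes from. The cost is a rounding error of at most half the scale per value, so a tensor with a narrower value range quantizes more precisely.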