uzu is a high-performance inference engine for running AI models efficiently on Apple Silicon hardware. Written primarily in Rust and built on Apple’s Metal framework, the project focuses on maximizing performance when executing large language models and other AI workloads on devices such as Macs with M-series chips. The engine implements a hybrid architecture in which model layers can run either as custom GPU kernels or through Apple’s MPSGraph API, allowing it to balance performance and compatibility depending on the workload. By using Apple’s unified memory architecture, in which the CPU and GPU share the same memory, uzu reduces memory-copy overhead and improves inference throughput for local workloads. A simple high-level API lets developers run models, create inference sessions, and generate outputs with minimal configuration.
Features
- High-performance inference engine optimized for Apple Silicon hardware
- Hybrid execution architecture combining GPU kernels and MPSGraph computation
- Unified memory utilization for efficient model execution on Apple devices
- High-level API for creating inference sessions and running AI models
- Command-line interface for running models, serving APIs, and benchmarking performance
- Language bindings for Swift and Node.js enabling integration into applications
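
The hybrid execution idea described above can be illustrated with a small, self-contained Rust sketch. This is not uzu's actual API: the `Backend`, `Layer`, `select_backend`, and `plan` names are hypothetical, and the selection rule (prefer a custom kernel when one exists, otherwise fall back to the graph path) is an assumption made for illustration only. The description above does not say how uzu chooses between the two paths.

```rust
// Hypothetical illustration only: none of these types or functions come from uzu.

/// The two execution paths named in the description.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Backend {
    /// A hand-written GPU kernel.
    CustomKernel,
    /// Execution through Apple's MPSGraph API.
    MpsGraph,
}

/// A simplified stand-in for a model layer.
/// `has_custom_kernel` is an assumed flag, not a real uzu field.
struct Layer {
    name: &'static str,
    has_custom_kernel: bool,
}

/// Assumed policy: use the custom kernel when one is available,
/// otherwise fall back to the graph-based path.
fn select_backend(layer: &Layer) -> Backend {
    if layer.has_custom_kernel {
        Backend::CustomKernel
    } else {
        Backend::MpsGraph
    }
}

/// Build a per-layer execution plan for a whole model.
fn plan(layers: &[Layer]) -> Vec<(&'static str, Backend)> {
    layers
        .iter()
        .map(|layer| (layer.name, select_backend(layer)))
        .collect()
}

fn main() {
    // Layer names are placeholders chosen for the example.
    let layers = [
        Layer { name: "attention", has_custom_kernel: true },
        Layer { name: "embedding", has_custom_kernel: false },
    ];

    for (name, backend) in plan(&layers) {
        println!("{name}: {backend:?}");
    }
}
```

Deciding per layer rather than per model is what would let an engine of this kind mix both paths inside a single inference session. A real implementation would likely weigh more than a single flag, for example tensor shapes or operator support.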