mac code is a local AI coding agent that runs large language models directly on Apple Silicon, with no cloud dependency, turning a Mac into a self-contained AI development environment. Models that would normally exceed available RAM can still run: their weights are streamed from SSD storage on demand, trading some throughput for capacity. The tool is a CLI assistant that uses an LLM as a router, dispatching each user prompt to an execution path such as chat, shell commands, or web search. It integrates with inference engines like llama.cpp and Apple’s MLX framework, allowing models of up to 35B parameters to run locally with varying performance trade-offs.
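The routing idea can be sketched as below. Note this is a simplified stand-in: the actual project uses an LLM to pick the path, whereas here trivial keyword heuristics illustrate only the dispatch shape; all names (`Route`, `route_prompt`) are hypothetical.

```python
from enum import Enum

class Route(Enum):
    CHAT = "chat"
    SHELL = "shell"
    SEARCH = "search"

def route_prompt(prompt: str) -> Route:
    """Toy stand-in for the LLM router: map a prompt to an execution path.

    The real system would ask a local model to classify the prompt;
    keyword checks here only demonstrate the fan-out structure.
    """
    p = prompt.lower().strip()
    if p.startswith("!") or p.startswith(("run ", "git ", "ls")):
        return Route.SHELL
    if any(k in p for k in ("search the web", "look up", "latest news")):
        return Route.SEARCH
    return Route.CHAT

print(route_prompt("!ls -la"))            # dispatches to shell
print(route_prompt("look up Swift 6"))    # dispatches to web search
print(route_prompt("explain this error")) # falls through to chat
```

Once a path is chosen, each branch can hand the prompt to its own handler (model call, subprocess, or search client), which keeps the agent loop itself small.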
Features
- Local execution of large language models without cloud dependency
- SSD-based weight streaming to run models beyond RAM limits
- LLM-as-router system for chat, shell, and search tasks
- Integration with llama.cpp and MLX backends
- Persistent KV cache for long-context and session continuity
- Support for large mixture-of-experts (MoE) models with performance optimizations
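The SSD weight-streaming feature above relies on the operating system paging data in from disk only when it is touched. A minimal sketch of that mechanism, using Python's standard `mmap` module (the file name and float-array "weight" layout are invented for the demo):

```python
import mmap
import os
import struct

# Write a toy "weights" file: 1000 little-endian float32 values.
path = "toy_weights.bin"
with open(path, "wb") as f:
    f.write(struct.pack("<1000f", *[float(i) for i in range(1000)]))

# Memory-map the file instead of reading it all into RAM. Pages are
# faulted in from SSD only when their bytes are first accessed, which
# is what lets a model's weights exceed physical memory.
with open(path, "rb") as f:
    mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    offset = 100 * 4  # byte offset of "tensor" starting at float index 100
    vals = struct.unpack("<5f", mm[offset:offset + 20])
    print(vals)  # (100.0, 101.0, 102.0, 103.0, 104.0)
    mm.close()

os.remove(path)
```

Real engines such as llama.cpp apply the same principle at scale, mapping multi-gigabyte weight files so untouched experts or layers never occupy RAM.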