Z80-μLM is a retro-computing AI project: a tiny language model engineered to run on an 8-bit Z80 CPU by aggressively quantizing weights down to 2-bit precision. The repository provides a complete workflow in which you train or fine-tune conversational models in Python, then export them into a format executable on classic Z80 systems. A key deliverable is CP/M-compatible .COM binaries, enabling a genuinely vintage “chat with your computer” experience on real hardware or accurate emulators. The project sits at the intersection of machine learning and hard systems constraints, showing how model architecture, quantization, and inference code generation can be adapted to extreme memory and compute limits. It also serves as an educational reference for reducing inference to operations that fit an old-school instruction set and runtime environment.
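The project page doesn't spell out the quantization scheme, but a symmetric four-level 2-bit quantizer along the following lines would fit the description. This is a minimal sketch under that assumption; `quantize_2bit`, `dequantize_2bit`, and the codebook are illustrative names, not the project's documented API.

```python
import numpy as np

# Assumed scheme: four symmetric levels per tensor plus one float scale.
# Not Z80-μLM's confirmed format; shown only to illustrate 2-bit weights.
LEVELS = np.array([-1.5, -0.5, 0.5, 1.5])

def quantize_2bit(weights: np.ndarray) -> tuple[np.ndarray, float]:
    """Map float weights to 2-bit codes {0..3} plus a per-tensor scale."""
    scale = float(np.abs(weights).max()) / 1.5  # top level covers the range
    if scale == 0.0:
        scale = 1.0  # all-zero tensor: any scale works
    # Nearest-level rounding: distance from each weight to each level.
    codes = np.abs(weights[..., None] / scale - LEVELS).argmin(axis=-1)
    return codes.astype(np.uint8), scale

def dequantize_2bit(codes: np.ndarray, scale: float) -> np.ndarray:
    """Reconstruct approximate float weights from codes and scale."""
    return LEVELS[codes] * scale
```

With only four representable values per weight, accuracy hinges on training or fine-tuning being quantization-aware, which is presumably why the Python workflow precedes the export step.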
Features
- 2-bit quantized language model designed for Z80-class hardware
- Python training and fine-tuning workflow with export tooling for deployment (a packing sketch follows this list)
- Generates CP/M .COM binaries for vintage execution environments
- Enables interactive “chat” sessions on retro machines or emulators
- Focuses on ultra-small inference footprints and constrained compute
- Serves as a reference implementation for extreme quantization pipelines
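The export format itself isn't published here, but the obvious packing stores four 2-bit codes per byte, so an N-weight tensor fits in N/4 bytes, which is what brings a model within Z80-class memory budgets. The sketch below assumes that layout; `pack_2bit`, `unpack_2bit`, and the low-bits-first ordering are hypothetical, not the tool's actual format.

```python
import numpy as np

def pack_2bit(codes: np.ndarray) -> bytes:
    """Pack 2-bit codes (values 0-3) four to a byte, lowest bits first."""
    flat = codes.ravel().astype(np.uint8)
    flat = np.pad(flat, (0, (-len(flat)) % 4))  # pad to a multiple of 4
    groups = flat.reshape(-1, 4)
    shifts = np.array([0, 2, 4, 6], dtype=np.uint8)
    return ((groups << shifts).sum(axis=1).astype(np.uint8)).tobytes()

def unpack_2bit(blob: bytes, n: int) -> np.ndarray:
    """Recover the first n codes from a packed byte string."""
    raw = np.frombuffer(blob, dtype=np.uint8)
    codes = np.stack([(raw >> s) & 0b11 for s in (0, 2, 4, 6)], axis=1)
    return codes.ravel()[:n]
```

On the 8-bit side, the matching unpack is just shifts and masks, which would map directly onto Z80 rotate and AND instructions at inference time.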
Categories
AI Models
User Reviews
- Maybe helpful for LLM education.