MOSS-TTS-Nano is a lightweight text-to-speech model in the MOSS-TTS family, designed for real-time voice generation in resource-constrained environments. Its compact architecture runs efficiently on CPU-only systems, so it can be deployed without specialized hardware. The model supports multilingual voice cloning and generates speech through an autoregressive audio tokenization pipeline, producing high-fidelity audio with low latency. It is suited to local applications, web services, and embedded systems.
Features
- Lightweight model with around 0.1B parameters
- Real-time speech generation on CPU
- Multilingual voice cloning capabilities
- Low-latency streaming audio output
- High-quality 48 kHz stereo audio support
- Designed for local and low-resource deployment
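
To put the figures above in perspective, the following is a rough back-of-the-envelope sketch, not part of MOSS-TTS-Nano itself. It estimates the weight memory of a model with about 0.1B parameters, the raw data rate of 48 kHz stereo audio, and the real-time factor (synthesis time divided by audio duration, where a value below 1 means faster than real time). The helper names, the float32/float16 storage sizes, the 16-bit PCM sample width, and the example timings are all assumptions chosen for illustration; the source does not state them.

```python
# Illustrative arithmetic only. None of these helpers belong to MOSS-TTS-Nano.

def weight_memory_mb(num_params: float, bytes_per_param: int) -> float:
    """Approximate memory needed for model weights, in megabytes (10^6 bytes)."""
    return num_params * bytes_per_param / 1e6


def pcm_bytes_per_second(sample_rate_hz: int, channels: int, bytes_per_sample: int) -> int:
    """Raw (uncompressed) PCM data rate in bytes per second."""
    return sample_rate_hz * channels * bytes_per_sample


def real_time_factor(synthesis_seconds: float, audio_seconds: float) -> float:
    """Synthesis time divided by audio duration; below 1.0 is faster than real time."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return synthesis_seconds / audio_seconds


if __name__ == "__main__":
    params = 0.1e9  # "around 0.1B parameters" from the feature list

    # Assumed storage formats: 4 bytes (float32) and 2 bytes (float16) per parameter.
    print("float32 weights (MB):", weight_memory_mb(params, 4))
    print("float16 weights (MB):", weight_memory_mb(params, 2))

    # 48 kHz stereo from the feature list; 16-bit (2-byte) samples are assumed.
    print("PCM bytes per second:", pcm_bytes_per_second(48_000, 2, 2))

    # Hypothetical timing: 0.5 s of compute to produce 2.0 s of audio.
    print("real-time factor:", real_time_factor(0.5, 2.0))
```

Under these assumptions, the weights occupy a few hundred megabytes and the raw output stream is 192,000 bytes per second, which is why a model of this size is plausible for CPU-only and streaming use. Actual memory use and speed depend on the runtime, precision, and hardware.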