easyVoice is an open-source text-to-speech platform aimed at turning long-form text and novels into high-quality audio, with a strong focus on usability and scalability. It provides a web interface where users can paste or upload large texts and generate speech and subtitles in a single workflow, even for works exceeding 100,000 characters. The system supports multi-role voice acting, letting users assign different neural voices to different characters or narrative roles and configure parameters such as rate, pitch, and volume per role. It offers streaming playback so audio starts almost immediately, even for very long inputs, and automatically generates subtitle files suitable for video production or translation workflows. Under the hood, easyVoice uses Vue 3 and Element Plus on the front end, Node.js and Express on the back end, and TTS engines such as Microsoft Azure TTS and OpenAI-compatible APIs, with ffmpeg handling audio processing.
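The per-role configuration described above can be pictured as a small lookup from role name to voice settings. The following is a minimal sketch under stated assumptions, not easyVoice's actual data model: the names `VoiceSettings`, `Segment`, `resolveSettings`, and the `narrator` fallback role are all hypothetical, and the voice identifiers used with it would be example values only.

```typescript
// Hypothetical types for illustration; not taken from the easyVoice codebase.
interface VoiceSettings {
  voice: string;  // neural voice identifier (example values only)
  rate: string;   // relative speaking rate, e.g. "+10%"
  pitch: string;  // relative pitch, e.g. "-2Hz"
  volume: string; // relative volume, e.g. "+0%"
}

interface Segment {
  role: string; // character or narrative role that speaks this text
  text: string;
}

// Assumed convention: unknown roles fall back to a "narrator" entry.
const DEFAULT_ROLE = "narrator";

// Attach the configured settings to every segment, falling back to the
// default role when a segment names a role with no configuration.
function resolveSettings(
  segments: Segment[],
  roles: Record<string, VoiceSettings>,
): Array<Segment & { settings: VoiceSettings }> {
  const fallback = roles[DEFAULT_ROLE];
  if (!fallback) {
    throw new Error(`roles must define "${DEFAULT_ROLE}"`);
  }
  return segments.map((segment) => ({
    ...segment,
    settings: roles[segment.role] ?? fallback,
  }));
}
```

Each resolved segment could then be passed to whichever TTS engine is configured. Keeping the role table separate from the text means a voice can be changed for one character without touching the script.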
Features
- Web-based interface for converting very long texts and novels into speech
- Multi-role voice configuration with per-character voice, rate, pitch, and volume
- Automatic subtitle file generation alongside audio for video or translation use
- Streaming playback so listening can start while long texts are still being rendered
- Flexible deployment via Node.js or Docker, with detailed environment configuration
- AI-powered voice recommendations and REST API endpoints for scripted multi-character audio
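To illustrate the subtitle feature listed above, here is a minimal sketch of how subtitle cues could be derived from per-line audio durations. It assumes SubRip (SRT) as the output format and whole-millisecond durations. The source does not say which subtitle format easyVoice emits, and `TimedLine`, `formatSrtTime`, and `toSrt` are hypothetical names.

```typescript
// Hypothetical input shape: one spoken line and the length of its audio.
interface TimedLine {
  text: string;
  durationMs: number; // assumed to be a non-negative integer
}

// Format a millisecond offset as an SRT timestamp: HH:MM:SS,mmm
function formatSrtTime(ms: number): string {
  const pad = (n: number, width: number): string =>
    String(n).padStart(width, "0");
  const hours = Math.floor(ms / 3600000);
  const minutes = Math.floor((ms % 3600000) / 60000);
  const seconds = Math.floor((ms % 60000) / 1000);
  const millis = ms % 1000;
  return `${pad(hours, 2)}:${pad(minutes, 2)}:${pad(seconds, 2)},${pad(millis, 3)}`;
}

// Build SRT text by laying the lines end to end: each cue starts where the
// previous one finished, and cues are numbered from 1.
function toSrt(lines: TimedLine[]): string {
  let cursor = 0;
  return lines
    .map((line, index) => {
      const start = cursor;
      cursor += line.durationMs;
      return `${index + 1}\n${formatSrtTime(start)} --> ${formatSrtTime(cursor)}\n${line.text}\n`;
    })
    .join("\n");
}
```

Because the timestamps are cumulative, the same durations that drive audio concatenation can drive the subtitle file, which keeps the two outputs aligned.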