Voicebox is a local-first voice synthesis studio that brings professional, DAW-like (digital audio workstation) voice generation workflows to a desktop app while keeping models and voice data entirely on your machine. It positions itself as an open-source alternative to cloud voice platforms, emphasizing privacy, offline use, and freedom from subscriptions or usage caps. The tool lets you download voice models, clone voices from short audio samples, and generate speech locally, then organize the results with studio-style editing tools. Its standout features are a multi-track timeline editor and supporting audio tools (such as trimming and conversation mixing), which let creators compose multi-voice scenes rather than generating isolated single clips. It is also API-first: you can use it as an app for production work or integrate its speech generation into your own software through its API.
Features
- Local-first voice cloning and speech generation for privacy and control
- Multi-track timeline editor for DAW-like voice composition
- Built-in audio trimming and conversation mixing workflows
- Model download and management with flexible model backends
- API-first design for integrating voice synthesis into apps
- Native-performance desktop app built with Tauri, not Electron
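Voicebox's timeline internals are not described here, so the following is only a minimal, hypothetical Python sketch of the idea behind a multi-track timeline with trimming and conversation mixing. It assumes each clip is a list of mono samples placed at a sample offset on a track, and that the mixer simply sums every track into one buffer. The names `Clip`, `Track`, `trimmed`, `conversation`, and `mix` are illustrative assumptions, not part of the Voicebox API.

```python
from dataclasses import dataclass, field


@dataclass
class Clip:
    """A rendered speech clip placed on a track (hypothetical model)."""
    samples: list[float]  # mono PCM samples in [-1.0, 1.0]
    start: int            # offset on the timeline, in samples
    gain: float = 1.0

    def trimmed(self, head: int = 0, tail: int = 0) -> "Clip":
        """Return a copy with `head` samples cut from the front and `tail` from the end.

        The clip keeps its original start offset on the timeline.
        """
        end = len(self.samples) - tail
        return Clip(self.samples[head:max(head, end)], self.start, self.gain)


@dataclass
class Track:
    clips: list[Clip] = field(default_factory=list)


def conversation(turns: list[tuple[int, list[float]]], gap: int) -> list[Track]:
    """Lay out speaker turns back to back, one track per speaker, `gap` samples apart."""
    n_speakers = max((speaker for speaker, _ in turns), default=-1) + 1
    tracks = [Track() for _ in range(n_speakers)]
    cursor = 0
    for speaker, samples in turns:
        tracks[speaker].clips.append(Clip(samples, cursor))
        cursor += len(samples) + gap
    return tracks


def mix(tracks: list[Track]) -> list[float]:
    """Sum every clip on every track into one mono buffer, hard-clipping to [-1, 1]."""
    length = max((c.start + len(c.samples) for t in tracks for c in t.clips), default=0)
    out = [0.0] * length
    for track in tracks:
        for clip in track.clips:
            for i, s in enumerate(clip.samples):
                out[clip.start + i] += s * clip.gain
    return [max(-1.0, min(1.0, s)) for s in out]


if __name__ == "__main__":
    speaker_a = [0.5, 0.5]
    speaker_b = [0.25, 0.25, 0.25]
    tracks = conversation([(0, speaker_a), (1, speaker_b)], gap=1)
    print(mix(tracks))  # [0.5, 0.5, 0.0, 0.25, 0.25, 0.25]
```

Summing and hard-clipping is the simplest possible mixing rule; a real editor would more likely operate on NumPy arrays and apply per-track gain or a limiter rather than clipping sample values.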