Speech-AI-Forge is a full-stack project built around modern text-to-speech generation models, providing both an API server and a Gradio-based web UI for interactive use. At its core, it acts as a hub that wires together multiple speech-related capabilities, including TTS, speech-to-text (STT), and LLM-based control flows, behind a consistent interface.

The system can be deployed in several ways: you can try it online via hosted demos, spin it up in a one-click Colab environment, run it in Docker containers, or set it up locally with the provided environment-preparation scripts. It is model-agnostic and supports a variety of TTS and speech models, such as ChatTTS, CosyVoice, Fish-Speech, FireRedTTS, and others, as well as Whisper-based ASR, giving you a flexible playground for experimenting with different speech stacks. The project also integrates with general-purpose LLMs (for example, GPT- or LLaMA-style models), which can be used to pre-process text and manage conversations.
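As a sketch of the local Docker route, the steps typically look like the following. The repository URL and the presence of a Compose file are assumptions here; verify both against the project's own install instructions:

```shell
# Hypothetical deployment sketch -- confirm the clone URL and whether the
# repo actually ships a docker-compose file before running these commands.
git clone https://github.com/lenML/Speech-AI-Forge.git
cd Speech-AI-Forge

# Start the stack in the background with Docker Compose.
docker compose up -d
```

Once the containers are up, the WebUI and API server would be reachable on whatever ports the project's compose configuration exposes.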
## Features
- Unified API server for TTS, STT and speech-related workflows
- Gradio-based WebUI for interactive text-to-speech experimentation in the browser
- Support for multiple TTS backends, including ChatTTS, CosyVoice, and Fish-Speech
- Integration of Whisper-style ASR and LLMs for end-to-end conversational pipelines
- Multiple deployment options including HuggingFace Spaces, Colab, Docker and local installs
- Organized examples and configs to help adapt the stack to custom production scenarios
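To illustrate how a client might talk to the unified API server, here is a minimal sketch in Python using only the standard library. The endpoint path, port, and payload fields are assumptions for illustration; the project's actual routes and parameter names may differ, so check its API documentation:

```python
import json
import urllib.request

# Hypothetical endpoint -- the actual Speech-AI-Forge route, port, and
# parameter names should be checked against the project's API docs.
API_URL = "http://localhost:7870/v1/tts"

def build_tts_request(text: str, model: str = "chat-tts") -> urllib.request.Request:
    """Build a JSON POST request asking the server to synthesize `text`."""
    payload = json.dumps({"text": text, "model": model}).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

# Sending the request to a running server would return audio bytes:
#   with urllib.request.urlopen(build_tts_request("Hello")) as resp:
#       audio = resp.read()
req = build_tts_request("Hello from Speech-AI-Forge")
print(req.get_full_url())  # http://localhost:7870/v1/tts
print(req.get_method())    # POST
```

Separating request construction from the actual network call keeps the sketch runnable without a live server and makes the payload shape easy to adapt once the real API schema is known.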