Uses Qwen3-ASR, local LLM, Whisper, TEN-VAD
As little as 1 minute of voice data can be used to train a good TTS model
A voice cloning tool with a web interface, using your voice
Private AI platform for agents, enterprise search and RAG pipelines
Style-Bert-VITS2: Bert-VITS2 with more controllable voice styles
Multi-lingual large voice generation model, providing inference
Generate audiobooks from e-books
A subtitle generator for Japanese adult videos
Shinkai allows you to create advanced AI (local) agents effortlessly
VITS2 backbone with multilingual-bert
Open source implementation of Microsoft's VALL-E X zero-shot TTS model
Unofficial Parallel WaveGAN
Read written or imported text aloud offline, or download it online
PyTorch implementation of convolutional neural networks