One platform to build, fine-tune, and deploy ML models. No MLOps team required.
Access Gemini 3 and 200+ models. Build chatbots, agents, or custom models with built-in monitoring and scaling.
Try Free
MongoDB Atlas runs apps anywhere
Deploy in 115+ regions with the modern database for every enterprise.
MongoDB Atlas gives you the freedom to build and run modern applications anywhere—across AWS, Azure, and Google Cloud. With global availability in over 115 regions, Atlas lets you deploy close to your users, meet compliance needs, and scale with confidence across any geography.
1 min voice data can also be used to train a good TTS model
GPT‑SoVITS is a state-of-the-art voice conversion and TTS system that enables zero‑shot and few‑shot synthesis based on a short vocal sample (e.g., 5 seconds). It supports cross‑lingual speech synthesis across English, Chinese, Japanese, Korean, Cantonese, and more. It's powered by VITS architecture enhanced for few‑sample adaptation and real‑time usability.
[WIP] VoiceSmith makes training text to speech models easy
...Both models were pretrained on a proprietary 5000 speaker dataset. It also provides some tools for dataset preprocessing like automatic text normalization. Windows (only CPU supported currently) or any Linux based operating system. If you want to run this on macOS you have to follow the steps in build from source in order to create the installer. This is untested since I don't currently own a Mac. NVIDIA GPU with CUDA support is highly recommended, you can train on CPU otherwise but it will take days if not weeks. VoiceSmith currently uses a two-stage modified DelightfulTTS and UnivNet pipeline.