Robust Speech Recognition via Large-Scale Weak Supervision
Industrial-level controllable zero-shot text-to-speech system
Accurate × Fast × Comprehensive
End-to-end speech processing toolkit
Multimodal model achieving SOTA performance
A Conversational Speech Generation Model
Singing Voice Synthesis via Shallow Diffusion Mechanism
Toolkit for efficient experimentation with Speech Recognition
Flexible text-to-text transformer model for multilingual NLP tasks
Summarization model fine-tuned on CNN/DailyMail articles