Robust Speech Recognition via Large-Scale Weak Supervision
Industrial-level controllable zero-shot text-to-speech system
Accurate × Fast × Comprehensive
End-to-end speech processing toolkit
Open-source industrial-grade ASR models
TorchMultimodal is a PyTorch library
LLM training code for MosaicML foundation models
A Conversational Speech Generation Model
Implementation of DALL-E 2, OpenAI's updated text-to-image synthesis
Basaran, an open-source alternative to the OpenAI text completion API
Implementation of NÜWA, an attention network for text-to-video synthesis
Text-conditional image generation model based on OpenAI's unCLIP
CPT: A Pre-Trained Unbalanced Transformer
Singing Voice Synthesis via Shallow Diffusion Mechanism
ALIbaba's Collection of Encoder-decoders from MinD
Toolkit for Machine Learning, Natural Language Processing
Toolkit for efficient experimentation with Speech Recognition