Robust Speech Recognition via Large-Scale Weak Supervision
Provides code for running inference with the SegmentAnything Model
End-to-end speech processing toolkit
Industrial-level controllable zero-shot text-to-speech system
TorchMultimodal is a PyTorch library
A Conversational Speech Generation Model
Singing Voice Synthesis via Shallow Diffusion Mechanism
Code release for "Masked-attention Mask Transformer
PyTorch implementation of MAE
Real Time Speech Enhancement in the Waveform Domain (Interspeech 2020)
Toolkit for efficient experimentation with Speech Recognition
A general-purpose encoder-decoder framework for TensorFlow
toneDetect