Industrial-level controllable zero-shot text-to-speech system
Qwen3-ASR is an open-source series of ASR models
Recovering the Visual Space from Any Views
CLIP: predict the most relevant text snippet given an image
Controllable & emotion-expressive zero-shot TTS
Code for the paper "Hybrid Spectrogram and Waveform Source Separation"
Let us control diffusion models
llama.go is like llama.cpp in pure Golang
PyTorch implementation of VALL-E (Zero-Shot Text-To-Speech)
Dual LSTM Encoder for Dialog Response Generation