Long-form streaming TTS system for multi-speaker dialogue generation
Open Source Speech Language Model
Open-source framework for intelligent speech interaction
Controllable & emotion-expressive zero-shot TTS
Industrial-level controllable zero-shot text-to-speech system
Qwen3-TTS is an open-source series of TTS models
Capable of understanding text, audio, vision, video
GLM-4-Voice | End-to-End Chinese-English Conversational Model
State-of-the-art TTS model under 25MB
Foundational Models for State-of-the-Art Speech and Text Translation
Repository for the Qwen2-Audio chat and pretrained large audio language models
Multi-modal large language model designed for audio understanding
From Images to High-Fidelity 3D Assets
CodeGeeX: An Open Multilingual Code Generation Model (KDD 2023)
Wan2.1: Open and Advanced Large-Scale Video Generative Model
LLM-based reinforcement learning audio editing model
High-Resolution Image Synthesis with Latent Diffusion Models
A Conversational Speech Generation Model
High-Fidelity and Controllable Generation of Textured 3D Assets
Image generation model with single-stream diffusion transformer
Reproduction of Poetiq's record-breaking submission to the ARC-AGI-1
A fast, local neural text to speech system
PyTorch implementation of VALL-E (Zero-Shot Text-To-Speech)
Official PyTorch Implementation of "Scalable Diffusion Models"
GLIDE: a diffusion-based text-conditional image synthesis model