Repo of Qwen2-Audio chat & pretrained large audio language model
Open-source framework for intelligent speech interaction
Chat & pretrained large audio language model proposed by Alibaba Cloud
Audio foundation model excelling in audio understanding
Multi-modal large language model designed for audio understanding
LLM-based reinforcement learning audio editing model
Official Python inference and LoRA trainer package
A Family of Open-Source Music Foundation Models
Qwen3-Omni is a natively end-to-end, omni-modal LLM
Capable of understanding text, audio, vision, and video
Open-source multi-speaker long-form text-to-speech model
ChatGPT interface with better UI
Open Source Speech Language Model
Industrial-level controllable zero-shot text-to-speech system
A Systematic Framework for Interactive World Modeling
Qwen3-ASR is an open-source series of ASR models
Python inference and LoRA trainer package for the LTX-2 audio–video
VMZ: Model Zoo for Video Modeling
Controllable & emotion-expressive zero-shot TTS
Qwen3-TTS is an open-source series of TTS models
State-of-the-art TTS model under 25MB
Official repository for LTX-Video
High-resolution models for human tasks
GLM-4-Voice | End-to-End Chinese-English Conversational Model
Large Multimodal Models for Video Understanding and Editing