Qwen-Audio
Chat & pretrained large audio language model proposed by Alibaba Cloud
Qwen-Audio is a large audio-language model developed by Alibaba Cloud. It accepts diverse audio inputs (speech, natural sounds, music, and singing) together with text, and produces text as output. An instruction-tuned variant, Qwen-Audio-Chat, supports multi-turn conversation over audio and text inputs, as well as creative tasks and reasoning over audio. Qwen-Audio is trained with multi-task training across more than 30 audio tasks and achieves strong performance on multiple benchmarks without task-specific fine-tuning. Its features include flexible multi-turn chat, audio understanding and reasoning, music appreciation, and tool usage (e.g. voice editing).