Qwen-Audio is a large audio-language model developed by Alibaba Cloud. It accepts several types of audio input (speech, natural sounds, music, singing) together with text, and produces text output. An instruction-tuned version, Qwen-Audio-Chat, supports multi-round conversation over audio and text input, including creative tasks and reasoning over audio. The model is trained with a multi-task framework covering more than 30 audio tasks and achieves strong results on multiple benchmarks without task-specific fine-tuning. Its features include flexible multi-turn chat, audio understanding and reasoning, music appreciation, and tool usage (e.g. voice editing).
Features
- Supports various audio types: speech, natural sounds, music, singing, etc.
- Multi-task training framework covering 30+ audio tasks, allowing knowledge transfer across tasks while avoiding interference between them
- Accepts audio and text input and produces text output; Qwen-Audio-Chat adds multi-round dialogue over audio and text
- Excellent zero- or few-shot performance: achieves state-of-the-art results on multiple audio benchmarks (Aishell1, CochlScene, ClothoAQA, VocalSound) without task-specific fine-tuning
- Flexibility: supports multiple-audio analysis, sound understanding & reasoning, creative tasks like music appreciation, and external tool usage (e.g. voice editing)
- Multilingual audio support across many languages and dialects; voice chat modes; designed for flexible real-world audio interaction scenarios
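The audio-plus-text input and multi-round dialogue described above can be sketched as follows. This is a minimal illustration, not a definitive recipe: `build_query`, `load_chat_model`, and `ask` are hypothetical helper names introduced here, and the model ID `Qwen/Qwen-Audio-Chat`, the `from_list_format` tokenizer method, and the `model.chat` call are assumed to follow the pattern in the project's published quick-start. These methods come from the model's remote code (hence `trust_remote_code=True`), and `device_map="auto"` assumes the `accelerate` package is installed.

```python
from typing import Dict, List, Optional, Tuple

Segment = Dict[str, str]


def build_query(audio_paths: List[str], prompt: str) -> List[Segment]:
    """Hypothetical helper: audio segments first, then one text segment."""
    segments: List[Segment] = [{"audio": path} for path in audio_paths]
    segments.append({"text": prompt})
    return segments


def load_chat_model(model_id: str = "Qwen/Qwen-Audio-Chat"):
    """Load tokenizer and model once; the model ID is an assumption."""
    # Imported lazily so build_query works without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
    model = AutoModelForCausalLM.from_pretrained(
        model_id, device_map="auto", trust_remote_code=True
    ).eval()
    return tokenizer, model


def ask(tokenizer, model, segments: List[Segment],
        history: Optional[list] = None) -> Tuple[str, list]:
    """Run one dialogue round and return (response text, updated history)."""
    # from_list_format and chat are assumed to be provided by the
    # model's remote code, as in the project's quick-start example.
    query = tokenizer.from_list_format(segments)
    response, history = model.chat(tokenizer, query=query, history=history)
    return response, history
```

A two-round exchange would call `ask` with `build_query(["clip.flac"], "What does the speaker say?")` and `history=None`, then pass the returned history into a follow-up call with a text-only query; `clip.flac` is a placeholder path.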
License
Apache License V2.0