Capable of understanding text, audio, vision, video
Qwen3-Omni is a natively end-to-end, omni-modal LLM
Code and models for ICML 2024 paper, NExT-GPT
Search all of YouTube from the command line
Qwen2.5-VL is the multimodal large language model series
Uses Qwen3-ASR, local LLM, Whisper, TEN-VAD
Data Infrastructure providing an approach to multimodal AI workloads
Build multimodal language agents for fast prototyping and production
A Pioneering Open-Source Alternative to GPT-4o
Benchmark LLMs by fighting in Street Fighter 3