Multimodal Diffusion with Representation Alignment
Chat & pretrained large audio language model proposed by Alibaba Cloud
Qwen2.5-VL is a multimodal large language model series
Di♪♪Rhythm: Blazingly Fast & Simple End-to-End Song Generation
Code for the paper "Hybrid Spectrogram and Waveform Source Separation"