Tencent Hunyuan Multimodal Diffusion Transformer (MM-DiT) model
Multimodal-Driven Architecture for Customized Video Generation
State-of-the-art Image & Video CLIP, Multimodal Large Language Models
Chinese and English multimodal conversational language model
Learning to Act by Watching Unlabeled Online Videos