CogView4, CogView3-Plus and CogView3 (ECCV 2024)
Visual Causal Flow
Qwen3-ASR is an open-source series of ASR models
Chinese and English multimodal conversational language model
HY-Motion model for 3D character animation generation
A GPT-4o Level MLLM for Vision, Speech and Multimodal Live Streaming
The official repo of Qwen chat & pretrained large language model
GLM-4.6V/4.5V/4.1V-Thinking, towards versatile multimodal reasoning
Implementation of "MobileCLIP" (CVPR 2024)
Generate Any 3D Scene in Seconds
Memory-efficient and performant finetuning of Mistral's models
Unified Multimodal Understanding and Generation Models
Audio foundation model excelling in audio understanding
Multimodal Diffusion with Representation Alignment
Repo of Qwen2-Audio chat & pretrained large audio language model
Multi-modal large language model designed for audio understanding
Large Multimodal Models for Video Understanding and Editing
MedicalGPT: Training Your Own Medical GPT Model with ChatGPT Training
Block Diffusion for Ultra-Fast Speculative Decoding
Qwen-Image-Layered: Layered Decomposition for Inherent Editability
Open-weight, large-scale hybrid-attention reasoning model
Open-source framework for intelligent speech interaction
FAIR Sequence Modeling Toolkit 2
Official implementation of DreamCraft3D
Phi-3.5 for Mac: Locally-run Vision and Language Models