Unified Multimodal Understanding and Generation Models
A GPT-4o Level MLLM for Vision, Speech and Multimodal Live Streaming
Chinese and English multimodal conversational language model
Qwen3-ASR is an open-source series of ASR models
Implementation of "MobileCLIP" (CVPR 2024)
Qwen3 is the large language model series developed by the Qwen team
OCR expert VLM powered by Hunyuan's native multimodal architecture
Diffusion model (SD, Flux, Wan, Qwen Image, Z-Image, ...) inference
The official repo of Qwen chat & pretrained large language model
Ultra-Efficient LLMs on End Devices
CogView4, CogView3-Plus and CogView3 (ECCV 2024)
Memory-efficient and performant finetuning of Mistral's models
Fast-stable-diffusion + DreamBooth
Block Diffusion for Ultra-Fast Speculative Decoding
Audio foundation model excelling in audio understanding
Phi-3.5 for Mac: Locally-run Vision and Language Models
LLM-based Reinforcement Learning audio edit model
Qwen-Image-Layered: Layered Decomposition for Inherent Editability
Large Multimodal Models for Video Understanding and Editing
Multimodal Diffusion with Representation Alignment
Repo of Qwen2-Audio chat & pretrained large audio language model
The official PyTorch implementation of Google's Gemma models
Multi-modal large language model designed for audio understanding
Renderer for the harmony response format to be used with gpt-oss
Open-weight, large-scale hybrid-attention reasoning model