Open image model at the forefront of design
Contexts Optical Compression
A Family of Open Sourced Music Foundation Models
OCR expert VLM powered by Hunyuan's native multimodal architecture
Renderer for the harmony response format to be used with gpt-oss
Accurate × Fast × Comprehensive
Qwen-Image-Layered: Layered Decomposition for Inherent Editability
Qwen2.5-VL is the multimodal large language model series
Visual Causal Flow
Block Diffusion for Ultra-Fast Speculative Decoding
Open source large language model by Alibaba
Code for "Image Generation from Scene Graphs", Johnson et al, CVPR 201
Layout-aware OCR model for multilingual document understanding
Qwen2.5-VL-3B-Instruct: Multimodal model for chat, vision & video
Lightweight multimodal translation model for 55 languages
NVFP4 DiffusionGemma model for fast multimodal text generation
Multimodal 7B model for image, video, and text understanding tasks
Google’s flagship dense multimodal model for coding and reasoning
Powerful 14B LLM with strong instruction and long-text handling
Multimodal Transformer for document image understanding and layout
Compact 3B-param multimodal model for efficient on-device reasoning
Efficient 8B multimodal model tuned for advanced reasoning tasks
Efficient 14B multimodal instruct model with edge deployment and FP8
Multimodal agent model for coding, orchestration, and autonomy
Compact 8B multimodal instruct model optimized for edge deployment