Accurate × Fast × Comprehensive
Qwen2.5-VL is a multimodal large language model series
OCR expert VLM powered by Hunyuan's native multimodal architecture
Contexts Optical Compression
Open-source large language model by Alibaba
Reasoning-powered OCR VLM for converting complex documents to Markdown
Qwen2.5-VL-3B-Instruct: Multimodal model for chat, vision & video
Multimodal 7B model for image, video, and text understanding tasks