Contexts Optical Compression
Accurate × Fast × Comprehensive
Visual Causal Flow
Awesome multilingual OCR toolkits based on PaddlePaddle
OCR expert VLM powered by Hunyuan's native multimodal architecture
Qwen3-VL, the multimodal large language model series by Alibaba Cloud
A GPT-4o-Level MLLM for Vision, Speech and Multimodal Live Streaming
Qwen3-Omni is a natively end-to-end, omni-modal LLM
Reasoning-powered OCR VLM for converting complex documents to Markdown
Lightweight multimodal translation model for 55 languages