HunyuanOCR
OCR expert VLM powered by Hunyuan's native multimodal architecture
...It’s designed to unify the entire OCR pipeline, detection, recognition, layout parsing, information extraction, translation, and even subtitle or structured output generation, into a single model inference instead of a cascade of separate tools. Despite being fairly lightweight (about 1 billion parameters), it delivers state-of-the-art performance across a wide variety of OCR tasks, outperforming many traditional OCR systems and even other multimodal models on benchmark suites. HunyuanOCR handles complex documents: multi-column layouts, tables, mathematical formulas, mixed languages, handwritten or stylized fonts, receipts, tickets, and even video-frame subtitles. ...