HunyuanOCR is an open-source, end-to-end OCR (optical character recognition) Vision-Language Model (VLM) developed by Tencent Hunyuan. It is designed to unify the entire OCR pipeline (detection, recognition, layout parsing, information extraction, translation, and even subtitle or structured output generation) into a single model inference rather than a cascade of separate tools. Despite being lightweight at roughly 1 billion parameters, it delivers state-of-the-art performance across a wide variety of OCR tasks, outperforming many traditional OCR systems and larger multimodal models on benchmark suites. HunyuanOCR handles complex documents: multi-column layouts, tables, mathematical formulas, mixed languages, handwritten or stylized fonts, receipts, tickets, and even video-frame subtitles. The project provides code, pretrained weights, and inference instructions, making it feasible to deploy locally or on a server and to integrate into applications.
Features
- End-to-end OCR Vision-Language Model: detection, recognition, layout parsing, translation, and structured output generation in a single inference pass
- Lightweight (~1 billion parameters) yet achieves state-of-the-art performance across benchmarks for complex documents, multilingual text, handwritten/stylized fonts, receipts, tickets, and more
- Supports complex layouts including columns, tables, formulas, multi-language text, mixed fonts/styles, and video subtitles/frames
- Produces structured outputs (e.g., JSON, HTML, Markdown, LaTeX, translated text), enabling downstream processing like automated form filling or data extraction
- Open-source with code, pretrained weights, and inference scripts, making it easy to integrate locally or into production workflows
- Efficient inference pipeline (via a native-resolution encoder + adaptive visual adapter + light LLM), lowering computational cost compared to massive models
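Because the model emits structured output rather than raw text, downstream processing can be a few lines of standard parsing. The following is a minimal sketch of consuming such output for automated data extraction; the JSON schema here (`fields`, `label`, `text`, `bbox` keys) is an illustrative assumption for the example, not HunyuanOCR's actual output format.

```python
import json

# Hypothetical structured OCR result for a receipt. The schema below
# (fields/label/text/bbox) is assumed for illustration only.
SAMPLE_OUTPUT = """
{
  "fields": [
    {"label": "invoice_no", "text": "INV-2024-0117", "bbox": [34, 20, 210, 44]},
    {"label": "total",      "text": "128.50",        "bbox": [402, 512, 470, 536]}
  ]
}
"""

def extract_fields(raw: str) -> dict:
    """Flatten a structured OCR result into a label -> text mapping,
    suitable for form filling or database ingestion."""
    data = json.loads(raw)
    return {f["label"]: f["text"] for f in data.get("fields", [])}

fields = extract_fields(SAMPLE_OUTPUT)
print(fields["invoice_no"])  # INV-2024-0117
```

The same pattern applies to the other output formats (HTML, Markdown, LaTeX): because the model returns machine-readable structure in one pass, no separate detection/recognition stitching step is needed before downstream use.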