GLM-OCR
Z.ai
|
Hugging Face Transformers
Hugging Face
|
|||||
About
GLM-OCR is an open-source multimodal optical character recognition model and repository for accurate, efficient document understanding. It combines text and visual modalities in a unified encoder–decoder architecture derived from the GLM-V family. A visual encoder pre-trained on large-scale image–text data feeds a lightweight cross-modal connector, which in turn feeds a GLM-0.5B language decoder. The model supports layout detection, parallel region recognition, and structured output for text, tables, formulas, and complex real-world document formats. It introduces a Multi-Token Prediction (MTP) loss and stable full-task reinforcement learning to improve training efficiency, recognition accuracy, and generalization, achieving state-of-the-art results on major document understanding benchmarks.
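The flow described above (layout detection, then parallel recognition of the detected regions, then structured output) can be sketched roughly as follows. This is a minimal illustration in plain Python, not GLM-OCR's actual API: `detect_layout`, `recognize_region`, `parse_page`, and the `Region` type are hypothetical stand-ins, and the recognizer returns placeholder strings instead of calling a model.

```python
import json
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass
class Region:
    kind: str    # "text", "table", or "formula"
    bbox: tuple  # (x0, y0, x1, y1) in pixels


def detect_layout(page):
    """Hypothetical layout detector: returns regions in reading order."""
    return [Region(kind, bbox) for kind, bbox in page["regions"]]


def recognize_region(region):
    """Hypothetical per-region recognizer; a real one would run the model."""
    return {
        "kind": region.kind,
        "bbox": list(region.bbox),
        "content": f"<{region.kind} content>",  # placeholder, not real OCR output
    }


def parse_page(page, max_workers=4):
    """Detect regions, recognize them in parallel, return structured output."""
    regions = detect_layout(page)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so results stay in reading order.
        results = list(pool.map(recognize_region, regions))
    return {"page": page["id"], "regions": results}


# Toy input standing in for a scanned page with three detected regions.
page = {
    "id": 1,
    "regions": [
        ("text", (0, 0, 100, 20)),
        ("table", (0, 30, 100, 80)),
        ("formula", (0, 90, 100, 110)),
    ],
}
parsed = parse_page(page)
print(json.dumps(parsed, indent=2))
```

The thread pool is used here only to show that regions can be recognized independently of one another; because `pool.map` returns results in input order, the structured output keeps the reading order produced by the layout step.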
|
About
Transformers is a library of pretrained natural language processing, computer vision, audio, and multimodal models for inference and training. Use Transformers to train models on your data, build inference applications, and generate text with large language models. Explore the Hugging Face Hub to find a model, then use Transformers to get started right away. The library offers a simple, optimized inference class for many machine learning tasks, such as text generation, image segmentation, automatic speech recognition, and document question answering. It also includes a comprehensive trainer for PyTorch models that supports mixed precision, torch.compile, FlashAttention, and distributed training, as well as fast text generation with large language models and vision language models. Every model is implemented from only three main classes (configuration, model, and preprocessor) and can be quickly used for inference or training.
|
|||||
Platforms Supported
Windows
Mac
Linux
Cloud
On-Premises
iPhone
iPad
Android
Chromebook
|
Platforms Supported
Windows
Mac
Linux
Cloud
On-Premises
iPhone
iPad
Android
Chromebook
|
|||||
Audience
Developers, researchers, and engineers wanting a tool to accurately parse and understand complex documents, layouts, and visual-text content at scale
|
Audience
Machine learning practitioners looking for a tool to train and deploy state-of-the-art models across NLP, vision, and audio tasks
|
|||||
Support
Phone Support
24/7 Live Support
Online
|
Support
Phone Support
24/7 Live Support
Online
|
|||||
API
Offers API
|
API
Offers API
|
|||||
Pricing
Free
Free Version
Free Trial
|
Pricing
$9 per month
Free Version
Free Trial
|
|||||
Training
Documentation
Webinars
Live Online
In Person
|
Training
Documentation
Webinars
Live Online
In Person
|
|||||
Company Information
Z.ai
Founded: 2019
China
github.com/zai-org/GLM-OCR
|
Company Information
Hugging Face
Founded: 2016
United States
huggingface.co/docs/transformers/en/index
|
|||||
Integrations
Hugging Face
PyTorch
|