GLM-OCR (Z.ai)
|
Karlo (Kakao Brain)
|
|||||
About
GLM-OCR is a multimodal optical character recognition model, released as an open-source repository, for accurate and efficient document understanding. It combines visual and textual information in a unified encoder–decoder architecture derived from the GLM-V family: a visual encoder pre-trained on large-scale image–text data feeds, through a lightweight cross-modal connector, into a GLM-0.5B language decoder. The model supports layout detection, parallel region recognition, and structured output for text, tables, formulas, and complex real-world document formats. Training uses a Multi-Token Prediction (MTP) loss and stable full-task reinforcement learning to improve training efficiency, recognition accuracy, and generalization, and the model achieves state-of-the-art results on major document-understanding benchmarks.
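The description credits a Multi-Token Prediction (MTP) loss with improving training efficiency, but it does not say how GLM-OCR implements it. The following is only a minimal, generic sketch of the idea in plain Python, under stated assumptions: K prediction heads, where head k is supervised with the token k + 1 positions ahead, and the loss is the unweighted mean cross-entropy over all valid (head, position) pairs. The function names `log_softmax` and `mtp_loss`, the nested-list logit layout, and the equal weighting of heads are hypothetical choices for illustration; real implementations (GLM-OCR's included) may weight heads differently or share layers between them.

```python
import math
from typing import List, Sequence


def log_softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable log-softmax over one vocabulary-sized logit vector."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def mtp_loss(head_logits: List[List[List[float]]], tokens: List[int]) -> float:
    """Mean cross-entropy over K prediction heads (generic MTP sketch, hypothetical API).

    head_logits[k][t] is the logit vector emitted by head k at position t.
    Head k (0-based) is supervised with tokens[t + k + 1]; positions that have
    no token that far ahead are skipped. Heads are weighted equally (assumption).
    """
    total, count = 0.0, 0
    for k, per_position in enumerate(head_logits):
        offset = k + 1
        for t, logits in enumerate(per_position):
            if t + offset >= len(tokens):
                break  # later positions have no target this far ahead either
            total -= log_softmax(logits)[tokens[t + offset]]
            count += 1
    return total / count if count else 0.0


if __name__ == "__main__":
    tokens = [0, 1, 2]
    uniform = [[0.0, 0.0, 0.0] for _ in tokens]
    # Two heads (next token and the one after), both uniform over a 3-token
    # vocabulary: every prediction costs log(3), so this prints about 1.0986.
    print(mtp_loss([uniform, uniform], tokens))
```

Averaging over extra future-token targets gives each training position more supervision signal than next-token prediction alone, which is the usual motivation for MTP.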
|
About
Karlo is a text-conditional image generation model based on OpenAI's unCLIP architecture. It improves on the standard super-resolution model that upscales images from 64px to 256px, recovering fine, high-frequency detail in only a small number of denoising steps.
We trained Karlo from scratch on 115 million image-text pairs, including COYO-100M, CC3M, and CC12M. For the prior and decoder, we used ViT-L/14, the CLIP model provided in OpenAI's CLIP repository. For efficiency, we also changed the original unCLIP implementation: instead of a trainable transformer in the decoder, we use the text encoder from ViT-L/14.
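For reference, the unCLIP design that Karlo builds on splits text-to-image generation into two stages: a prior that maps a caption y to a CLIP image embedding z_i, and a decoder that generates an image x conditioned on that embedding (and optionally on y). Because z_i is a deterministic function of x, the unCLIP formulation writes the model as the factorization below. This is the general unCLIP formulation, not a statement about Karlo-specific changes beyond those described above.

```latex
P(x \mid y) = P(x, z_i \mid y) = P(x \mid z_i, y)\, P(z_i \mid y)
```

In Karlo, the decoder's output is then upscaled from 64px to 256px by the improved super-resolution stage.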
|
|||||
Platforms Supported
Windows
Mac
Linux
Cloud
On-Premises
iPhone
iPad
Android
Chromebook
|
Platforms Supported
Windows
Mac
Linux
Cloud
On-Premises
iPhone
iPad
Android
Chromebook
|
|||||
Audience
Developers, researchers, and engineers who need to accurately parse and understand complex documents, layouts, and mixed visual-text content at scale
|
Audience
AI developers interested in an image generation model
|
|||||
Support
Phone Support
24/7 Live Support
Online
|
Support
Phone Support
24/7 Live Support
Online
|
|||||
API
Offers API
|
API
Offers API
|
|||||
Pricing
Free
Free Version
Free Trial
|
Pricing
Free
Free Version
Free Trial
|
|||||
Training
Documentation
Webinars
Live Online
In Person
|
Training
Documentation
Webinars
Live Online
In Person
|
|||||
Company Information: Z.ai
Founded: 2019
China
github.com/zai-org/GLM-OCR
|
Company Information: Kakao Brain
Founded: 2017
South Korea
github.com/kakaobrain/karlo
|