GLM-OCR (Z.ai) vs. Mu (Microsoft)

Related Products

  • PackageX OCR Scanning (46 ratings)
  • LogicalDOC (125 ratings)
  • Nutrient SDK (104 ratings)
  • Square 9 (403 ratings)
  • Apryse PDF SDK (149 ratings)
  • MyQ (179 ratings)
  • LM-Kit.NET (23 ratings)
  • onPhase (216 ratings)
  • Google AI Studio (11 ratings)
  • Google Cloud Speech-to-Text (374 ratings)

About (GLM-OCR)

GLM-OCR is a multimodal optical character recognition model, released as an open-source repository, for accurate and efficient document understanding. It combines text and visual modalities in a unified encoder–decoder architecture derived from the GLM-V family: a visual encoder pre-trained on large-scale image–text data feeds, through a lightweight cross-modal connector, into a GLM-0.5B language decoder. The model supports layout detection, parallel region recognition, and structured output for text, tables, formulas, and complex real-world document formats. Training uses a Multi-Token Prediction (MTP) loss and stable full-task reinforcement learning to improve training efficiency, recognition accuracy, and generalization, and the model achieves state-of-the-art results on major document understanding benchmarks.
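The MTP loss can be illustrated with a generic formulation: besides the usual next-token objective, extra prediction heads are trained to predict tokens further ahead, and their cross-entropy losses are averaged. The sketch below is a minimal pure-Python illustration under that assumption. The function names (`mtp_loss`, `log_softmax`), the head layout, and the equal weighting of heads are hypothetical and do not describe GLM-OCR's actual implementation.

```python
import math
from typing import List, Sequence


def log_softmax(logits: Sequence[float]) -> List[float]:
    # Numerically stable log-softmax: subtract the max before exponentiating.
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def mtp_loss(head_logits: Sequence[Sequence[Sequence[float]]],
             targets: Sequence[int]) -> float:
    """Average cross-entropy over k prediction heads (hypothetical layout).

    head_logits[j][t] holds the vocabulary logits that head j emits at
    position t. Head j is trained to predict targets[t + j + 1], i.e. the
    token j + 1 steps ahead, so head 0 is the ordinary next-token head.
    Positions whose target would fall past the end of the sequence are skipped.
    All heads are weighted equally here, which is an assumption.
    """
    total, count = 0.0, 0
    for j, logits_per_pos in enumerate(head_logits):
        for t, logits in enumerate(logits_per_pos):
            target_idx = t + j + 1
            if target_idx >= len(targets):
                break  # no future token left for this head
            total -= log_softmax(logits)[targets[target_idx]]
            count += 1
    if count == 0:
        raise ValueError("sequence too short for any prediction head")
    return total / count
```

With uniform logits over a three-token vocabulary every term equals ln 3, so the averaged loss is ln 3; a model that puts almost all probability on the correct future tokens drives it toward zero. Real MTP setups may weight the extra heads differently or share parameters between them.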

About (Mu)

Mu is a 330-million-parameter encoder–decoder language model that powers the agent in Windows Settings by mapping natural-language queries to Settings function calls. It runs fully on-device on NPUs at over 100 tokens per second while maintaining high accuracy. Building on optimizations developed for Phi Silica, Mu's encoder processes the input once into a fixed-length latent representation that the decoder then reuses, which cuts computation and memory overhead; on Qualcomm Hexagon NPUs this yields 47 percent lower first-token latency and 4.7× higher decoding speed than a decoder-only model of similar size. Hardware-aware tuning, including a 2/3–1/3 encoder–decoder parameter split, weight sharing between the input and output embeddings, Dual LayerNorm, rotary positional embeddings, and grouped-query attention, enables decoding at over 200 tokens per second on devices such as the Surface Laptop 7 and response times under 500 ms for settings queries.
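Grouped-query attention, one of the techniques listed above, lets several query heads share a single key/value head, so a decoder only has to cache one key/value head per group instead of one per query head. The pure-Python sketch below shows that head-to-group mapping under simple assumptions (no masking, no rotary embeddings, toy list-based tensors). The function names and shapes are illustrative only; Mu's actual head counts and dimensions are not given here.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[Vector]  # one row per sequence position


def softmax(xs: Vector) -> Vector:
    # Subtract the max for numerical stability.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attend(q: Matrix, k: Matrix, v: Matrix) -> Matrix:
    """Scaled dot-product attention for a single head (no causal mask)."""
    scale = 1.0 / math.sqrt(len(q[0]))
    out: Matrix = []
    for q_row in q:
        scores = [scale * sum(a * b for a, b in zip(q_row, k_row)) for k_row in k]
        weights = softmax(scores)
        out.append([
            sum(w * v_row[c] for w, v_row in zip(weights, v))
            for c in range(len(v[0]))
        ])
    return out


def grouped_query_attention(
    q_heads: List[Matrix], k_heads: List[Matrix], v_heads: List[Matrix]
) -> List[Matrix]:
    """Each group of n_q // n_kv query heads shares one key/value head."""
    n_q, n_kv = len(q_heads), len(k_heads)
    if n_kv == 0 or len(v_heads) != n_kv or n_q % n_kv != 0:
        raise ValueError("query heads must be a multiple of key/value heads")
    group_size = n_q // n_kv
    # Query head h reads key/value head h // group_size, so only n_kv
    # key/value heads would need to be stored in a decoding cache.
    return [
        attend(q_heads[h], k_heads[h // group_size], v_heads[h // group_size])
        for h in range(n_q)
    ]
```

Setting n_kv equal to n_q recovers standard multi-head attention, and n_kv = 1 gives multi-query attention. GQA sits between the two, trading a little modeling capacity for a smaller key/value cache, which makes it attractive for memory-constrained on-device decoding.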

Platforms Supported (GLM-OCR)

Windows
Mac
Linux
Cloud
On-Premises
iPhone
iPad
Android
Chromebook

Platforms Supported (Mu)

Windows
Mac
Linux
Cloud
On-Premises
iPhone
iPad
Android
Chromebook

Audience (GLM-OCR)

Developers, researchers, and engineers wanting a tool to accurately parse and understand complex documents, layouts, and visual-text content at scale

Audience (Mu)

Developers seeking a solution to navigate and configure system settings through natural language

Support (GLM-OCR)

Phone Support
24/7 Live Support
Online

Support (Mu)

Phone Support
24/7 Live Support
Online

API (GLM-OCR)

Offers API

API (Mu)

Offers API

Pricing (GLM-OCR)

Free
Free Version
Free Trial

Pricing (Mu)

No information available.
Free Version
Free Trial

Reviews/Ratings (GLM-OCR)

This software hasn't been reviewed yet.

Reviews/Ratings (Mu)

This software hasn't been reviewed yet.

Training (GLM-OCR)

Documentation
Webinars
Live Online
In Person

Training (Mu)

Documentation
Webinars
Live Online
In Person

Company Information (GLM-OCR)

Z.ai
Founded: 2019
China
github.com/zai-org/GLM-OCR

Company Information (Mu)

Microsoft
Founded: 1975
United States
blogs.windows.com/windowsexperience/2025/06/23/introducing-mu-language-model-and-how-it-enabled-the-agent-in-windows-settings/

Alternatives

  • CodeT5 (Salesforce)

Alternatives

  • CodeT5 (Salesforce)
  • HunyuanOCR (Tencent)
  • GLM-OCR (Z.ai)
  • Mu (Microsoft)
  • Whisper (OpenAI)
  • Mistral OCR 3 (Mistral AI)
  • Yi-Large (01.AI)

Integrations (GLM-OCR)

No info available.

Integrations (Mu)

No info available.