GLM-OCRZ.ai
|
WhisperOpenAI
|
|||||
Related Products
|
||||||
About
GLM-OCR is a multimodal optical character recognition model and open source repository that provides accurate, efficient, and comprehensive document understanding by combining text and visual modalities into a unified encoder–decoder architecture derived from the GLM-V family. Built with a visual encoder pre-trained on large-scale image–text data and a lightweight cross-modal connector feeding into a GLM-0.5B language decoder, the model supports layout detection, parallel region recognition, and structured output for text, tables, formulas, and complicated real-world document formats. It introduces Multi-Token Prediction (MTP) loss and stable full-task reinforcement learning to improve training efficiency, recognition accuracy, and generalization, achieving state-of-the-art benchmarks on major document understanding tasks.
|
About
We’ve trained and are open-sourcing a neural net called Whisper that approaches human-level robustness and accuracy in English speech recognition. Whisper is an automatic speech recognition (ASR) system trained on 680,000 hours of multilingual and multitask supervised data collected from the web. We show that the use of such a large and diverse dataset leads to improved robustness to accents, background noise, and technical language. Moreover, it enables transcription in multiple languages, as well as translation from those languages into English. We are open-sourcing models and inference code to serve as a foundation for building useful applications and for further research on robust speech processing. The Whisper architecture is a simple end-to-end approach, implemented as an encoder-decoder Transformer. Input audio is split into 30-second chunks, converted into a log-Mel spectrogram, and then passed into an encoder.
|
|||||
Platforms Supported
Windows
Mac
Linux
Cloud
On-Premises
iPhone
iPad
Android
Chromebook
|
Platforms Supported
Windows
Mac
Linux
Cloud
On-Premises
iPhone
iPad
Android
Chromebook
|
|||||
Audience
Developers, researchers, and engineers wanting a tool to accurately parse and understand complex documents, layouts, and visual-text content at scale
|
Audience
Anyone looking for a tool to recognize speech automatically and improve text transcription
|
|||||
Support
Phone Support
24/7 Live Support
Online
|
Support
Phone Support
24/7 Live Support
Online
|
|||||
API
Offers API
|
API
Offers API
|
|||||
Screenshots and Videos |
Screenshots and Videos |
|||||
Pricing
Free
Free Version
Free Trial
|
Pricing
No information available.
Free Version
Free Trial
|
|||||
Reviews/
|
Reviews/
|
|||||
Training
Documentation
Webinars
Live Online
In Person
|
Training
Documentation
Webinars
Live Online
In Person
|
|||||
Company InformationZ.ai
Founded: 2019
China
github.com/zai-org/GLM-OCR
|
Company InformationOpenAI
United States
openai.com/blog/whisper/
|
|||||
Alternatives |
Alternatives |
|||||
|
|
||||||
|
|
||||||
|
|
|
|||||
|
|
||||||
Categories |
Categories |
|||||
Integrations
AnotherWrapper
Baseten
Krater.ai
Kuku
MacWhisper
Pruna AI
ReByte
SheepScript.ai
Shownotes
Simplismart
|
Integrations
AnotherWrapper
Baseten
Krater.ai
Kuku
MacWhisper
Pruna AI
ReByte
SheepScript.ai
Shownotes
Simplismart
|
|||||
|
|
|