GLM-4.1VZhipu AI
|
Pixtral LargeMistral AI
|
|||||
Related Products
|
||||||
About
GLM-4.1V is a vision-language model, providing a powerful, compact multimodal model designed for reasoning and perception across images, text, and documents. The 9-billion-parameter variant (GLM-4.1V-9B-Thinking) is built on the GLM-4-9B foundation and enhanced through a specialized training paradigm using Reinforcement Learning with Curriculum Sampling (RLCS). It supports a 64k-token context window and accepts high-resolution inputs (up to 4K images, any aspect ratio), enabling it to handle complex tasks such as optical character recognition, image captioning, chart and document parsing, video and scene understanding, GUI-agent workflows (e.g., interpreting screenshots, recognizing UI elements), and general vision-language reasoning. In benchmark evaluations at the 10 B-parameter scale, GLM-4.1V-9B-Thinking achieved top performance on 23 of 28 tasks.
|
About
Pixtral Large is a 124-billion-parameter open-weight multimodal model developed by Mistral AI, building upon their Mistral Large 2 architecture. It integrates a 123-billion-parameter multimodal decoder with a 1-billion-parameter vision encoder, enabling advanced understanding of documents, charts, and natural images while maintaining leading text comprehension capabilities. With a context window of 128,000 tokens, Pixtral Large can process at least 30 high-resolution images simultaneously. The model has demonstrated state-of-the-art performance on benchmarks such as MathVista, DocVQA, and VQAv2, surpassing models like GPT-4o and Gemini-1.5 Pro. Pixtral Large is available under the Mistral Research License for research and educational use, and under the Mistral Commercial License for commercial applications.
|
|||||
Platforms Supported
Windows
Mac
Linux
Cloud
On-Premises
iPhone
iPad
Android
Chromebook
|
Platforms Supported
Windows
Mac
Linux
Cloud
On-Premises
iPhone
iPad
Android
Chromebook
|
|||||
Audience
Developers and AI researchers seeking a solution offering a vision-language model that balances size and capability, ideal for building multimodal agents, document/image analysis tools, or GUI-based automation workflows
|
Audience
AI developers interested in a powerful multimodal model
|
|||||
Support
Phone Support
24/7 Live Support
Online
|
Support
Phone Support
24/7 Live Support
Online
|
|||||
API
Offers API
|
API
Offers API
|
|||||
Screenshots and Videos |
Screenshots and VideosNo images available
|
|||||
Pricing
Free
Free Version
Free Trial
|
Pricing
Free
Free Version
Free Trial
|
|||||
Reviews/
|
Reviews/
|
|||||
Training
Documentation
Webinars
Live Online
In Person
|
Training
Documentation
Webinars
Live Online
In Person
|
|||||
Company InformationZhipu AI
Founded: 2023
China
chat.z.ai/
|
Company InformationMistral AI
Founded: 2023
France
mistral.ai/news/pixtral-large/
|
|||||
Alternatives |
Alternatives |
|||||
|
|
|
|||||
|
|
|
|||||
|
|
|
|||||
|
|
|
|||||
Categories |
Categories |
|||||
Integrations
Amazon Bedrock
EvalsOne
Expanse
Kilo Code
LM-Kit.NET
Lewis
Lunary
Microsoft Foundry Agent Service
Motific.ai
NexalAI
|
Integrations
Amazon Bedrock
EvalsOne
Expanse
Kilo Code
LM-Kit.NET
Lewis
Lunary
Microsoft Foundry Agent Service
Motific.ai
NexalAI
|
|||||
|
|
|