Audience
AI and LLM developers
About Ferret
An End-to-End MLLM that Accept Any-Form Referring and Ground Anything in Response.
Ferret Model - Hybrid Region Representation + Spatial-aware Visual Sampler enable fine-grained and open-vocabulary referring and grounding in MLLM.
GRIT Dataset (~1.1M) - A Large-scale, Hierarchical, Robust ground-and-refer instruction tuning dataset.
Ferret-Bench - A multimodal evaluation benchmark that jointly requires Referring/Grounding, Semantics, Knowledge, and Reasoning.
Other Popular Alternatives & Related Software
Qwen2.5-Max
Qwen2.5-Max is a large-scale Mixture-of-Experts (MoE) model developed by the Qwen team, pretrained on over 20 trillion tokens and further refined through Supervised Fine-Tuning (SFT) and Reinforcement Learning from Human Feedback (RLHF). In evaluations, it outperforms models like DeepSeek V3 in benchmarks such as Arena-Hard, LiveBench, LiveCodeBench, and GPQA-Diamond, while also demonstrating competitive results in other assessments, including MMLU-Pro. Qwen2.5-Max is accessible via API through Alibaba Cloud and can be explored interactively on Qwen Chat.
Learn more
GLM-4.5V
GLM-4.5V builds on the GLM-4.5-Air foundation, using a Mixture-of-Experts (MoE) architecture with 106 billion total parameters and 12 billion activation parameters. It achieves state-of-the-art performance among open-source VLMs of similar scale across 42 public benchmarks, excelling in image, video, document, and GUI-based tasks. It supports a broad range of multimodal capabilities, including image reasoning (scene understanding, spatial recognition, multi-image analysis), video understanding (segmentation, event recognition), complex chart and long-document parsing, GUI-agent workflows (screen reading, icon recognition, desktop automation), and precise visual grounding (e.g., locating objects and returning bounding boxes). GLM-4.5V also introduces a “Thinking Mode” switch, allowing users to choose between fast responses or deeper reasoning when needed.
Learn more
Llama 4 Maverick
Llama 4 Maverick is one of the most advanced multimodal AI models from Meta, featuring 17 billion active parameters and 128 experts. It surpasses its competitors like GPT-4o and Gemini 2.0 Flash in a broad range of benchmarks, especially in tasks related to coding, reasoning, and multilingual capabilities. Llama 4 Maverick combines image and text understanding, enabling it to deliver industry-leading results in image-grounding tasks and precise, high-quality output. With its efficient performance at a reduced parameter size, Maverick offers exceptional value, especially in general assistant and chat applications.
Learn more
LLaVA
LLaVA (Large Language-and-Vision Assistant) is an innovative multimodal model that integrates a vision encoder with the Vicuna language model to facilitate comprehensive visual and language understanding. Through end-to-end training, LLaVA exhibits impressive chat capabilities, emulating the multimodal functionalities of models like GPT-4. Notably, LLaVA-1.5 has achieved state-of-the-art performance across 11 benchmarks, utilizing publicly available data and completing training in approximately one day on a single 8-A100 node, surpassing methods that rely on billion-scale datasets. The development of LLaVA involved the creation of a multimodal instruction-following dataset, generated using language-only GPT-4. This dataset comprises 158,000 unique language-image instruction-following samples, including conversations, detailed descriptions, and complex reasoning tasks. This data has been instrumental in training LLaVA to perform a wide array of visual and language tasks effectively.
Learn more
Pricing
Starting Price:
Free
Pricing Details:
Open source
Free Version:
Free Version available.
Integrations
No integrations listed.
Company Information
Apple
Founded: 1976
United States
github.com/apple/ml-ferret
Other Useful Business Software
AI-generated apps that pass security review
Retool lets you generate dashboards, admin panels, and workflows directly on your data. Type something like “Build me a revenue dashboard on my Stripe data” and get a working app with security, permissions, and compliance built in from day one. Whether on our cloud or self-hosted, create the internal software your team needs without compromising enterprise standards or control.
Product Details
Platforms Supported
Cloud
Training
Documentation