Sa2VA is an open-source multi-modal large language model (MLLM) developed by ByteDance that unifies dense segmentation, visual understanding, and language-based reasoning across both images and videos. It combines a video segmentation model based on SAM-2 with an LLM backbone derived from vision-language models such as the InternVL2.5 and Qwen-VL series. The resulting system can answer questions about visual content, perform referring segmentation, and keep masks temporally consistent across video frames. With minimal instruction tuning (often one-shot), Sa2VA handles prompts such as “segment the main subject,” “what are the objects in this scene?”, or “track this object through the video,” returning pixel-level masks or text answers as appropriate.
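
One rough way to quantify the temporal consistency mentioned above is the mean intersection-over-union (IoU) between masks of consecutive frames. The sketch below is illustrative only: it is not part of Sa2VA, the function names are hypothetical, and masks are assumed to be same-sized nested lists of 0/1 values. A high score indicates stable masks only when the object moves slowly between frames.

```python
from typing import List

Mask = List[List[int]]  # binary mask: rows of 0/1 values (assumed format)


def mask_iou(a: Mask, b: Mask) -> float:
    """Intersection-over-union of two same-sized binary masks."""
    inter = 0
    union = 0
    for row_a, row_b in zip(a, b):
        for pa, pb in zip(row_a, row_b):
            if pa and pb:
                inter += 1
            if pa or pb:
                union += 1
    # Two empty masks are treated as identical.
    return inter / union if union else 1.0


def temporal_consistency(masks: List[Mask]) -> float:
    """Mean IoU between each pair of consecutive frame masks."""
    if len(masks) < 2:
        return 1.0
    ious = [mask_iou(m0, m1) for m0, m1 in zip(masks, masks[1:])]
    return sum(ious) / len(ious)


# Three tiny 2x3 frames: the mask shifts right once, then stays put.
frames = [
    [[1, 1, 0], [1, 1, 0]],
    [[0, 1, 1], [0, 1, 1]],
    [[0, 1, 1], [0, 1, 1]],
]
score = temporal_consistency(frames)
```

In practice the per-frame masks would come from the model's segmentation output, converted to whatever binary format the evaluation code expects.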

Features

  • Unified image/video + language understanding: supports both visual question-answering and dense segmentation on images and videos
  • Referring segmentation: given a natural-language prompt (like “segment the man in the red jacket”), it outputs segmentation masks for the region the prompt describes
  • Video-level temporal consistency: maintains stable segmentation/tracking of objects across frames in a video, useful for video editing, object tracking, or temporal analysis
  • Multi-size model family (1B, 4B, 8B, 26B, etc.) to match different hardware/resource constraints or performance needs
  • Open-source with pretrained weights, demo code, inference scripts, and evaluation tooling, ready to integrate or extend for custom applications
  • Combines segmentation (from SAM-2) with strong language understanding (from the vision-language model backbone), enabling complex multi-modal tasks (e.g. description + segmentation + reasoning) in one model
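
When choosing among the model sizes listed above, a back-of-the-envelope estimate of weight memory can help. The sketch below assumes 2 bytes per parameter (bf16/fp16 weights) and treats the size labels as nominal parameter counts; actual checkpoints may differ, and activations, caches, and video frames need additional memory. The `headroom` factor and all function names are hypothetical choices for illustration.

```python
from typing import Optional

# Nominal sizes in billions of parameters, taken from the size labels above.
MODEL_SIZES_B = [1, 4, 8, 26]


def weight_memory_gb(params_billion: float, bytes_per_param: int = 2) -> float:
    """Approximate weight memory in GB (1 GB = 1e9 bytes)."""
    # (params_billion * 1e9 params) * bytes_per_param / 1e9 bytes per GB
    return params_billion * bytes_per_param


def largest_fitting(budget_gb: float, headroom: float = 0.7) -> Optional[int]:
    """Largest nominal size whose weights fit in a fraction of the budget.

    `headroom` reserves the rest of the memory for activations and caches;
    0.7 is an arbitrary illustrative value, not a measured one.
    """
    usable = budget_gb * headroom
    fitting = [s for s in MODEL_SIZES_B if weight_memory_gb(s) <= usable]
    return max(fitting) if fitting else None


choice_24gb = largest_fitting(24)  # e.g. a 24 GB accelerator
choice_80gb = largest_fitting(80)  # e.g. an 80 GB accelerator
```

Under these assumptions a 24 GB device would be matched with the 8B variant and an 80 GB device with the 26B variant; real limits should be confirmed by measurement.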

License

Apache License V2.0

Additional Project Details

Programming Language

Python

Related Categories

Python Artificial Intelligence Software

Registered

2025-12-01