Vidi is a family of large multimodal models for deep video understanding and editing. It integrates vision, audio, and language to support sophisticated querying and manipulation of video content. It is designed to process long-form, real-world videos and answer complex queries such as “when in this clip does X happen?” or “where in the frame is object Y at that moment?”, offering temporal retrieval, spatio-temporal grounding (i.e., locating objects in both time and space), and video question answering. Vidi targets applications such as intelligent video editing, automated video search, content analysis, and editing assistance, helping users locate relevant segments and objects in hours-long footage. The system is built with an open-source release in mind, giving developers access to model code, inference scripts, and evaluation pipelines so they can reproduce the research results or integrate Vidi into their own video-processing workflows.

Features

  • Multimodal video understanding: processes video, audio, and text (and possibly metadata) to answer complex queries
  • Temporal retrieval: identifies the time ranges in long videos that correspond to given text queries
  • Spatio-temporal grounding: localizes target objects with bounding boxes across the frames in which they appear
  • Video question answering: supports QA over video content in addition to retrieval and segmentation
  • Open-source release: model code, inference scripts, and evaluation pipelines for reproducible research and straightforward integration
  • Long-video support: designed to handle extended footage rather than only short clips
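
Temporal retrieval returns time ranges and spatio-temporal grounding returns bounding boxes over time. The listing does not describe Vidi's actual output format or evaluation code, so the sketch below is only an illustration under that assumption. The names `TimeRange`, `Box`, `temporal_iou`, `box_iou`, and `mean_box_iou` are hypothetical and are not part of Vidi's API. They implement standard intersection-over-union (IoU) measures that are commonly used to score these two kinds of output.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class TimeRange:
    """Hypothetical retrieved segment, in seconds from the start of the video."""
    start: float
    end: float


@dataclass(frozen=True)
class Box:
    """Hypothetical axis-aligned box: (x1, y1) top-left, (x2, y2) bottom-right, in pixels."""
    x1: float
    y1: float
    x2: float
    y2: float


def temporal_iou(pred: TimeRange, gt: TimeRange) -> float:
    """Overlap of two time ranges divided by their combined extent."""
    inter = max(0.0, min(pred.end, gt.end) - max(pred.start, gt.start))
    union = (pred.end - pred.start) + (gt.end - gt.start) - inter
    return inter / union if union > 0 else 0.0


def box_iou(a: Box, b: Box) -> float:
    """Standard IoU of two boxes in a single frame."""
    iw = max(0.0, min(a.x2, b.x2) - max(a.x1, b.x1))
    ih = max(0.0, min(a.y2, b.y2) - max(a.y1, b.y1))
    inter = iw * ih
    area_a = (a.x2 - a.x1) * (a.y2 - a.y1)
    area_b = (b.x2 - b.x1) * (b.y2 - b.y1)
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def mean_box_iou(pred_track: dict[float, Box], gt_track: dict[float, Box]) -> float:
    """Average per-frame box IoU over ground-truth timestamps.

    A timestamp with no predicted box scores 0, so missed frames lower the result.
    """
    if not gt_track:
        return 0.0
    scores = [
        box_iou(pred_track[t], box) if t in pred_track else 0.0
        for t, box in gt_track.items()
    ]
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # A 10 s prediction overlapping a 10 s ground truth by 5 s: IoU = 5 / 15.
    print(temporal_iou(TimeRange(10.0, 20.0), TimeRange(15.0, 25.0)))
    # Two 10x10 boxes overlapping in a 5x5 area: IoU = 25 / 175.
    print(box_iou(Box(0, 0, 10, 10), Box(5, 5, 15, 15)))
```

Averaging per-frame box IoU over the ground-truth timestamps penalizes predictions that miss the object in some frames, even when the boxes that were produced are accurate.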

License

MIT License

Additional Project Details

Programming Language

Python

Related Categories

Python AI Video Generators, Python AI Models

Registered

2025-12-01