Vidi2
Large Multimodal Models for Video Understanding and Editing
...— offering temporal retrieval, spatio-temporal grounding (i.e. locating objects over time + space), and even video question answering. Vidi targets applications like intelligent video editing, automated video search, content analysis, and editing assistance, enabling users to efficiently locate relevant segments and objects in hours-long footage. The system is built with open-source release in mind, giving developers access to model code, inference scripts, and evaluation pipelines so they can reproduce research results or integrate Vidi into their own video-processing workflows.