Refer and Ground Anything Anywhere at Any Granularity
Self-supervised visual learning using momentum contrast in PyTorch
Contexts Optical Compression
PS2 Covers Collection
GLM-4.6V/4.5V/4.1V-Thinking, towards versatile multimodal reasoning
Reference PyTorch implementation and models for DINOv3
Windrecorder is a memory search app that records everything
Weaving the Digital Agent Galaxy
The most powerful Android RPA agent framework
Wan2.1: Open and Advanced Large-Scale Video Generative Model
VGGSfM: Visual Geometry Grounded Deep Structure From Motion
State-of-the-art Image & Video CLIP, Multimodal Large Language Models
Let's make video diffusion practical
The library to build & auto-optimize LLM applications
Videomass is a free, open-source, cross-platform GUI for FFmpeg
Create beautiful slides on the web using Claude's frontend skills
Detects phishing and lookalike domains using DNS fuzzing techniques
CogView4, CogView3-Plus and CogView3 (ECCV 2024)
Foundation model for image generation
Generating Immersive, Explorable, and Interactive 3D Worlds
Benchmarking Multimodal Agents for Open-Ended Tasks
Generate audiobooks from e-books
Agent S: an open agentic framework that uses computers like a human
A Python toolbox for gaining geometric insights
Entity relation diagram generation tool