Driving with Graph Visual Question Answering
Autoregressive Model Beats Diffusion
This repository contains the official implementation of FastVLM
Refer and Ground Anything Anywhere at Any Granularity
Self-supervised visual learning using momentum contrast in PyTorch
PS2 Covers Collection
GLM-4.6V/4.5V/4.1V-Thinking, towards versatile multimodal reasoning
Reference PyTorch implementation and models for DINOv3
Windrecorder is a memory search app that records everything
Weaving the Digital Agent Galaxy
The most powerful Android RPA agent framework
A modern library for 3D data processing
Wan2.1: Open and Advanced Large-Scale Video Generative Model
The library to build & auto-optimize LLM applications
Let's make video diffusion practical
Videomass is a free, open source and cross-platform GUI for FFmpeg
Create beautiful slides on the web using Claude's frontend skills
Detects phishing and lookalike domains using DNS fuzzing techniques
Foundation model for image generation
CogView4, CogView3-Plus and CogView3 (ECCV 2024)
Benchmarking Multimodal Agents for Open-Ended Tasks
Generating Immersive, Explorable, and Interactive 3D Worlds
A Python toolbox for gaining geometric insights
Entity Relation Diagram generation tool
Generate audiobooks from e-books