Self-supervised visual learning using momentum contrast in PyTorch
CogView4, CogView3-Plus and CogView3 (ECCV 2024)
GLM-4.6V/4.5V/4.1V-Thinking, towards versatile multimodal reasoning
A full-featured, hackable tiling window manager written in Python
Reference PyTorch implementation and models for DINOv3
Let's make video diffusion practical
Multimodal Diffusion with Representation Alignment
Taming Stable Diffusion for Lip Sync
A neural network that transforms a design mock-up into a static website
VGGSfM: Visual Geometry Grounded Deep Structure From Motion
Open-Source Python3 tool for recognizing layouts, tables, and math
Official Repo For "Sa2VA: Marrying SAM2 with LLaVA
Chinese and English multimodal conversational language model
3D Engine with Blender Integration
Azure command-line interface
Python inference and LoRA trainer package for the LTX-2 audio–video
GLM-4.6V/4.5V/4.1V-Thinking, towards versatile multimodal reasoning
Powerful framework for controlling Android and iOS devices
The most powerful Android RPA agent framework
VMZ: Model Zoo for Video Modeling
Official implementation of Watermark Anything with Localized Messages
A cross-platform GUI wrapper for yt-dlp written in PySide6
Qwen3-Omni is a natively end-to-end, omni-modal LLM
Label Studio is a multi-type data labeling and annotation tool
Open-source feature flagging and remote config service