State-of-the-art Image & Video CLIP, Multimodal Large Language Models
A neural network that transforms a design mock-up into static websites
SAPIEN Manipulation Skill Framework
Generate audiobooks from e-books
PS2 Covers Collection
GLM-4.6V/4.5V/4.1V-Thinking, towards versatile multimodal reasoning
Official Repo For Sa2VA: Marrying SAM2 with LLaVA
Chinese and English multimodal conversational language model
ASCII art library for Python
Reference PyTorch implementation and models for DINOv3
Pushing the Frontier of Long Audio-Visual Generation
Visual simulation platform for space-based data backhaul scenarios
Suite of reference architectures for building GPU-accelerated vision
A full-featured, hackable tiling window manager written in Python
All-in-one AI productivity platform with agents, workflows, and IM
Pixel-Aligned 3D Generation from Images
A state-of-the-art open visual language model
Open-Source Python3 tool for recognizing layouts, tables, and math
Python inference and LoRA trainer package for the LTX-2 audio–video
A beautiful, powerful, self-hosted ROM manager and player
No-code in the front, Python in the back. An open-source framework
Open-source evaluation toolkit of large multi-modality models (LMMs)
The most powerful Android RPA agent framework
Official implementation of Watermark Anything with Localized Messages
Multimodal Diffusion with Representation Alignment