This repo contains the code for a 1D tokenizer and generator
A framework to enable multimodal models to operate a computer
Witness the aha moment of VLM with less than $3
An open phone agent model & framework
The most powerful Android RPA agent framework
LTX-Video Support for ComfyUI
Reference PyTorch implementation and models for DINOv3
Unified Multimodal Understanding and Generation Models
The library to build & auto-optimize LLM applications
Official implementation of Watermark Anything with Localized Messages
SAPIEN Manipulation Skill Framework
Generating Immersive, Explorable, and Interactive 3D Worlds
Python inference and LoRA trainer package for the LTX-2 audio–video
GLM-4.6V/4.5V/4.1V-Thinking, towards versatile multimodal reasoning
CogView4, CogView3-Plus and CogView3 (ECCV 2024)
State-of-the-art Image & Video CLIP, Multimodal Large Language Models
Virtual AI anchor that combines state-of-the-art technology
Taming Stable Diffusion for Lip Sync
VGGSfM: Visual Geometry Grounded Deep Structure From Motion
ICLR 2024 Spotlight: curation/training code, metadata, distribution
[CVPR 2025 Best Paper Award] VGGT
Motion-controllable Video Generation via Latent Trajectory Guidance
PaddlePaddle End-to-End Development Toolkit
Modular quant framework
Gemma open-weight LLM library, from Google DeepMind