This repo contains the code for a 1D tokenizer and generator
A framework to enable multimodal models to operate a computer
Witness the aha moment of VLM with less than $3
LTX-Video Support for ComfyUI
The most powerful Android RPA agent framework
An open phone agent model & framework
Director, Screenwriter, Producer, and Video Generator All-in-One
Unified Multimodal Understanding and Generation Models
Reference PyTorch implementation and models for DINOv3
VideoRAG: Chat with Your Videos
Generating Immersive, Explorable, and Interactive 3D Worlds
The library to build & auto-optimize LLM applications
Expressive Portrait Image Animation for Live Streaming
SAPIEN Manipulation Skill Framework
Python inference and LoRA trainer package for the LTX-2 audio–video
GLM-4.6V/4.5V/4.1V-Thinking, towards versatile multimodal reasoning
Official implementation of Watermark Anything with Localized Messages
CogView4, CogView3-Plus and CogView3 (ECCV 2024)
State-of-the-art Image & Video CLIP, Multimodal Large Language Models
Virtual AI anchor that combines state-of-the-art technology
Static Analyzer for Solidity
Taming Stable Diffusion for Lip Sync
VGGSfM: Visual Geometry Grounded Deep Structure From Motion
Motion-controllable Video Generation via Latent Trajectory Guidance
PDF to Markdown with vision models