4M: Massively Multimodal Masked Modeling
ICLR 2024 Spotlight: curation/training code, metadata, distribution
[CVPR 2025 Best Paper Award] VGGT
CogView4, CogView3-Plus and CogView3 (ECCV 2024)
Official implementation of DreamCraft3D
TGMC: TerraGov Marine Corps, an SS13 mod
Python framework for adversarial attacks and data augmentation
High-Fidelity and Controllable Generation of Textured 3D Assets
Open-source AI marketing skills for Claude Code
Unifying 3D Mesh Generation with Language Models
A personal context-agent that learns how you work
Controllable and fast Text-to-Speech for over 7000 languages
Unified Multimodal Understanding and Generation Models
VGGSfM: Visual Geometry Grounded Deep Structure From Motion
State-of-the-art Image & Video CLIP, Multimodal Large Language Models
PyTorch code and models for VJEPA2 self-supervised learning from video
Educational framework exploring multi-agent orchestration
A lightweight vision library for performing large-scale object detection
Framework for managing and maintaining multi-language pre-commit hooks
This repo contains the code for a 1D tokenizer and generator
Flexible Photo Recrafting While Preserving Your Identity
A SOTA open-source image editing model
Multi-Agent daTa geneRation Infra and eXperimentation framework
Build cross-modal and multimodal applications on the cloud
GLM-4.6V/4.5V/4.1V-Thinking, towards versatile multimodal reasoning