This repo contains the code for a 1D tokenizer and generator
Models for object and human mesh reconstruction
Towards Real-World Vision-Language Understanding
A Unified Framework for Text-to-3D and Image-to-3D Generation
Multimodal-Driven Architecture for Customized Video Generation
Let's make video diffusion practical
Implementation of a U-Net complete with efficient attention
Gracefully face hCaptcha challenges with multimodal LLMs
High-Resolution Image Synthesis with Latent Diffusion Models
Offline inference engine for art, real-time voice conversations
A Pioneering Open-Source Alternative to GPT-4o
High-Resolution 3D Assets Generation with Large Scale Diffusion Models
Code for running inference with the SAM 3D Body Model 3DB
Official MiniMax Model Context Protocol (MCP) server
Official implementation of Watermark Anything with Localized Messages
AI-powered code assistant for Vim. OpenAI and ChatGPT plugin for Vim
State-of-the-art Image & Video CLIP, Multimodal Large Language Models
AutoGluon: AutoML for Image, Text, and Tabular Data
Stable Diffusion with Core ML on Apple Silicon
Python Optimal Transport
Fast image augmentation library and an easy-to-use wrapper
Capable of understanding text, audio, vision, video
Diffusion Transformer with Fine-Grained Chinese Understanding
Easy Docker setup for Stable Diffusion with user-friendly UI
Make drawing and labeling bounding boxes easy as cake