Let's make video diffusion practical
This repo contains the code for a 1D tokenizer and generator
A lightweight vision library for performing large-scale object detection
Sharp Monocular Metric Depth in Less Than a Second
GLM-4.6V/4.5V/4.1V-Thinking, towards versatile multimodal reasoning
MII makes low-latency and high-throughput inference possible
A Customizable Image-to-Video Model based on HunyuanVideo
Overcoming Data Limitations for High-Quality Video Diffusion Models
An open-source framework for training large multimodal models
Text-to-3D & Image-to-3D & Mesh Exportation with NeRF + Diffusion
A version of the AI art creation software based on Disco Diffusion
A Unified Toolkit for Deep Learning Based Document Image Analysis
Implementation of Deep Feature Rotation for Multimodal Image
A production-ready punctuation restoration model for the Russian language
Composable GAN framework with an API and a user interface
Compute FID scores with PyTorch
Python library for model interpretation/explanations
A large annotated semantic parsing corpus for developing NL interfaces