Let's make video diffusion practical
High-Resolution 3D Asset Generation with Large-Scale Diffusion Models
Block Diffusion for Ultra-Fast Speculative Decoding
TTS for Context-Aware Speech Generation and True-to-Life Voice Cloning
RGBD video generation model conditioned on camera input
Stable Diffusion web UI
The most powerful local music generation model
InvokeAI is a leading creative engine for Stable Diffusion models
Autoregressive Model Beats Diffusion
Wan2.1: Open and Advanced Large-Scale Video Generative Model
Diffusion Transformer with Fine-Grained Chinese Understanding
Open-source multi-speaker long-form text-to-speech model
Image inpainting tool powered by SOTA AI Model
Multimodal Diffusion with Representation Alignment
HY-Motion model for 3D character animation generation
A unified library of SOTA model optimization techniques
Repo for SeedVR2 & SeedVR
Run the Stable Diffusion releases in a Docker container
All-in-one WebUI for AI generative image and video creation
Tencent Hunyuan Multimodal diffusion transformer (MM-DiT) model
Towards Human-Level Text-to-Speech through Style Diffusion
Cosmos-RL is a flexible and scalable Reinforcement Learning framework
Personalize Any Characters with a Scalable Diffusion Transformer
ComfyUI integration for Microsoft's VibeVoice text-to-speech model
Official Python inference and LoRA trainer package