Wan2.2 is a major upgrade to the Wan series of open, large-scale video generative models, incorporating cutting-edge innovations to improve video generation quality and efficiency. It introduces a Mixture-of-Experts (MoE) architecture that splits the denoising process across specialized expert models, increasing total model capacity without raising per-step inference cost. Wan2.2 also integrates meticulously curated cinematic aesthetic data, enabling precise control over lighting, composition, color tone, and more for high-quality, customizable video styles, and it is trained on significantly larger datasets than its predecessor, enhancing motion complexity, semantic understanding, and aesthetic diversity.

In addition, Wan2.2 open-sources a 5-billion-parameter hybrid text-image-to-video (TI2V) model built on a high-compression VAE, which generates 720P video at 24fps on consumer-grade GPUs such as the RTX 4090. The framework supports multiple generation tasks, including text-to-video, image-to-video, and text-image-to-video.
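The key property of this MoE design is that experts are routed by denoising step rather than mixed per token, so only one expert runs at each step. A minimal sketch of that routing idea (the threshold value and the stand-in "experts" below are illustrative assumptions, not Wan2.2's actual implementation):

```python
# Sketch of timestep-based expert routing in an MoE denoiser that splits
# work between a high-noise expert (early, noisy steps) and a low-noise
# expert (late, refinement steps). Only one expert is active per step, so
# per-step compute matches a single-expert model even though total
# parameter count doubles. All names and the boundary are illustrative.

def make_expert(name):
    """Return a stand-in denoiser; a real expert would be a diffusion transformer."""
    def expert(latents, t):
        # A real expert would predict noise; here we just tag which expert ran.
        return {"expert": name, "t": t, "latents": latents}
    return expert

high_noise_expert = make_expert("high_noise")  # handles large timesteps
low_noise_expert = make_expert("low_noise")    # handles small timesteps

def denoise_step(latents, t, boundary=500):
    """Route the step to exactly one expert based on the noise level t."""
    expert = high_noise_expert if t >= boundary else low_noise_expert
    return expert(latents, t)

print(denoise_step([0.0], 900)["expert"])  # -> high_noise
print(denoise_step([0.0], 100)["expert"])  # -> low_noise
```

Because routing depends only on the scalar timestep, it adds no learned gating overhead at inference time.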
Features
- Mixture-of-Experts (MoE) architecture splitting denoising into high-noise and low-noise experts
- Incorporates cinematic-level aesthetic data for fine-grained control of video style and quality
- Trained on 65.6% more images and 83.2% more videos than Wan2.1 for richer motion and semantics
- High-compression 5B TI2V model with advanced VAE achieving up to 64× compression ratio
- Supports text-to-video, image-to-video, and text-image-to-video generation in one framework
- Generates 720P videos at 24fps efficiently on consumer GPUs like RTX 4090
- Compatible with multi-GPU inference acceleration via PyTorch FSDP and DeepSpeed Ulysses
- Open-source with inference code, checkpoints, and integration with popular UI frameworks like ComfyUI and Diffusers
License
Apache License 2.0