Wan2.1: Open and Advanced Large-Scale Video Generative Model
Generating Immersive, Explorable, and Interactive 3D Worlds
A neural network that transforms a design mock-up into static websites
Chat & pretrained large vision language model
Guiding Instruction-based Image Editing via Multimodal Large Language Models
CLIP, Predict the most relevant text snippet given an image
Open Source Differentiable Computer Vision Library
Text- and image-to-video generation: CogVideoX (2024) and CogVideo
CogView4, CogView3-Plus and CogView3 (ECCV 2024)
A Customizable Image-to-Video Model based on HunyuanVideo
21 Lessons, Get Started Building with Generative AI
Towards Real-World Vision-Language Understanding
A Unified Framework for Text-to-3D and Image-to-3D Generation
Multimodal-Driven Architecture for Customized Video Generation
RGBD video generation model conditioned on camera input
An unsupervised and free tool for image and video dataset analysis
State-of-the-art Image & Video CLIP, Multimodal Large Language Models
Fast image augmentation library and an easy-to-use wrapper
Implementation of a U-net complete with efficient attention
An open source object detection toolbox based on PyTorch
Diffusion Transformer with Fine-Grained Chinese Understanding
GPT4V-level open-source multi-modal model based on Llama3-8B
State-of-the-art diffusion models for image and audio generation
Official implementation of Watermark Anything with Localized Messages
OpenMMLab Model Deployment Framework