Awesome multilingual OCR toolkits based on PaddlePaddle
The most powerful and modular diffusion model GUI, API and backend
Code for running inference with the SAM 3D Body Model 3DB
Generating Immersive, Explorable, and Interactive 3D Worlds
CogView4, CogView3-Plus and CogView3 (ECCV 2024)
A neural network that transforms a design mock-up into static websites
Models for object and human mesh reconstruction
Wan2.1: Open and Advanced Large-Scale Video Generative Model
Guiding Instruction-based Image Editing via Multimodal Large Language
CLIP: Predict the most relevant text snippet given an image
Fast image augmentation library and an easy-to-use wrapper
Open Source Differentiable Computer Vision Library
A Customizable Image-to-Video Model based on HunyuanVideo
Towards Real-World Vision-Language Understanding
Text- and image-to-video generation: CogVideoX (2024) and CogVideo
A Unified Framework for Text-to-3D and Image-to-3D Generation
Multimodal-Driven Architecture for Customized Video Generation
RGBD video generation model conditioned on camera input
Python Optimal Transport
21 Lessons, Get Started Building with Generative AI
An unsupervised and free tool for image and video dataset analysis
Implementation of a U-net complete with efficient attention
An open source object detection toolbox based on PyTorch
3D reconstruction software
Diffusion Transformer with Fine-Grained Chinese Understanding