Text- and image-to-video generation: CogVideoX (2024) and CogVideo
Open Source Differentiable Computer Vision Library
All-in-one WebUI for AI generative image and video creation
The most powerful and modular diffusion model GUI, API and backend
Mixture-of-Experts Vision-Language Models for Advanced Multimodal
Label Studio is a multi-type data labeling and annotation tool
Easily turn large sets of image URLs into an image dataset
Wan2.1: Open and Advanced Large-Scale Video Generative Model
Train machine learning models within Docker containers
Director, Screenwriter, Producer, and Video Generator All-in-One
[NeurIPS 2023] ImageReward: Learning and Evaluating Human Preferences
Stable Diffusion web UI
Ready-to-use OCR with 80+ supported languages
A Customizable Image-to-Video Model based on HunyuanVideo
Models for object and human mesh reconstruction
Awesome multilingual OCR toolkits based on PaddlePaddle
A SOTA open-source image editing model
State-of-the-art Image & Video CLIP, Multimodal Large Language Models
Implementation of Imagen, Google's Text-to-Image Neural Network
An open source implementation of CLIP
State-of-the-art diffusion models for image and audio generation
Flexible Photo Recrafting While Preserving Your Identity
A neural network that transforms a design mock-up into static websites
Chat & pretrained large vision language model
CLIP: Predict the most relevant text snippet given an image