Let's make video diffusion practical
An unsupervised and free tool for image and video dataset analysis
HunyuanVideo: A Systematic Framework For Large Video Generation Model
ComfyUI wrapper nodes for WanVideo and related models
Tencent Hunyuan Multimodal diffusion transformer (MM-DiT) model
A general fine-tuning kit geared toward image/video/audio diffusion
A Multi-Modal World Model for Reconstruction, Generation, and Simulation
Label Studio is a multi-type data labeling and annotation tool
Implementation of a U-net complete with efficient attention
Recovering the Visual Space from Any Views
The most powerful and modular diffusion model GUI, API, and backend
InvokeAI is a leading creative engine for Stable Diffusion models
We write your reusable computer vision tools
Generating Immersive, Explorable, and Interactive 3D Worlds
Sharp Monocular Metric Depth in Less Than a Second
Dealing with all unstructured data, such as reverse image search
Qwen3-Omni is a natively end-to-end, omni-modal LLM
Official Repo For "Sa2VA: Marrying SAM2 with LLaVA"
A Pioneering Open-Source Alternative to GPT-4o
A GPT-4o Level MLLM for Vision, Speech and Multimodal Live Streaming
Advancing Open-source World Models
A Telegram bot that integrates with OpenAI's official ChatGPT APIs
21 Lessons, Get Started Building with Generative AI
The data structure for multimodal data
Effortless data labeling with AI support from Segment Anything