Qwen-Image is a powerful image generation foundation model
Fast-stable-diffusion + DreamBooth
A Customizable Image-to-Video Model based on HunyuanVideo
Official inference repo for FLUX.2 models
Wan2.1: Open and Advanced Large-Scale Video Generative Model
PyTorch implementation of JiT
Wan2.2: Open and Advanced Large-Scale Video Generative Model
Reference PyTorch implementation and models for DINOv3
Text and image to video generation: CogVideoX and CogVideo
High-Resolution Image Synthesis with Latent Diffusion Models
Sharp Monocular Metric Depth in Less Than a Second
Diffusion Transformer with Fine-Grained Chinese Understanding
Tencent Hunyuan multimodal diffusion transformer (MM-DiT) model
Chinese and English multimodal conversational language model
Pure C inference for the Flux 2 image generation model
Capable of understanding text, audio, vision, and video
Advancing Open-source World Models
Easy Docker setup for Stable Diffusion with user-friendly UI
RGBD video generation model conditioned on camera input
A state-of-the-art open visual language model
A latent text-to-image diffusion model
Lightweight multimodal translation model for 55 languages
Small 3B base multimodal model ideal for custom AI on edge hardware
Compact 8B multimodal instruct model optimized for edge deployment
Efficient 14B multimodal instruct model supporting edge deployment and FP8