Text and image to video generation: CogVideoX and CogVideo
Qwen-Image is a powerful image generation foundation model
Open-source multi-speaker long-form text-to-speech model
Qwen2.5-VL is the multimodal large language model series
Reference PyTorch implementation and models for DINOv3
DeepSeek Coder: Let the Code Write Itself
The official repo of Qwen chat & pretrained large language models
RGBD video generation model conditioned on camera input
A Multi-Modal World Model for Reconstructing, Generating, and Simulating
Mixture-of-Experts Vision-Language Models for Advanced Multimodal
Advancing Open-source World Models
Qwen-Image-Layered: Layered Decomposition for Inherent Editability
High-Resolution Image Synthesis with Latent Diffusion Models
Foundation model for image generation
Official repository for LTX-Video
Let's make video diffusion practical
Visual Causal Flow
CodeGeeX: An Open Multilingual Code Generation Model (KDD 2023)
An experimental version of the DeepSeek model
LTX-Video Support for ComfyUI
Official code base for LeWorldModel: Stable End-to-End Joint-Embedding
Repo for SeedVR2 & SeedVR
A Powerful Native Multimodal Model for Image Generation
Hunyuan Translation Model Version 1.5
gpt-oss-120b and gpt-oss-20b are two open-weight language models