Stable Virtual Camera: Generative View Synthesis with Diffusion Models
An Efficient Agentic Model for Computer Use
Audio foundation model excelling in audio understanding
Controllable & emotion-expressive zero-shot TTS
DeepSeek Coder: Let the Code Write Itself
HY-Motion model for 3D character animation generation
CogView4, CogView3-Plus and CogView3 (ECCV 2024)
Qwen-Image is a powerful image generation foundation model
Long-form streaming TTS system for multi-speaker dialogue generation
Fast-stable-diffusion + DreamBooth
Collection of Gemma 3 variants that are trained for performance
LTX-Video Support for ComfyUI
Video understanding codebase from FAIR for reproducing video models
CLIP: predict the most relevant text snippet given an image
Official codebase for LeWorldModel: Stable End-to-End Joint-Embedding
The official PyTorch implementation of Google's Gemma models
Project Lyra: Open Generative 3D World Models
Inference script for Oasis 500M
GLM-4-Voice | End-to-End Chinese-English Conversational Model
Diffusion Transformer with Fine-Grained Chinese Understanding
A Customizable Image-to-Video Model based on HunyuanVideo
A GPT-4o Level MLLM for Vision, Speech and Multimodal Live Streaming
Pushing the Limits of Mathematical Reasoning in Open Language Models
Tiny vision language model
GLM-4.6V/4.5V/4.1V-Thinking, towards versatile multimodal reasoning