Controllable & emotion-expressive zero-shot TTS
DeepSeek Coder: Let the Code Write Itself
Qwen3-ASR is an open-source series of ASR models
Foundation model for image generation
Block Diffusion for Ultra-Fast Speculative Decoding
Multimodal Diffusion with Representation Alignment
HY-Motion model for 3D character animation generation
Generate Any 3D Scene in Seconds
CogView4, CogView3-Plus and CogView3 (ECCV 2024)
An Efficient Agentic Model for Computer Use
Audio foundation model excelling in audio understanding
GLM-4.6V/4.5V/4.1V-Thinking, towards versatile multimodal reasoning
Qwen3-Omni is a natively end-to-end, omni-modal LLM
Long-form streaming TTS system for multi-speaker dialogue generation
Fast-stable-diffusion + DreamBooth
Collection of Gemma 3 variants that are trained for performance
LTX-Video Support for ComfyUI
Video understanding codebase from FAIR for reproducing video models
CLIP: predict the most relevant text snippet given an image
Project Lyra: Open Generative 3D World Models
Inference script for Oasis 500M
Hackable and optimized Transformers building blocks
GLM-4-Voice | End-to-End Chinese-English Conversational Model
Diffusion Transformer with Fine-Grained Chinese Understanding
A Customizable Image-to-Video Model based on HunyuanVideo