Diversity-driven optimization and large-model reasoning ability
MOSS-TTS Family: open-source speech and sound generation models
Recovering the Visual Space from Any Views
Pokee Deep Research Model Open Source Repo
A Powerful Native Multimodal Model for Image Generation
Easy Docker setup for Stable Diffusion with a user-friendly UI
Models for object and human mesh reconstruction
Tencent Hunyuan multimodal diffusion transformer (MM-DiT) model
Advancing Open-source World Models
DeepSeek Coder: Let the Code Write Itself
Programmatic access to the AlphaGenome model
Miso TTS is an 8-billion-parameter, highly emotive text-to-speech model
Qwen-Image is a powerful image generation foundation model
Open-source image generative foundation model
Multimodal Diffusion with Representation Alignment
Personalize Any Characters with a Scalable Diffusion Transformer
MedicalGPT: Training Your Own Medical GPT Model with ChatGPT Training Pipeline
An experimental version of DeepSeek model
GPT4V-level open-source multimodal model based on Llama3-8B
GLM-4.5V and GLM-4.1V-Thinking: Towards Versatile Multimodal Reasoning
Math-specific large language models from our Qwen2 series
Inference script for Oasis 500M
A Systematic Framework for Interactive World Modeling
Long-form streaming TTS system for multi-speaker dialogue generation
Research code artifacts for Code World Model (CWM)