Text and image to video generation: CogVideoX and CogVideo
Awesome multilingual OCR toolkits based on PaddlePaddle
Qwen3-TTS is an open-source series of TTS models
A Unified Framework for Text-to-3D and Image-to-3D Generation
Official inference repo for FLUX.1 models
Diffusion Bee is the easiest way to run Stable Diffusion locally
A multimodal model for brain response prediction
Industrial-level controllable zero-shot text-to-speech system
Qwen-Image is a powerful image generation foundation model
Official Python inference and LoRA trainer package
Qwen3-ASR is an open-source series of ASR models
Qwen3 is the large language model series developed by the Qwen team
Towards Real-World Vision-Language Understanding
The most powerful local music generation model
Open Source Speech Language Model
GLM-4.6V/4.5V/4.1V-Thinking: towards versatile multimodal reasoning
General-purpose image editing model that delivers high-fidelity
Qwen3.5 is the large language model series developed by the Qwen team
Visual Causal Flow
High-Resolution 3D Assets Generation with Large Scale Diffusion Models
OCR expert VLM powered by Hunyuan's native multimodal architecture
HY-Motion model for 3D character animation generation
Pushing the Limits of Mathematical Reasoning in Open Language Models
Multimodal Diffusion with Representation Alignment
Audio foundation model excelling in audio understanding