Text- and image-to-video generation: CogVideoX and CogVideo
Qwen3-TTS is an open-source series of TTS models
Awesome multilingual OCR toolkits based on PaddlePaddle
Official inference repo for FLUX.1 models
A Unified Framework for Text-to-3D and Image-to-3D Generation
Industrial-level controllable zero-shot text-to-speech system
Qwen-Image is a powerful image generation foundation model
Qwen3-ASR is an open-source series of ASR models
Official Python inference and LoRA trainer package
Towards Real-World Vision-Language Understanding
The most powerful local music generation model
Qwen3 is the large language model series developed by the Qwen team
Open Source Speech Language Model
GLM-4.6V/4.5V/4.1V-Thinking, towards versatile multimodal reasoning
HY-Motion model for 3D character animation generation
General-purpose image editing model that delivers high-fidelity
High-Resolution 3D Assets Generation with Large Scale Diffusion Models
Visual Causal Flow
Pushing the Limits of Mathematical Reasoning in Open Language Models
OCR expert VLM powered by Hunyuan's native multimodal architecture
Audio foundation model excelling in audio understanding
Multimodal Diffusion with Representation Alignment
ICLR 2024 Spotlight: curation/training code, metadata, distribution
Official implementation of DreamCraft3D
Language modeling in a sentence representation space