Controllable & emotion-expressive zero-shot TTS
Qwen-Image-Layered: Layered Decomposition for Inherent Editability
Reproduction of Poetiq's record-breaking submission to the ARC-AGI-1
State-of-the-art Image & Video CLIP, Multimodal Large Language Models
MapAnything: Universal Feed-Forward Metric 3D Reconstruction
GPT4V-level open-source multi-modal model based on Llama3-8B
GLM-4.6V/4.5V/4.1V-Thinking, towards versatile multimodal reasoning
OCR expert VLM powered by Hunyuan's native multimodal architecture
A SOTA open-source image editing model
Chinese and English multimodal conversational language model
Repo of Qwen2-Audio chat & pretrained large audio language model
Tongyi Deep Research, the Leading Open-source Deep Research Agent
Qwen3-Omni is a natively end-to-end, omni-modal LLM
Multi-modal large language model designed for audio understanding
Open-source framework for intelligent speech interaction
Open-weight, large-scale hybrid-attention reasoning model
Large-language-model & vision-language-model based on Linear Attention
MedicalGPT: Training Your Own Medical GPT Model with ChatGPT Training
Capable of understanding text, audio, vision, video
FlashMLA: Efficient Multi-head Latent Attention Kernels
A state-of-the-art open visual language model
High-Resolution Image Synthesis with Latent Diffusion Models
Di♪♪Rhythm: Blazingly Fast & Simple End-to-End Song Generation
Qwen2.5-Coder is the code version of Qwen2.5, the large language model
Python example app from the OpenAI API quickstart tutorial