Wan2.2: Open and Advanced Large-Scale Video Generative Model
Qwen3 is the large language model series developed by the Qwen team
Wan2.1: Open and Advanced Large-Scale Video Generative Model
Qwen-Image is a powerful image generation foundation model
Generating Immersive, Explorable, and Interactive 3D Worlds
High-Resolution 3D Asset Generation with Large-Scale Diffusion Models
Chat & pretrained large audio language model proposed by Alibaba Cloud
Capable of understanding text, audio, vision, and video
Designed for text embedding and ranking tasks
Qwen3-omni is a natively end-to-end, omni-modal LLM
Chat & pretrained large vision language model
Repo of Qwen2-Audio chat & pretrained large audio language model
State-of-the-art TTS model under 25MB
Qwen2.5-VL is the multimodal large language model series
The official repo of Qwen chat & pretrained large language model
A Powerful Native Multimodal Model for Image Generation
Diffusion Transformer with Fine-Grained Chinese Understanding
Multimodal Diffusion with Representation Alignment
A Unified Framework for Text-to-3D and Image-to-3D Generation
Multimodal-Driven Architecture for Customized Video Generation
Phi-3.5 for Mac: Locally-run Vision and Language Models
A GPT-4o Level MLLM for Vision, Speech and Multimodal Live Streaming
MedicalGPT: Training Your Own Medical GPT Model with ChatGPT Training
Chinese LLaMA & Alpaca large language model + local CPU/GPU training
PyTorch implementation of VALL-E (Zero-Shot Text-To-Speech)