Contexts Optical Compression
Wan2.2: Open and Advanced Large-Scale Video Generative Model
State-of-the-art TTS model under 25MB
A Powerful Native Multimodal Model for Image Generation
Dataset of GPT-2 outputs for research in detection, biases, and more
Wan2.1: Open and Advanced Large-Scale Video Generative Model
Code for running inference and finetuning with the SAM 3 model
Generating Immersive, Explorable, and Interactive 3D Worlds
Mixture-of-Experts Vision-Language Models for Advanced Multimodal Understanding
Qwen-Image is a powerful image generation foundation model
CLIP: predict the most relevant text snippet given an image
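CLIP-style retrieval reduces to scoring candidate texts by cosine similarity against an image embedding and taking the best match. A minimal sketch of that scoring step, using toy hand-written vectors in place of real CLIP features (the actual model would produce the embeddings):

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def most_relevant(image_emb, text_embs):
    """Index of the text embedding closest to the image embedding."""
    return max(range(len(text_embs)), key=lambda i: cosine(image_emb, text_embs[i]))

# toy embeddings standing in for real CLIP image/text features
image_emb = [0.9, 0.1, 0.0]
texts = {
    "a photo of a cat": [0.8, 0.2, 0.1],
    "a photo of a dog": [0.1, 0.9, 0.2],
}
labels = list(texts)
print(labels[most_relevant(image_emb, list(texts.values()))])  # a photo of a cat
```

In the real model, both embeddings come from CLIP's image and text encoders and are L2-normalized, so this dot-product ranking is exactly how zero-shot classification is done.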
GLM-4-Voice | End-to-End Chinese-English Conversational Model
Capable of understanding text, audio, vision, and video
Chat & pretrained large vision language model
tiktoken is a fast BPE tokeniser for use with OpenAI's models
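tiktoken implements byte-pair encoding (BPE): text is split into units that are repeatedly merged, highest-priority merge first, until no known pair remains. A self-contained sketch of that greedy merge loop with a tiny hand-built merge table (not tiktoken's actual API or vocabulary):

```python
def bpe_encode(word, merges):
    """Greedy BPE: repeatedly apply the lowest-rank adjacent merge."""
    tokens = list(word)
    while True:
        best = None  # (position, rank) of the best mergeable pair
        for i in range(len(tokens) - 1):
            rank = merges.get((tokens[i], tokens[i + 1]))
            if rank is not None and (best is None or rank < best[1]):
                best = (i, rank)
        if best is None:
            return tokens
        i = best[0]
        tokens = tokens[:i] + [tokens[i] + tokens[i + 1]] + tokens[i + 2:]

# toy merge table: lower rank = learned earlier = applied first
merges = {("h", "e"): 0, ("l", "l"): 1, ("he", "ll"): 2}
print(bpe_encode("hello", merges))  # ['hell', 'o']
```

tiktoken performs the same procedure over bytes with a large learned merge table, which is why it can encode arbitrary text quickly and reversibly.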
Qwen3-Omni is a natively end-to-end, omni-modal LLM
A Unified Framework for Text-to-3D and Image-to-3D Generation
Multimodal-Driven Architecture for Customized Video Generation
Chat & pretrained large audio language model proposed by Alibaba Cloud
Diffusion Transformer with Fine-Grained Chinese Understanding
Qwen3 is the large language model series developed by the Qwen team
High-Resolution Image Synthesis with Latent Diffusion Models
CogView4, CogView3-Plus, and CogView3 (ECCV 2024)
Towards Real-World Vision-Language Understanding
Qwen2.5-VL is the multimodal large language model series