Wan2.2: Open and Advanced Large-Scale Video Generative Model
Qwen3 is the large language model series developed by the Qwen team
A Powerful Native Multimodal Model for Image Generation
Qwen3-Coder is the code version of Qwen3
Generating Immersive, Explorable, and Interactive 3D Worlds
Qwen-Image is a powerful image generation foundation model
State-of-the-art TTS model under 25MB
High-Resolution 3D Assets Generation with Large Scale Diffusion Models
Pushing the Limits of Mathematical Reasoning in Open Language Models
Wan2.1: Open and Advanced Large-Scale Video Generative Model
Qwen3-Omni is a natively end-to-end, omni-modal LLM
tiktoken is a fast BPE tokeniser for use with OpenAI's models
Mixture-of-Experts Vision-Language Models for Advanced Multimodal
FAIR Sequence Modeling Toolkit 2
Renderer for the harmony response format to be used with gpt-oss
Multimodal-Driven Architecture for Customized Video Generation
Capable of understanding text, audio, vision, video
Phi-3.5 for Mac: Locally-run Vision and Language Models
Repo of Qwen2-Audio chat & pretrained large audio language model
Qwen2.5-VL is the multimodal large language model series
Language modeling in a sentence representation space
CLIP: Predict the most relevant text snippet given an image
Multimodal Diffusion with Representation Alignment
Chat & pretrained large vision language model
A Unified Framework for Text-to-3D and Image-to-3D Generation