Contexts Optical Compression
Code for running inference and finetuning with the SAM 3 model
Official inference repo for FLUX.2 models
Wan2.2: Open and Advanced Large-Scale Video Generative Model
State-of-the-art TTS model under 25MB
Wan2.1: Open and Advanced Large-Scale Video Generative Model
Generating Immersive, Explorable, and Interactive 3D Worlds
A Powerful Native Multimodal Model for Image Generation
CLIP: predict the most relevant text snippet given an image
Qwen3-Omni is a natively end-to-end, omni-modal LLM
Qwen-Image is a powerful image generation foundation model
Dataset of GPT-2 outputs for research in detection, biases, and more
Mixture-of-Experts Vision-Language Models for Advanced Multimodal
Chat & pretrained large audio language model proposed by Alibaba Cloud
GLM-4-Voice | End-to-End Chinese-English Conversational Model
tiktoken is a fast BPE tokeniser for use with OpenAI's models
Chat & pretrained large vision language model
Industrial-level controllable zero-shot text-to-speech system
A Unified Framework for Text-to-3D and Image-to-3D Generation
Multimodal-Driven Architecture for Customized Video Generation
Capable of understanding text, audio, vision, video
Generate Any 3D Scene in Seconds
Qwen3 is the large language model series developed by the Qwen team
Towards Real-World Vision-Language Understanding
Large language model and vision-language model based on linear attention