Contexts Optical Compression
Code for running inference and fine-tuning with the SAM 3 model
State-of-the-art TTS model under 25MB
Wan2.2: Open and Advanced Large-Scale Video Generative Model
Wan2.1: Open and Advanced Large-Scale Video Generative Model
A Powerful Native Multimodal Model for Image Generation
Dataset of GPT-2 outputs for research in detection, biases, and more
Official inference repo for FLUX.2 models
Generating Immersive, Explorable, and Interactive 3D Worlds
Qwen-Image is a powerful image generation foundation model
CLIP: Predict the most relevant text snippet given an image
GLM-4-Voice | End-to-End Chinese-English Conversational Model
Qwen3-Omni is a natively end-to-end, omni-modal LLM
tiktoken is a fast BPE tokeniser for use with OpenAI's models
Chat & pretrained large vision language model
A Unified Framework for Text-to-3D and Image-to-3D Generation
Multimodal-Driven Architecture for Customized Video Generation
Mixture-of-Experts Vision-Language Models for Advanced Multimodal Understanding
Chat & pretrained large audio language model proposed by Alibaba Cloud
Qwen3 is the large language model series developed by Qwen team
Capable of understanding text, audio, vision, and video
Generate Any 3D Scene in Seconds
Industrial-level controllable zero-shot text-to-speech system
Designed for text embedding and ranking tasks
Diffusion Transformer with Fine-Grained Chinese Understanding