Hackable and optimized Transformers building blocks
Text and image to video generation: CogVideoX and CogVideo
Qwen3 is the large language model series developed by the Qwen team
Unified Multimodal Understanding and Generation Models
A Powerful Native Multimodal Model for Image Generation
VGGSfM: Visual Geometry Grounded Deep Structure From Motion
Foundation model for image generation
Designed for text embedding and ranking tasks
RGBD video generation model conditioned on camera input
A Systematic Framework for Interactive World Modeling
Mixture-of-Experts Vision-Language Models for Advanced Multimodal Understanding
Open-source industrial-grade ASR models
Large Multimodal Models for Video Understanding and Editing
VMZ: Model Zoo for Video Modeling
State-of-the-art (SoTA) text-to-video pre-trained model
Python SDK for Claude Agent
GLM-4.6V/4.5V/4.1V-Thinking, towards versatile multimodal reasoning
DeepSeek Coder: Let the Code Write Itself
A Production-ready Reinforcement Learning AI Agent Library
Bidirectional token-classification model for identifiable info
Visual Causal Flow
Models for object and human mesh reconstruction
Video Object and Interaction Deletion
Qwen-Image-Layered: Layered Decomposition for Inherent Editability
ChatGLM-6B: An Open Bilingual Dialogue Language Model