State-of-the-art (SoTA) text-to-video pre-trained model
Repository for Qwen2-Audio, a pretrained large audio language model, and its chat variant
Qwen-Image is a powerful image generation foundation model
HY-Motion model for 3D character animation generation
4M: Massively Multimodal Masked Modeling
CogView4, CogView3-Plus and CogView3 (ECCV 2024)
Tencent Hunyuan Multimodal diffusion transformer (MM-DiT) model
A Customizable Image-to-Video Model based on HunyuanVideo
GLM-4.6V/4.5V/4.1V-Thinking, towards versatile multimodal reasoning
NetEase Youdao's open-source embedding and reranker models
Audio foundation model excelling in audio understanding
INT4/INT5/INT8 and FP16 inference on CPU for the RWKV language model
A Multi-Modal World Model for Reconstructing, Generating, and Simulating
A Systematic Framework for Interactive World Modeling
Code for Mesh R-CNN (ICCV 2019)
An AI-powered security review GitHub Action using Claude
GPT-4V-level open-source multimodal model based on Llama3-8B
GLM-4.5V and GLM-4.1V-Thinking: Towards Versatile Multimodal Reasoning
Qwen3-Omni is a natively end-to-end, omni-modal LLM
Tiny vision language model
The official PyTorch implementation of Google's Gemma models
Inference code for scalable emulation of protein equilibrium ensembles
Programmatic access to the AlphaGenome model
A state-of-the-art open-source image editing model
A state-of-the-art open visual language model