ChatGPT interface with better UI
Qwen2.5-VL is the multimodal large language model series
Repo of Qwen2-Audio chat & pretrained large audio language model
Project Lyra: Open Generative 3D World Models
HY-Motion model for 3D character animation generation
4M: Massively Multimodal Masked Modeling
CogView4, CogView3-Plus and CogView3 (ECCV 2024)
Tencent Hunyuan Multimodal diffusion transformer (MM-DiT) model
A Customizable Image-to-Video Model based on HunyuanVideo
NVIDIA Isaac GR00T N1.5 is the world's first open foundation model
GLM-4.6V/4.5V/4.1V-Thinking, towards versatile multimodal reasoning
A trainable PyTorch reproduction of AlphaFold 3
Audio foundation model excelling in audio understanding
INT4/INT5/INT8 and FP16 inference on CPU for RWKV language model
A Multi-Modal World Model for Reconstructing, Generating, Simulation
A Systematic Framework for Interactive World Modeling
Code for Mesh R-CNN, ICCV 2019
An AI-powered security review GitHub Action using Claude
GPT4V-level open-source multi-modal model based on Llama3-8B
GLM-4.5V and GLM-4.1V-Thinking: Towards Versatile Multimodal Reasoning
Tiny vision language model
The official PyTorch implementation of Google's Gemma models
Inference code for scalable emulation of protein equilibrium ensembles
Programmatic access to the AlphaGenome model
Qwen-Image is a powerful image generation foundation model