GLM-4.5: Open-source LLM for intelligent agents by Z.ai
Tencent Hunyuan Multimodal diffusion transformer (MM-DiT) model
A state-of-the-art open visual language model
Chinese and English multimodal conversational language model
LTX-Video Support for ComfyUI
General-purpose image editing model that delivers high-fidelity
State-of-the-art (SoTA) text-to-video pre-trained model
Sharp Monocular Metric Depth in Less Than a Second
A Customizable Image-to-Video Model based on HunyuanVideo
Diffusion Transformer with Fine-Grained Chinese Understanding
Qwen-Image-Layered: Layered Decomposition for Inherent Editability
CLIP: predicts the most relevant text snippet given an image
ChatGPT interface with better UI
GLM-4 series: Open Multilingual Multimodal Chat LMs
Inference script for Oasis 500M
The official PyTorch implementation of Google's Gemma models
Qwen3-ASR is an open-source series of ASR models
Large Multimodal Models for Video Understanding and Editing
Chat & pretrained large vision language model
Pushing the Limits of Mathematical Reasoning in Open Language Models
Advancing Open-source World Models
Controllable & emotion-expressive zero-shot TTS
Inference code for scalable emulation of protein equilibrium ensembles
AlphaFold 3 inference pipeline
Unified Multimodal Understanding and Generation Models