Qwen2.5-VL is a multimodal large language model series
LTX-Video Support for ComfyUI
Multimodal embedding and reranking models built on Qwen3-VL
Official inference repo for FLUX.2 models
A GPT-4o Level MLLM for Vision, Speech and Multimodal Live Streaming
Python bindings for llama.cpp
Reference PyTorch implementation and models for DINOv3
Block Diffusion for Ultra-Fast Speculative Decoding
Qwen3-TTS is an open-source series of TTS models
GLM-4.6V/4.5V/4.1V-Thinking, towards versatile multimodal reasoning
Recovering the Visual Space from Any Views
Official repository for LTX-Video
CodeGeeX: An Open Multilingual Code Generation Model (KDD 2023)
Accurate × Fast × Comprehensive
Diffusion Transformer with Fine-Grained Chinese Understanding
CLIP: predict the most relevant text snippet given an image
ChatGPT interface with better UI
GLM-4 series: Open Multilingual Multimodal Chat LMs
Towards Real-World Vision-Language Understanding
gpt-oss-120b and gpt-oss-20b are two open-weight language models
Open Source Speech Language Model
Hunyuan Translation Model Version 1.5
Qwen3-ASR is an open-source series of ASR models
Open-source deep-learning framework
A PyTorch library for implementing flow matching algorithms