OCR expert VLM powered by Hunyuan's native multimodal architecture
Agentic, Reasoning, and Coding (ARC) foundation models
Tongyi Deep Research, the Leading Open-source Deep Research Agent
Stable Diffusion WebUI Forge is a platform on top of Stable Diffusion
Language modeling in a sentence representation space
VGGSfM: Visual Geometry Grounded Deep Structure From Motion
GLM-4 series: Open Multilingual Multimodal Chat LMs
Fast Stable Diffusion on CPU and AI PC
GLM-4.6V/4.5V/4.1V-Thinking, towards versatile multimodal reasoning
CodeGeeX: An Open Multilingual Code Generation Model (KDD 2023)
Qwen2.5-VL is a multimodal large language model series
Advancing Open-source World Models
Official repository for LTX-Video
Reference PyTorch implementation and models for DINOv3
From Images to High-Fidelity 3D Assets
Contexts Optical Compression
Open image model at the forefront of design
ChatGPT interface with better UI
Open-source multi-speaker long-form text-to-speech model
LTX-Video Support for ComfyUI
Python bindings for llama.cpp
Multimodal Diffusion with Representation Alignment
The official repo of the Qwen chat & pretrained large language models
State-of-the-art TTS model under 25MB
CodeGeeX2: A More Powerful Multilingual Code Generation Model