GLM-4.6V/4.5V/4.1V-Thinking: towards versatile multimodal reasoning
OCR expert VLM powered by Hunyuan's native multimodal architecture
gpt-oss-120b and gpt-oss-20b are two open-weight language models
Ring is a reasoning MoE LLM provided and open-sourced by InclusionAI
HY-Motion: a model for 3D character animation generation
Repository for the Qwen2-Audio chat and pretrained large audio language models
High-Fidelity and Controllable Generation of Textured 3D Assets
This repository contains the official implementation of FastVLM
GPT-4V-level open-source multimodal model based on Llama3-8B
Renderer for the harmony response format to be used with gpt-oss (see the rendering sketch after this list)
A trainable PyTorch reproduction of AlphaFold 3
Advancing Open-source World Models
Qwen2.5-VL is the multimodal large language model series from the Qwen team; a loading sketch follows this list
4M: Massively Multimodal Masked Modeling
Official implementation of DreamCraft3D
Qwen3-ASR is an open-source series of ASR models
Recovering the Visual Space from Any Views
Capable of understanding text, audio, vision, and video
Tiny vision language model
Easy Docker setup for Stable Diffusion with a user-friendly UI
Inference code for scalable emulation of protein equilibrium ensembles
Ling-V2 is a MoE LLM provided and open-sourced by InclusionAI
MapAnything: Universal Feed-Forward Metric 3D Reconstruction
Towards Ultimate Expert Specialization in Mixture-of-Experts Language Models
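
For the harmony renderer above: a minimal sketch of turning a conversation into gpt-oss prompt tokens, assuming the openai_harmony Python bindings (installed via `pip install openai-harmony`); the developer instructions and user message are illustrative.

```python
# Minimal sketch using the openai_harmony bindings; message contents are
# illustrative, not taken from the harmony repo.
from openai_harmony import (
    Conversation,
    DeveloperContent,
    HarmonyEncodingName,
    Message,
    Role,
    SystemContent,
    load_harmony_encoding,
)

# Load the encoding used by the gpt-oss models.
encoding = load_harmony_encoding(HarmonyEncodingName.HARMONY_GPT_OSS)

# Build a conversation: system header, developer instructions, user turn.
convo = Conversation.from_messages([
    Message.from_role_and_content(Role.SYSTEM, SystemContent.new()),
    Message.from_role_and_content(
        Role.DEVELOPER,
        DeveloperContent.new().with_instructions("Answer concisely."),
    ),
    Message.from_role_and_content(Role.USER, "What is the capital of France?"),
])

# Render to the token IDs a gpt-oss model expects for its next completion.
tokens = encoding.render_conversation_for_completion(convo, Role.ASSISTANT)
print(len(tokens), "prompt tokens")
```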
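
For Qwen2.5-VL above: a minimal single-image chat sketch via Hugging Face transformers, not taken from the Qwen repo itself; the checkpoint name, image URL, and prompt are illustrative assumptions.

```python
# Minimal sketch: single-image chat with Qwen2.5-VL through transformers.
# The checkpoint ID and image URL below are illustrative assumptions.
import requests
from PIL import Image
from transformers import AutoProcessor, Qwen2_5_VLForConditionalGeneration

model_id = "Qwen/Qwen2.5-VL-7B-Instruct"  # assumed checkpoint name
model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)
processor = AutoProcessor.from_pretrained(model_id)

# Chat template with an image placeholder; the pixels are passed separately.
image = Image.open(requests.get("https://example.com/demo.jpg", stream=True).raw)
messages = [{"role": "user", "content": [
    {"type": "image"},
    {"type": "text", "text": "Describe this image."},
]}]
text = processor.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
inputs = processor(text=[text], images=[image], return_tensors="pt").to(model.device)

# Generate and decode only the newly produced tokens.
output_ids = model.generate(**inputs, max_new_tokens=128)
new_tokens = output_ids[:, inputs.input_ids.shape[1]:]
print(processor.batch_decode(new_tokens, skip_special_tokens=True)[0])
```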