LTX-Video Support for ComfyUI
Towards Real-World Vision-Language Understanding
Tool for exploring and debugging transformer model behaviors
A GPT-4o Level MLLM for Vision, Speech and Multimodal Live Streaming
Qwen3-ASR is an open-source series of ASR models
Qwen2.5-VL is a multimodal large language model series
Global weather forecasting model using graph neural networks and JAX
Sharp Monocular Metric Depth in Less Than a Second
CodeGeeX2: A More Powerful Multilingual Code Generation Model
Contexts Optical Compression
Foundation model for image generation
VMZ: Model Zoo for Video Modeling
GLM-4.6V/4.5V/4.1V-Thinking, towards versatile multimodal reasoning
A Systematic Framework for Interactive World Modeling
Industrial-level controllable zero-shot text-to-speech system
A state-of-the-art open visual language model
GLM-4-Voice | End-to-End Chinese-English Conversational Model
Pushing the Limits of Mathematical Reasoning in Open Language Models
PyTorch code and models for DINOv2 self-supervised learning
Official implementation of DreamCraft3D
Stable Diffusion WebUI Forge is a platform on top of Stable Diffusion
The ChatGPT Retrieval Plugin lets you easily find personal documents
Revolutionizing Database Interactions with Private LLM Technology
GLM-4 series: Open Multilingual Multimodal Chat LMs
Block Diffusion for Ultra-Fast Speculative Decoding