LTX-Video Support for ComfyUI
Towards Real-World Vision-Language Understanding
Tool for exploring and debugging transformer model behaviors
A Unified Framework for Text-to-3D and Image-to-3D Generation
Contexts Optical Compression
A GPT-4o Level MLLM for Vision, Speech and Multimodal Live Streaming
Industrial-level controllable zero-shot text-to-speech system
VMZ: Model Zoo for Video Modeling
Reproduction of Poetiq's record-breaking submission to the ARC-AGI-1
CodeGeeX2: A More Powerful Multilingual Code Generation Model
GLM-4-Voice | End-to-End Chinese-English Conversational Model
Collection of Gemma 3 variants that are trained for performance
GLM-4.6V/4.5V/4.1V-Thinking, towards versatile multimodal reasoning
A state-of-the-art open visual language model
Open-source deep-learning framework
Sharp Monocular Metric Depth in Less Than a Second
MapAnything: Universal Feed-Forward Metric 3D Reconstruction
Tongyi Deep Research, the Leading Open-source Deep Research Agent
PyTorch code and models for the DINOv2 self-supervised learning
Official implementation of DreamCraft3D
General-purpose image editing model that delivers high-fidelity
OCR expert VLM powered by Hunyuan's native multimodal architecture
Stable Diffusion WebUI Forge is a platform on top of Stable Diffusion
Global weather forecasting model using graph neural networks and JAX
The ChatGPT Retrieval Plugin lets you easily find personal documents