Inference code for scalable emulation of protein equilibrium ensembles
Programmatic access to the AlphaGenome model
Let's make video diffusion practical
Tool for exploring and debugging transformer model behaviors
High-Resolution Image Synthesis with Latent Diffusion Models
Chat & pretrained large vision language model
Tongyi Deep Research, the Leading Open-source Deep Research Agent
Mixture-of-Experts Vision-Language Models for Advanced Multimodal
GPT4V-level open-source multi-modal model based on Llama3-8B
OCR expert VLM powered by Hunyuan's native multimodal architecture
Reproduction of Poetiq's record-breaking submission to the ARC-AGI-1
Renderer for the harmony response format to be used with gpt-oss
Open-weight, large-scale hybrid-attention reasoning model
Qwen3-Omni is a natively end-to-end, omni-modal LLM
Implementation of "MobileCLIP" (CVPR 2024)
Video understanding codebase from FAIR for reproducing video models
Stable Virtual Camera: Generative View Synthesis with Diffusion Models
PyTorch code and models for the DINOv2 self-supervised learning
Towards Ultimate Expert Specialization in Mixture-of-Experts Language
Official implementation of DreamCraft3D
A GPT-4o Level MLLM for Vision, Speech and Multimodal Live Streaming
Chinese and English multimodal conversational language model
Sharp Monocular Metric Depth in Less Than a Second
VGGSfM: Visual Geometry Grounded Deep Structure From Motion
MapAnything: Universal Feed-Forward Metric 3D Reconstruction