Official DeiT repository
PyTorch code and models for the DINOv2 self-supervised learning method
Memory-efficient and performant finetuning of Mistral's models
Towards Ultimate Expert Specialization in Mixture-of-Experts Language Models
Official implementation of DreamCraft3D
Qwen2.5-VL is the multimodal large language model series
A GPT-4o Level MLLM for Vision, Speech and Multimodal Live Streaming
DeepMind model for tracking arbitrary points across videos & robotics
State-of-the-art Image & Video CLIP, Multimodal Large Language Models
MapAnything: Universal Feed-Forward Metric 3D Reconstruction
Inference framework for 1-bit LLMs
Qwen3-omni is a natively end-to-end, omni-modal LLM
Stable Diffusion WebUI Forge is a platform on top of Stable Diffusion
Unified Multimodal Understanding and Generation Models
Sharp Monocular Metric Depth in Less Than a Second
VGGSfM: Visual Geometry Grounded Deep Structure From Motion
Language modeling in a sentence representation space
An AI-powered security review GitHub Action using Claude
The ChatGPT Retrieval Plugin lets you easily find personal documents
Implementation of the Surya Foundation Model for Heliophysics
High-Resolution Image Synthesis with Latent Diffusion Models
A Conversational Speech Generation Model
Di♪♪Rhythm: Blazingly Fast & Simple End-to-End Song Generation
Open-source, high-performance Mixture-of-Experts large language model
Suite with Real-ESRGAN, BSRGAN, IRCNN, GFPGAN & RIFE. v4.3