Chinese and English multimodal conversational language model
Collection of Gemma 3 variants that are trained for performance
Implementation of "MobileCLIP" CVPR 2024
Video understanding codebase from FAIR for reproducing video models
Qwen-Image-Layered: Layered Decomposition for Inherent Editability
Tooling for the Common Objects In 3D dataset
MapAnything: Universal Feed-Forward Metric 3D Reconstruction
A series of math-specific large language models in the Qwen2 series
State-of-the-art (SoTA) text-to-video pre-trained model
OCR expert VLM powered by Hunyuan's native multimodal architecture
Inference code for scalable emulation of protein equilibrium ensembles
PyTorch code and models for the DINOv2 self-supervised learning method
GLM-4-Voice | End-to-End Chinese-English Conversational Model
Official implementation of DreamCraft3D
A state-of-the-art open visual language model
GLM-4.6V/4.5V/4.1V-Thinking, towards versatile multimodal reasoning
MedicalGPT: Training Your Own Medical GPT Model with the ChatGPT Training Pipeline
The official PyTorch implementation of Google's Gemma models
DeepMind model for tracking arbitrary points across videos, with applications to robotics
Sharp Monocular Metric Depth in Less Than a Second
Code for Mesh R-CNN, ICCV 2019
VGGSfM: Visual Geometry Grounded Deep Structure From Motion
State-of-the-art Image & Video CLIP, Multimodal Large Language Models
LLM-based reinforcement learning model for audio editing