PyTorch code and models for the DINOv2 self-supervised learning method
GLM-4-Voice | End-to-End Chinese-English Conversational Model
Pushing the Limits of Mathematical Reasoning in Open Language Models
Global weather forecasting model using graph neural networks and JAX
VGGSfM: Visual Geometry Grounded Deep Structure From Motion
Renderer for the harmony response format to be used with gpt-oss
Release for Improved Denoising Diffusion Probabilistic Models
Long-form streaming TTS system for multi-speaker dialogue generation
Open-source industrial-grade ASR models
Hunyuan Translation Model Version 1.5
CLIP: predict the most relevant text snippet given an image
Multimodal Diffusion with Representation Alignment
Pretrained time-series foundation model developed by Google Research
General-purpose image editing model that delivers high-fidelity
Inference script for Oasis 500M
Generate Any 3D Scene in Seconds
Fast and Universal 3D reconstruction model for versatile tasks
Hackable and optimized Transformers building blocks
Unified Multimodal Understanding and Generation Models
Code for Mesh R-CNN, ICCV 2019
Provides convenient access to the Anthropic REST API from any Python 3
GPT4V-level open-source multi-modal model based on Llama3-8B
GLM-4.5V and GLM-4.1V-Thinking: Towards Versatile Multimodal Reasoning
Inference framework for 1-bit LLMs
A state-of-the-art open visual language model