Global weather forecasting model using graph neural networks and JAX
Renderer for the harmony response format to be used with gpt-oss
The Clay Foundation Model - An open-source AI model and interface
Official implementation of Watermark Anything with Localized Messages
Towards Real-World Vision-Language Understanding
Tool for exploring and debugging transformer model behaviors
Multimodal-Driven Architecture for Customized Video Generation
Unified Multimodal Understanding and Generation Models
Code for Mesh R-CNN, ICCV 2019
Dataset of GPT-2 outputs for research in detection, biases, and more
Code for running inference and finetuning with SAM 3 model
Models for object and human mesh reconstruction
Capable of understanding text, audio, vision, and video
Inference code for scalable emulation of protein equilibrium ensembles
Fast and Universal 3D reconstruction model for versatile tasks
Diffusion Transformer with Fine-Grained Chinese Understanding
A Customizable Image-to-Video Model based on HunyuanVideo
A GPT-4o Level MLLM for Vision, Speech and Multimodal Live Streaming
Phi-3.5 for Mac: Locally-run Vision and Language Models
Revolutionizing Database Interactions with Private LLM Technology
CodeGeeX2: A More Powerful Multilingual Code Generation Model
The official PyTorch implementation of Google's Gemma models
Tooling for the Common Objects In 3D dataset
VGGSfM: Visual Geometry Grounded Deep Structure From Motion
MapAnything: Universal Feed-Forward Metric 3D Reconstruction