DeepSeek Coder: Let the Code Write Itself
Pushing the Limits of Mathematical Reasoning in Open Language Models
A Customizable Image-to-Video Model based on HunyuanVideo
Renderer for the harmony response format to be used with gpt-oss
NVIDIA Isaac GR00T N1.5 is the world's first open foundation model
Pokee Deep Research Model Open Source Repo
Diffusion Transformer with Fine-Grained Chinese Understanding
Phi-3.5 for Mac: Locally-run Vision and Language Models
Sharp Monocular Metric Depth in Less Than a Second
MapAnything: Universal Feed-Forward Metric 3D Reconstruction
Clean and efficient FP8 GEMM kernels with fine-grained scaling
The official PyTorch implementation of Google's Gemma models
Implementation of "MobileCLIP" (CVPR 2024)
High-resolution models for human tasks
Video understanding codebase from FAIR for reproducing video models
Towards Real-World Vision-Language Understanding
CLIP: Predict the most relevant text snippet given an image
A Unified Framework for Text-to-3D and Image-to-3D Generation
Multimodal Diffusion with Representation Alignment
Official code for Style Aligned Image Generation via Shared Attention
4M: Massively Multimodal Masked Modeling
This repository contains the official implementation of FastVLM
ICLR 2024 Spotlight: curation/training code, metadata, distribution
A PyTorch library for implementing flow matching algorithms
Official DeiT repository