A Unified Framework for Text-to-3D and Image-to-3D Generation
Multimodal-Driven Architecture for Customized Video Generation
Generate Any 3D Scene in Seconds
Industrial-level controllable zero-shot text-to-speech system
The official PyTorch implementation of Google's Gemma models
DeepSeek Coder: Let the Code Write Itself
Diversity-driven optimization and large-model reasoning ability
ICLR 2024 Spotlight: curation/training code, metadata, distribution
Towards Real-World Vision-Language Understanding
Foundation Models for Time Series
tiktoken is a fast BPE tokeniser for use with OpenAI's models
High-resolution models for human tasks
Renderer for the harmony response format to be used with gpt-oss
Implementation of "MobileCLIP" CVPR 2024
Video understanding codebase from FAIR for reproducing video models
CLIP: predict the most relevant text snippet given an image
Sharp Monocular Metric Depth in Less Than a Second
This repository contains the official implementation of FastVLM
PyTorch code and models for the DINOv2 self-supervised learning method
Towards Ultimate Expert Specialization in Mixture-of-Experts Language
Official implementation of DreamCraft3D
NVIDIA Isaac GR00T N1.5 is the world's first open foundation model
Mixture-of-Experts Vision-Language Models for Advanced Multimodal
Large Multimodal Models for Video Understanding and Editing
Unified Multimodal Understanding and Generation Models