Generate Any 3D Scene in Seconds
Fast and Universal 3D reconstruction model for versatile tasks
4M: Massively Multimodal Masked Modeling
This repository contains the official implementation of FastVLM
Foundation Models for Time Series
A Production-ready Reinforcement Learning AI Agent Library
A PyTorch library for implementing flow matching algorithms
PyTorch code and models for the DINOv2 self-supervised learning method
Foundational Models for State-of-the-Art Speech and Text Translation
Memory-efficient and performant finetuning of Mistral's models
Analyze computation-communication overlap in V3/R1
Official implementation of DreamCraft3D
tiktoken is a fast BPE tokeniser for use with OpenAI's models
Diffusion Transformer with Fine-Grained Chinese Understanding
NVIDIA Isaac GR00T N1.5 is the world's first open foundation model
A GPT-4o Level MLLM for Vision, Speech and Multimodal Live Streaming
LLM-based reinforcement learning audio editing model
GLM-4.6V/4.5V/4.1V-Thinking, towards versatile multimodal reasoning
New family of code large language models (LLMs)
Reproduction of Poetiq's record-breaking submission to the ARC-AGI-1
DeepMind model for tracking arbitrary points across videos & robotics
Uncommon Objects in 3D dataset
MapAnything: Universal Feed-Forward Metric 3D Reconstruction
Language modeling in a sentence representation space
An AI-powered security review GitHub Action using Claude