RGBD video generation model conditioned on camera input
Industrial-level controllable zero-shot text-to-speech system
From Images to High-Fidelity 3D Assets
Controllable & emotion-expressive zero-shot TTS
Reproduction of Poetiq's record-breaking submission to ARC-AGI-1
Contexts Optical Compression
Python SDK for Claude Agent
Pokee Deep Research Model Open Source Repo
Visual Causal Flow
Audio foundation model excelling in audio understanding
Inference script for Oasis 500M
Foundational Models for State-of-the-Art Speech and Text Translation
Analyze computation-communication overlap in V3/R1
Pushing the Limits of Mathematical Reasoning in Open Language Models
NVIDIA Isaac GR00T N1.5 is the world's first open foundation model
Large Multimodal Models for Video Understanding and Editing
A CNN model that predicts human joints from RGB images of a person
Let us control diffusion models
Generate embeddings from large-scale graph-structured data
Speculative-decoding accelerator for the 675B Mistral Large 3
Ultra-efficient 3B multimodal instruct model built for edge deployment
Compact 8B multimodal instruct model optimized for edge deployment
Efficient 8B multimodal model tuned for advanced reasoning tasks
High-precision 14B multimodal model built for advanced reasoning tasks
Efficient 14B multimodal instruct model with edge deployment and FP8