Repo for external large-scale work
A method to increase the speed and lower the memory footprint
Implementation of model parallel autoregressive transformers on GPUs
A minimal PyTorch re-implementation of the OpenAI GPT
A latent text-to-image diffusion model
Learning to Act by Watching Unlabeled Online Videos
Code release for "Masked-attention Mask Transformer
An implementation of model parallel GPT-2 and GPT-3-style models
Real Time Speech Enhancement in the Waveform Domain (Interspeech 2020)
Open language model developed by NVIDIA as part of the Nemotron-3 family
Model that fuses instruct, reasoning and agentic skills
High-efficiency reasoning and agentic intelligence model
JetBrains’ 4B-parameter code model for completions
Tencent’s 36-language state-of-the-art translation model