A method to increase the speed and lower the memory footprint
LLaMA: Open and Efficient Foundation Language Models
Implementation of model parallel autoregressive transformers on GPUs
A minimal PyTorch re-implementation of the OpenAI GPT
Reference implementation of the Transformer architecture optimized
Code release for "Masked-attention Mask Transformer
GLIDE: a diffusion-based text-conditional image synthesis model
Dual LSTM Encoder for Dialog Response Generation
Open language model developed by NVIDIA as part of the Nemotron-3 family