A high-throughput and memory-efficient inference and serving engine
A state-of-the-art open visual language model
Ongoing research on training transformer models at scale
A large language model and vision-language model based on linear attention
Run 100B+ language models at home, BitTorrent-style
Implementation of model parallel autoregressive transformers on GPUs
An implementation of model parallel GPT-2 and GPT-3-style models