A high-throughput and memory-efficient inference and serving engine
Scripts for fine-tuning Meta Llama3 with composable FSDP & PEFT methods
PyTorch domain library for recommendation systems
A unified framework for scalable computing
Framework for Accelerating LLM Generation with Multiple Decoding Heads
Run 100B+ language models at home, BitTorrent-style
Implementation of model-parallel autoregressive transformers on GPUs