A high-throughput and memory-efficient inference and serving engine
High-performance neural network inference framework for mobile
PArallel Distributed Deep LEarning: Machine Learning Framework
C++ library for high performance inference on NVIDIA GPUs
Fast inference engine for Transformer models
Scripts for fine-tuning Meta Llama 3 with composable FSDP & PEFT methods
PyTorch domain library for recommendation systems
A unified framework for scalable computing
Framework for Accelerating LLM Generation with Multiple Decoding Heads
Run 100B+ language models at home, BitTorrent-style
Implementation of model parallel autoregressive transformers on GPUs