Scripts for fine-tuning Meta Llama 3 with composable FSDP & PEFT methods
INT4/INT5/INT8 and FP16 inference on CPU for the RWKV language model
Library for serving Transformers models on Amazon SageMaker
A library for accelerating Transformer models on NVIDIA GPUs
MII makes low-latency and high-throughput inference possible, powered by DeepSpeed (see the sketch after this list)
A GPU-accelerated library containing highly optimized building blocks
Uncover insights, surface problems, monitor, and fine-tune your LLM
Lightweight anchor-free object detection model
Implementation of model-parallel autoregressive transformers on GPUs
Toolkit for inference and serving with MXNet in SageMaker
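
For the MII entry above, a minimal sketch of what low-latency serving can look like via DeepSpeed-MII's pipeline API; the model name and generation settings are illustrative assumptions, not part of the original list:

```python
# Minimal sketch of text generation with DeepSpeed-MII's pipeline API.
# The model name and generation settings below are illustrative assumptions.
import mii

# Build a non-persistent inference pipeline; MII applies its latency and
# throughput optimizations (dynamic batching, optimized kernels) behind this call.
pipe = mii.pipeline("mistralai/Mistral-7B-v0.1")

# Run a batch of prompts through the model and print the generated responses.
responses = pipe(["DeepSpeed is", "Low-latency inference matters because"],
                 max_new_tokens=64)
print(responses)
```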