Fast inference engine for Transformer models
AIMET is a library that provides advanced quantization and compression techniques for trained neural network models
A library for accelerating Transformer models on NVIDIA GPUs
An innovative library for efficient LLM inference
PyTorch library of curated Transformer models and their components
An easy-to-use LLM quantization package with user-friendly APIs
Open platform for training, serving, and evaluating language models
Visual Instruction Tuning: Large Language-and-Vision Assistant
Run any Llama 2 model locally with a Gradio UI on GPU or CPU, from anywhere
Self-contained Machine Learning and Natural Language Processing library