PyTorch library of curated Transformer models and their components
Build your chatbot within minutes on your favorite device
Efficient few-shot learning with Sentence Transformers
Library for serving Transformers models on Amazon SageMaker
A Unified Library for Parameter-Efficient Learning
Large Language Model Text Generation Inference
A library for accelerating Transformer models on NVIDIA GPUs
MII makes low-latency and high-throughput inference possible
An MLOps framework to package, deploy, monitor, and manage models
An easy-to-use LLM quantization package with user-friendly APIs
Implementation of model parallel autoregressive transformers on GPUs
CPU/GPU inference server for Hugging Face transformer models