A library for accelerating Transformer models on NVIDIA GPUs
AIMET is a library that provides advanced quantization and compression techniques for trained neural network models
Scripts for fine-tuning Meta Llama3 with composable FSDP & PEFT methods
Unified Model Serving Framework
Serve machine learning models within a Docker container
PyTorch domain library for recommendation systems
Multilingual Automatic Speech Recognition with word-level timestamps
PyTorch extensions for fast R&D prototyping and Kaggle farming
Create HTML profiling reports from pandas DataFrame objects
A lightweight vision library for performing large-scale object detection
LMDeploy is a toolkit for compressing, deploying, and serving LLMs
High-quality, fast, modular reference implementation of SSD in PyTorch
Bring the notion of Model-as-a-Service to life
Superduper: Integrate AI models and machine learning workflows
Sparsity-aware deep learning inference runtime for CPUs
Phi-3.5 for Mac: Locally-run Vision and Language Models
Efficient few-shot learning with Sentence Transformers
A high-performance ML model serving framework offering dynamic batching
Framework dedicated to making neural data processing pipelines simple and fast
Tensor search for humans
Libraries for applying sparsification recipes to neural networks
A Unified Library for Parameter-Efficient Learning
Standardized Serverless ML Inference Platform on Kubernetes
Deep learning optimization library that makes distributed training and inference easy, efficient, and effective
MII makes low-latency and high-throughput inference possible