PyTorch library of curated Transformer models and their components
Visual Instruction Tuning: Large Language-and-Vision Assistant
Adversarial Robustness Toolbox (ART) - Python Library for ML security
Replace OpenAI GPT with another LLM in your app
Unified Model Serving Framework
Sparsity-aware deep learning inference runtime for CPUs
GPU environment management and cluster orchestration
Multi-LoRA inference server that scales to 1000s of fine-tuned LLMs
A library to communicate with ChatGPT, Claude, Copilot, and Gemini
State-of-the-art diffusion models for image and audio generation
Library for serving Transformers models on Amazon SageMaker
Phi-3.5 for Mac: Locally-run Vision and Language Models
MII makes low-latency and high-throughput inference possible
A unified framework for scalable computing
Multilingual Automatic Speech Recognition with word-level timestamps
Uncover insights, surface problems, monitor, and fine-tune your LLM
Create HTML profiling reports from pandas DataFrame objects
Low-latency REST API for serving text-embeddings
AIMET is a library that provides advanced quantization and compression
PyTorch extensions for fast R&D prototyping and Kaggle farming
20+ high-performance LLMs with recipes to pretrain and finetune at scale
Libraries for applying sparsification recipes to neural networks
Lightweight Python library for adding real-time multi-object tracking
Trainable, memory-efficient, and GPU-friendly PyTorch reproduction
Scripts for fine-tuning Meta Llama3 with composable FSDP & PEFT methods