Replace OpenAI GPT with another LLM in your app
Library for serving Transformers models on Amazon SageMaker
A library for accelerating Transformer models on NVIDIA GPUs
Standardized Serverless ML Inference Platform on Kubernetes
Lightweight Python library for adding real-time multi-object tracking
A unified framework for scalable computing
Open-source tool designed to enhance the efficiency of workloads
Operating LLMs in production
An MLOps framework to package, deploy, monitor, and manage models
LLM training code for MosaicML foundation models
Optimizing inference proxy for LLMs
LMDeploy is a toolkit for compressing, deploying, and serving LLMs
Build your chatbot within minutes on your favorite device
GPU environment management and cluster orchestration
MII makes low-latency and high-throughput inference possible
Phi-3.5 for Mac: Locally-run Vision and Language Models
Trainable, memory-efficient, and GPU-friendly PyTorch reproduction
Probabilistic reasoning and statistical analysis in TensorFlow
Libraries for applying sparsification recipes to neural networks
Gaussian processes in TensorFlow
Single-cell analysis in Python
Training and deploying machine learning models on Amazon SageMaker
A library to communicate with ChatGPT, Claude, Copilot, Gemini
Sparsity-aware deep learning inference runtime for CPUs
Simplifies the local serving of AI models from any source