Run local LLMs on any device; open source
A high-throughput and memory-efficient inference and serving engine for LLMs
LMDeploy is a toolkit for compressing, deploying, and serving LLMs
Replace OpenAI GPT with another LLM in your app
Everything you need to build state-of-the-art foundation models
A Pythonic framework to simplify AI service building
State-of-the-art diffusion models for image and audio generation
Unified Model Serving Framework
The official Python client for the Hugging Face Hub
Bring the notion of Model-as-a-Service to life
Create HTML profiling reports from pandas DataFrame objects
Single-cell analysis in Python
AIMET is a library that provides advanced quantization and compression techniques for trained neural network models
Official inference library for Mistral models
Training and deploying machine learning models on Amazon SageMaker
Data manipulation and transformation for audio signal processing
Phi-3.5 for Mac: locally run vision and language models
Operating LLMs in production
Uncover insights, surface problems, monitor, and fine-tune your LLM
Large Language Model Text Generation Inference
A low-latency REST API for serving text embeddings
A library for accelerating Transformer models on NVIDIA GPUs
GPU environment management and cluster orchestration
PyTorch library of curated Transformer models and their components
Simplifies the local serving of AI models from any source
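Several of the serving engines listed above (vLLM, LMDeploy, Text Generation Inference) can expose an OpenAI-compatible REST endpoint, so one client works across them. A minimal sketch follows; the host, port, and model name are assumptions and must match whatever server you actually launch.

```python
import json
import urllib.request

# Assumed local endpoint; vLLM, LMDeploy, and Text Generation Inference
# can each be started so that they serve this OpenAI-compatible route.
URL = "http://localhost:8000/v1/chat/completions"

payload = {
    # The model name is an assumption; use the one the server was started with.
    "model": "mistralai/Mistral-7B-Instruct-v0.3",
    "messages": [
        {"role": "user", "content": "Summarize what an inference server does."}
    ],
    "max_tokens": 128,
    "temperature": 0.7,
}

request = urllib.request.Request(
    URL,
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)

# Send the request and parse the JSON response body.
with urllib.request.urlopen(request) as response:
    body = json.load(response)

# The OpenAI-compatible schema returns candidate completions under "choices".
print(body["choices"][0]["message"]["content"])
```

Because the request body follows the shared /v1/chat/completions schema, swapping the backend from one of these engines to another should not require client-side changes.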