Run Local LLMs on Any Device. Open-source
Everything you need to build state-of-the-art foundation models
A high-throughput and memory-efficient inference and serving engine
Uncover insights, surface problems, monitor, and fine-tune your LLM
Standardized Serverless ML Inference Platform on Kubernetes
The official Python client for the Hugging Face Hub
A library for accelerating Transformer models on NVIDIA GPUs
LMDeploy is a toolkit for compressing, deploying, and serving LLMs
Operating LLMs in production
State-of-the-art diffusion models for image and audio generation
Phi-3.5 for Mac: Locally Run Vision and Language Models
Multilingual Automatic Speech Recognition with word-level timestamps
Official inference library for Mistral models
Trainable models and NN optimization tools
Integrate, train and manage any AI models and APIs with your database
Bring the notion of Model-as-a-Service to life
Libraries for applying sparsification recipes to neural networks
Gaussian processes in TensorFlow
Single-cell analysis in Python
A set of Docker images for training and serving models in TensorFlow
Optimizing inference proxy for LLMs
Neural Network Compression Framework for enhanced OpenVINO
Sparsity-aware deep learning inference runtime for CPUs
Large Language Model Text Generation Inference
Training and deploying machine learning models on Amazon SageMaker