A high-throughput and memory-efficient inference and serving engine
Official inference library for Mistral models
Everything you need to build state-of-the-art foundation models
Unified Model Serving Framework
LMDeploy is a toolkit for compressing, deploying, and serving LLMs
OpenVINO™ Toolkit repository
Easiest and laziest way to build multi-agent LLM applications
Training and deploying machine learning models on Amazon SageMaker
A Pythonic framework to simplify AI service building
The official Python client for the Hugging Face Hub
Bring the notion of Model-as-a-Service to life
PArallel Distributed Deep LEarning: Machine Learning Framework
Large Language Model Text Generation Inference
State-of-the-art diffusion models for image and audio generation
A set of Docker images for training and serving models in TensorFlow
Operating LLMs in production
Visual Instruction Tuning: Large Language-and-Vision Assistant
Libraries for applying sparsification recipes to neural networks
Gaussian processes in TensorFlow
A GPU-accelerated library containing highly optimized building blocks
Easy-to-use deep learning framework with 3 key features
Easy-to-use Speech Toolkit including Self-Supervised Learning model
An innovative library for efficient LLM inference
Optimizing inference proxy for LLMs
Neural Network Compression Framework for enhanced OpenVINO