A high-throughput and memory-efficient inference and serving engine for LLMs
Official inference library for Mistral models
Everything you need to build state-of-the-art foundation models
Unified Model Serving Framework
Ready-to-use OCR with 80+ supported languages
OpenVINO™ Toolkit repository
LMDeploy is a toolkit for compressing, deploying, and serving LLMs
FlashInfer: Kernel Library for LLM Serving
Easiest and laziest way to build multi-agent LLM applications
Training and deploying machine learning models on Amazon SageMaker
C++ library for high-performance inference on NVIDIA GPUs
A Pythonic framework to simplify AI service building
The official Python client for the Huggingface Hub
Open standard for machine learning interoperability
Bring the notion of Model-as-a-Service to life
PArallel Distributed Deep LEarning: Machine Learning Framework
State-of-the-art diffusion models for image and audio generation
Large Language Model Text Generation Inference
A set of Docker images for training and serving models in TensorFlow
The AI-native (edge and LLM) proxy for agents
Operating LLMs in production
MNN is a blazing-fast, lightweight deep learning framework
Visual Instruction Tuning: Large Language-and-Vision Assistant
Multi-LoRA inference server that scales to 1000s of fine-tuned LLMs
Libraries for applying sparsification recipes to neural networks