A high-throughput and memory-efficient inference and serving engine
FlashInfer: Kernel Library for LLM Serving
Official inference library for Mistral models
Training and deploying machine learning models on Amazon SageMaker
OpenVINO™ Toolkit repository
Everything you need to build state-of-the-art foundation models
Multi-LoRA inference server that scales to 1000s of fine-tuned LLMs
Trainable models and NN optimization tools
Neural Network Compression Framework for enhanced OpenVINO inference
A library for accelerating Transformer models on NVIDIA GPUs
Efficient few-shot learning with Sentence Transformers
The official Python client for the Hugging Face Hub
Operating LLMs in production
C++ library for high-performance inference on NVIDIA GPUs
Unified Model Serving Framework
Lightweight, standalone C++ inference engine for Google's Gemma models
Private Open AI on Kubernetes
Large Language Model Text Generation Inference
Open standard for machine learning interoperability
20+ high-performance LLMs with recipes to pretrain and finetune at scale
A Pythonic framework to simplify AI service building
Library for serving Transformers models on Amazon SageMaker
MNN is a blazing-fast, lightweight deep learning framework
A high-performance ML model serving framework offering dynamic batching
The easiest and laziest way to build multi-agent LLM applications