OpenAI-style API for open large language models
A high-throughput and memory-efficient inference and serving engine
Everything you need to build state-of-the-art foundation models
FlashInfer: Kernel Library for LLM Serving
OpenVINO™ Toolkit repository
Ready-to-use OCR with 80+ supported languages
The official Python client for the Hugging Face Hub
LMDeploy is a toolkit for compressing, deploying, and serving LLMs
A set of Docker images for training and serving models in TensorFlow
Optimizing inference proxy for LLMs
Training and deploying machine learning models on Amazon SageMaker
State-of-the-art Parameter-Efficient Fine-Tuning
Official inference library for Mistral models
Operating LLMs in production
Lightweight, standalone C++ inference engine for Google's Gemma models
A Pythonic framework to simplify AI service building
Open standard for machine learning interoperability
The AI-native (edge and LLM) proxy for agents
Easiest and laziest way to build multi-agent LLM applications
Bring the notion of Model-as-a-Service to life
C++ library for high-performance inference on NVIDIA GPUs
Multi-LoRA inference server that scales to 1000s of fine-tuned LLMs
Replace OpenAI GPT with another LLM in your app
PArallel Distributed Deep LEarning: Machine Learning Framework
Libraries for applying sparsification recipes to neural networks