OpenAI-style API for open large language models
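For context, servers exposing an OpenAI-style API accept the standard chat-completions request shape. A minimal sketch of that payload, using only the standard library; the model name is a placeholder, not a real model ID:

```python
import json

# Minimal chat-completions request body in the OpenAI-compatible format.
# "my-local-model" is a placeholder; each server accepts its own model IDs.
payload = {
    "model": "my-local-model",
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hello!"},
    ],
    "temperature": 0.7,
}

# Serialize to the JSON body that would be POSTed to /v1/chat/completions.
body = json.dumps(payload)
```

This request shape is what makes such servers drop-in compatible with existing OpenAI client code.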
A high-throughput and memory-efficient inference and serving engine
FlashInfer: Kernel Library for LLM Serving
LMDeploy is a toolkit for compressing, deploying, and serving LLMs
Everything you need to build state-of-the-art foundation models
Official inference library for Mistral models
The official Python client for the Hugging Face Hub
Multi-LoRA inference server that scales to 1000s of fine-tuned LLMs
Operating LLMs in production
Ready-to-use OCR with 80+ supported languages
Optimizing inference proxy for LLMs
Large Language Model Text Generation Inference
Training and deploying machine learning models on Amazon SageMaker
The easiest and laziest way to build multi-agent LLM applications
A library for accelerating Transformer models on NVIDIA GPUs
20+ high-performance LLMs with recipes to pretrain and finetune at scale
Trainable models and NN optimization tools
Neural Network Compression Framework for enhanced OpenVINO inference
Library for serving Transformers models on Amazon SageMaker
Efficient few-shot learning with Sentence Transformers
A set of Docker images for training and serving models in TensorFlow
State-of-the-art Parameter-Efficient Fine-Tuning
A Pythonic framework to simplify AI service building
MII makes low-latency and high-throughput inference possible
Library for OCR-related tasks powered by Deep Learning