Replace OpenAI GPT with another LLM in your app by changing a single line of code
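A minimal sketch of the "single line" idea, assuming a locally hosted OpenAI-compatible endpoint; the URL, API key, and model name are placeholders:

```python
from openai import OpenAI

# Point the official OpenAI client at a local OpenAI-compatible server
# instead of api.openai.com; this is the single line that changes.
client = OpenAI(base_url="http://localhost:9997/v1", api_key="not-needed")

response = client.chat.completions.create(
    model="my-local-model",  # hypothetical model name served locally
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)
```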
Official inference library for Mistral models
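A generation sketch following the project's documented pattern; the checkpoint path is a placeholder, and the exact module layout may differ between releases:

```python
from mistral_inference.transformer import Transformer
from mistral_inference.generate import generate
from mistral_common.tokens.tokenizers.mistral import MistralTokenizer
from mistral_common.protocol.instruct.messages import UserMessage
from mistral_common.protocol.instruct.request import ChatCompletionRequest

model_path = "/path/to/mistral-7b-instruct"  # placeholder local checkpoint
tokenizer = MistralTokenizer.from_file(f"{model_path}/tokenizer.model.v3")
model = Transformer.from_folder(model_path)

request = ChatCompletionRequest(messages=[UserMessage(content="Explain KV caching.")])
tokens = tokenizer.encode_chat_completion(request).tokens
out_tokens, _ = generate(
    [tokens], model, max_tokens=64, temperature=0.0,
    eos_id=tokenizer.instruct_tokenizer.tokenizer.eos_id,
)
print(tokenizer.instruct_tokenizer.tokenizer.decode(out_tokens[0]))
```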
The Triton Inference Server provides an optimized cloud and edge inferencing solution
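A client-side sketch using the tritonclient package against a running server; the model name and tensor names ("my_model", "INPUT0", "OUTPUT0") are assumptions that must match the deployed model's config.pbtxt:

```python
import numpy as np
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")

# Input name, shape, and datatype must match the model configuration.
inp = httpclient.InferInput("INPUT0", [1, 16], "FP32")
inp.set_data_from_numpy(np.random.rand(1, 16).astype(np.float32))

result = client.infer(model_name="my_model", inputs=[inp])
print(result.as_numpy("OUTPUT0"))
```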
Large Language Model Text Generation Inference
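One way to query a running TGI server is through huggingface_hub's InferenceClient; this sketch assumes a server already launched (e.g. via its Docker image) on port 8080:

```python
from huggingface_hub import InferenceClient

client = InferenceClient("http://localhost:8080")  # assumed local TGI endpoint
output = client.text_generation("What is speculative decoding?", max_new_tokens=64)
print(output)
```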
Library for serving Transformers models on Amazon SageMaker
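A deployment sketch with the SageMaker Python SDK; the IAM role, instance type, and framework versions are placeholders to adapt to your account and region:

```python
from sagemaker.huggingface import HuggingFaceModel

model = HuggingFaceModel(
    env={
        "HF_MODEL_ID": "distilbert-base-uncased-finetuned-sst-2-english",
        "HF_TASK": "text-classification",
    },
    role="arn:aws:iam::123456789012:role/MySageMakerRole",  # placeholder role
    transformers_version="4.26",  # placeholder versions; use a supported combo
    pytorch_version="1.13",
    py_version="py39",
)
predictor = model.deploy(initial_instance_count=1, instance_type="ml.m5.xlarge")
print(predictor.predict({"inputs": "I love this library!"}))
```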
A high-throughput and memory-efficient inference and serving engine for LLMs
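The offline batch-generation entry point, per vLLM's documented API; the model id is just an example:

```python
from vllm import LLM, SamplingParams

llm = LLM(model="facebook/opt-125m")  # any Hugging Face model id works here
params = SamplingParams(temperature=0.8, max_tokens=64)

outputs = llm.generate(["The capital of France is"], params)
for out in outputs:
    print(out.outputs[0].text)
```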
FlashInfer: Kernel Library for LLM Serving
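A single-query decode-attention sketch modeled on the project's examples; it assumes a CUDA device, half-precision tensors, and these head/cache sizes:

```python
import torch
import flashinfer

num_qo_heads, num_kv_heads, head_dim, kv_len = 32, 32, 128, 2048

# One query vector attending over a full KV cache (a single decode step).
q = torch.randn(num_qo_heads, head_dim, dtype=torch.float16, device="cuda")
k = torch.randn(kv_len, num_kv_heads, head_dim, dtype=torch.float16, device="cuda")
v = torch.randn(kv_len, num_kv_heads, head_dim, dtype=torch.float16, device="cuda")

o = flashinfer.single_decode_with_kv_cache(q, k, v)
print(o.shape)  # (num_qo_heads, head_dim)
```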
Optimizing inference proxy for LLMs
Deep learning optimization library that makes distributed training and inference easy, efficient, and effective
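The core wrap-and-step pattern, typically launched with the deepspeed CLI; the model and config here are minimal stand-ins:

```python
import torch
import deepspeed

model = torch.nn.Linear(32, 2)  # stand-in for a real network
ds_config = {"train_batch_size": 8,
             "optimizer": {"type": "Adam", "params": {"lr": 1e-3}}}

# deepspeed.initialize wraps the model in an engine that handles
# distributed data parallelism, ZeRO partitioning, etc.
engine, optimizer, _, _ = deepspeed.initialize(
    model=model, model_parameters=model.parameters(), config=ds_config
)

x = torch.randn(8, 32).to(engine.device)
loss = engine(x).mean()
engine.backward(loss)
engine.step()
```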
Multi-LoRA inference server that scales to 1000s of fine-tuned LLMs
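Request-level adapter selection is the key idea: many LoRA adapters share one base model. A sketch against a local server, where the adapter id is a placeholder:

```python
import requests

# Each request can name a different LoRA adapter on the shared base model.
resp = requests.post(
    "http://localhost:8080/generate",
    json={
        "inputs": "Summarize: LoRA adapters share one base model.",
        "parameters": {"max_new_tokens": 64, "adapter_id": "org/my-lora-adapter"},
    },
)
print(resp.json())
```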
MII makes low-latency and high-throughput inference possible, powered by DeepSpeed
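A pipeline sketch following MII's documented usage; the model id is an example and a CUDA GPU is assumed:

```python
import mii

pipe = mii.pipeline("mistralai/Mistral-7B-v0.1")  # example model id, needs a GPU
responses = pipe(["DeepSpeed is"], max_new_tokens=64)
print(responses)
```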
AIMET is a library that provides advanced quantization and compression techniques for trained neural network models
Standardized Serverless ML Inference Platform on Kubernetes
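Client-side, KServe's v1 data plane is a simple REST contract; a sketch assuming an InferenceService named "sklearn-iris" is reachable at localhost (e.g. via port-forwarding):

```python
import requests

# KServe v1 protocol: POST /v1/models/<name>:predict with {"instances": [...]}.
resp = requests.post(
    "http://localhost:8080/v1/models/sklearn-iris:predict",  # placeholder host/name
    json={"instances": [[6.8, 2.8, 4.8, 1.4]]},
)
print(resp.json())  # {"predictions": [...]}
```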
DoWhy is a Python library for causal inference that supports explicit modeling and testing of causal assumptions
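The model/identify/estimate flow on the library's bundled synthetic data, following its standard API:

```python
from dowhy import CausalModel
import dowhy.datasets

data = dowhy.datasets.linear_dataset(
    beta=10, num_common_causes=3, num_samples=1000, treatment_is_binary=True
)
model = CausalModel(
    data=data["df"],
    treatment=data["treatment_name"],
    outcome=data["outcome_name"],
    graph=data["gml_graph"],
)
estimand = model.identify_effect()
estimate = model.estimate_effect(
    estimand, method_name="backdoor.propensity_score_matching"
)
print(estimate.value)  # should be close to the true effect of 10
```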
Uplift modeling and causal inference with machine learning algorithms
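An average-treatment-effect sketch with one of the meta-learners; the covariates, treatment flags, and outcomes are synthetic placeholders:

```python
import numpy as np
from causalml.inference.meta import XGBTRegressor

X = np.random.normal(size=(1000, 5))          # covariates
treatment = np.random.binomial(1, 0.5, 1000)  # binary treatment assignment
y = X[:, 0] + 0.8 * treatment + np.random.normal(size=1000)  # outcome

# T-learner with XGBoost base models; returns the ATE with confidence bounds.
ate, lb, ub = XGBTRegressor().estimate_ate(X=X, treatment=treatment, y=y)
print(ate, lb, ub)
```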
Low-latency REST API for serving text-embeddings
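A client call sketch assuming the server exposes the common OpenAI-style /embeddings route; the port and model name are placeholders:

```python
import requests

resp = requests.post(
    "http://localhost:7997/embeddings",  # placeholder endpoint
    json={"model": "BAAI/bge-small-en-v1.5", "input": ["hello world"]},
)
print(len(resp.json()["data"][0]["embedding"]))  # embedding dimensionality
```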
Easiest and laziest way to build multi-agent LLM applications
Ready-to-use OCR with 80+ supported languages
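The basic read-text flow; the image path is a placeholder:

```python
import easyocr

reader = easyocr.Reader(["en"])  # downloads detection/recognition models once
# readtext returns a list of (bounding_box, text, confidence) tuples.
for bbox, text, conf in reader.readtext("sign.png"):  # placeholder image path
    print(text, conf)
```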
Trainable, memory-efficient, and GPU-friendly PyTorch reproduction of AlphaFold 2
Single-cell analysis in Python
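A standard preprocessing-to-clustering pipeline on the bundled 3k-PBMC demo dataset:

```python
import scanpy as sc

adata = sc.datasets.pbmc3k()                  # small public demo dataset
sc.pp.normalize_total(adata, target_sum=1e4)  # library-size normalization
sc.pp.log1p(adata)
sc.pp.highly_variable_genes(adata, n_top_genes=2000)
sc.tl.pca(adata)
sc.pp.neighbors(adata)
sc.tl.umap(adata)
print(adata)
```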
Everything you need to build state-of-the-art foundation models
A library for accelerating Transformer models on NVIDIA GPUs, including 8-bit floating point (FP8) precision on Hopper and Ada GPUs
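An FP8-autocast sketch using the library's PyTorch API; it assumes an FP8-capable GPU (e.g. Hopper) and uses the default scaling recipe:

```python
import torch
import transformer_engine.pytorch as te

layer = te.Linear(768, 768, bias=True).cuda()  # drop-in Transformer building block
x = torch.randn(16, 768, device="cuda")

# Run the forward pass with FP8 compute inside the autocast context.
with te.fp8_autocast(enabled=True):
    y = layer(x)
print(y.shape)
```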
Bring the notion of Model-as-a-Service to life
The official Python client for the Huggingface Hub
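Two common operations, downloading a file from a repo and searching the Hub:

```python
from huggingface_hub import hf_hub_download, list_models

# Download a single file from a repo (cached locally after the first call).
path = hf_hub_download(repo_id="bert-base-uncased", filename="config.json")
print(path)

# Search the Hub, e.g. for text-generation models.
for m in list_models(filter="text-generation", limit=3):
    print(m.id)
```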
A set of Docker images for training and serving models in TensorFlow, PyTorch, and MXNet