Official inference library for Mistral models
Replace OpenAI GPT with another LLM in your app
The Triton Inference Server provides an optimized cloud and edge inferencing solution
Large Language Model Text Generation Inference
High-performance inference server for text embeddings models
Library for serving Transformers models on Amazon SageMaker
A high-throughput and memory-efficient inference and serving engine for LLMs (usage sketch after this list)
Optimizing inference proxy for LLMs
FlashInfer: Kernel Library for LLM Serving
Deep learning optimization library that makes distributed training and inference easy, efficient, and effective
Multi-LoRA inference server that scales to 1000s of fine-tuned LLMs
Standardized Serverless ML Inference Platform on Kubernetes
High-performance inference framework for large language models
DoWhy is a Python library for causal inference that supports explicit modeling and testing of causal assumptions (usage sketch after this list)
AlphaFold 3 inference pipeline
MII makes low-latency and high-throughput inference possible, powered by DeepSpeed
TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines for efficient inference on NVIDIA GPUs (usage sketch after this list)
Single-cell analysis in Python
The easiest and laziest way to build multi-agent LLM applications
AIMET is a library that provides advanced quantization and compression techniques for trained neural network models
High-performance Inference and Deployment Toolkit for LLMs and VLMs
Low-latency REST API for serving text-embeddings (usage sketch after this list)
Uplift modeling and causal inference with machine learning algorithms
High-Resolution Image Synthesis with Latent Diffusion Models
Everything you need to build state-of-the-art foundation models
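
The "high-throughput and memory-efficient inference and serving engine" entry matches vLLM's tagline. A minimal sketch of its offline generation API, assuming vLLM is installed and a GPU is available; the model id is only an example:

    from vllm import LLM, SamplingParams

    # Load any Hugging Face model id; a small model keeps the example cheap.
    llm = LLM(model="facebook/opt-125m")
    params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=64)

    # generate() batches prompts and returns one RequestOutput per prompt.
    outputs = llm.generate(["The capital of France is"], params)
    print(outputs[0].outputs[0].text)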
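
For the DoWhy entry, a minimal sketch of its model-identify-estimate workflow on a synthetic dataset with a known effect; the dataset helper and estimator name follow DoWhy's documented examples:

    from dowhy import CausalModel
    import dowhy.datasets

    # Synthetic linear data with a true average treatment effect of 10.
    data = dowhy.datasets.linear_dataset(
        beta=10, num_common_causes=2, num_samples=1000, treatment_is_binary=True
    )

    model = CausalModel(
        data=data["df"],
        treatment=data["treatment_name"],
        outcome=data["outcome_name"],
        graph=data["gml_graph"],
    )
    estimand = model.identify_effect()
    estimate = model.estimate_effect(estimand, method_name="backdoor.linear_regression")
    print(estimate.value)  # should land near the true effect of 10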
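
For the TensorRT-LLM entry, a sketch of the high-level LLM API; the entry points have moved between releases, so treat the import path, the model id, and the output shape as assumptions rather than a pinned recipe:

    from tensorrt_llm import LLM, SamplingParams

    # Engine build/compilation happens behind this call on first use.
    llm = LLM(model="TinyLlama/TinyLlama-1.1B-Chat-v1.0")

    outputs = llm.generate(["Hello, world:"], SamplingParams(max_tokens=32))
    print(outputs[0].outputs[0].text)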
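
For the text-embeddings REST entry, a sketch of calling such a server over HTTP; the host, port, and /embed route follow Hugging Face text-embeddings-inference conventions and are assumptions here, since other embedding servers expose different routes:

    import requests

    # One embedding vector comes back per input string.
    resp = requests.post(
        "http://127.0.0.1:8080/embed",
        json={"inputs": "What is deep learning?"},
        timeout=10,
    )
    resp.raise_for_status()
    embedding = resp.json()[0]
    print(len(embedding))  # dimensionality depends on the served model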