A high-throughput and memory-efficient inference and serving engine
A high-performance inference system for large language models
C++ library for high-performance inference on NVIDIA GPUs
Deep learning optimization library: makes distributed training easy
Low-latency REST API for serving text embeddings
MII makes low-latency and high-throughput inference possible
Large Language Model Text Generation Inference
OpenMLDB is an open-source machine learning database
A scalable inference server for models optimized with OpenVINO
A GPU-accelerated library containing highly optimized building blocks
Tensor search for humans
Lightweight inference library for ONNX files, written in C++
Framework for Accelerating LLM Generation with Multiple Decoding Heads
Fast and user-friendly runtime for transformer inference