A high-throughput and memory-efficient inference and serving engine
Deep learning optimization library: makes distributed training easy
Low-latency REST API for serving text embeddings
MII makes low-latency and high-throughput inference possible
Large Language Model Text Generation Inference
OpenMLDB is an open-source machine learning database
A GPU-accelerated library containing highly optimized building blocks
Tensor search for humans
Lightweight inference library for ONNX files, written in C++
Framework for Accelerating LLM Generation with Multiple Decoding Heads