C++ library for high-performance inference on NVIDIA GPUs
Alibaba's high-performance LLM inference engine for diverse apps
FlashMLA: Efficient Multi-head Latent Attention Kernels
C++-based high-performance parallel environment execution engine
Official inference framework for 1-bit LLMs
A lightweight, lightning-fast, in-process vector database
OpenMLDB is an open-source machine learning database
A scalable inference server for models optimized with OpenVINO
FAIR Sequence Modeling Toolkit 2
A GPU-accelerated library containing highly optimized building blocks
QVAC Fabric: cross-platform LLM inference and fine-tuning
Mooncake is the serving platform for Kimi
Lightweight inference library for ONNX files, written in C++
Transformer-related optimization, including BERT and GPT
A High-Performance Library for Sequence Processing and Generation
A RocksDB-compatible KV storage engine with better performance
Fast and user-friendly runtime for transformer inference
A TensorFlow implementation of Scalable Distributed Deep-RL