High-performance library for gradient boosting on decision trees
High-speed Large Language Model Serving for Local Deployment
GPU-accelerated decision optimization
HeavyDB (formerly MapD/OmniSciDB)
UCCL is an efficient communication library for GPUs
Alibaba's high-performance LLM inference engine for diverse apps
C++ library for high performance inference on NVIDIA GPUs
FlashMLA: Efficient Multi-head Latent Attention Kernels
RAPIDS Machine Learning Library
A GPU-accelerated library containing highly optimized building blocks
Lightweight, standalone C++ inference engine for Google's Gemma models
Run Local LLMs on Any Device. Open-source and available for commercial use
OpenVINO™ Toolkit repository
ArrayFire, a general purpose GPU library
High-performance neural network inference framework for mobile
Fast inference engine for Transformer models
Gradient boosting framework based on decision tree algorithms
MNN is a blazing fast, lightweight deep learning framework
Serving system for machine learning models
Mooncake is the serving platform for Kimi, Moonshot AI's LLM service
oneAPI Deep Neural Network Library (oneDNN)
The Compute Library is a set of computer vision and machine learning functions optimised for Arm CPUs and GPUs
Official inference framework for 1-bit LLMs
TT-NN operator library and TT-Metalium low-level kernel programming model
Code for Cicero, an AI agent that plays the game of Diplomacy