Port of OpenAI's Whisper model in C/C++
High-performance neural network inference framework for mobile
Official inference framework for 1-bit LLMs
Infrastructure to enable deployment of ML models
Powerful Android AI agent with tools, automation, and Linux shell
Mooncake is the serving platform for Kimi
Fast inference engine for Transformer models
FlashMLA: Efficient Multi-head Latent Attention Kernels
Gradient boosting framework based on decision tree algorithms
A Python library for audio
High-speed Large Language Model Serving for Local Deployment
Low-latency AI inference engine optimized for mobile devices
C++ implementation of ChatGLM-6B & ChatGLM2-6B & ChatGLM3 & GLM4(V)
TT-NN operator library, and TT-Metalium low-level kernel programming
C++ library for high performance inference on NVIDIA GPUs
LLM inference in C/C++
Bolt is a high-performance deep learning library
Alibaba's high-performance LLM inference engine for diverse apps
LiteRT, successor to TensorFlow Lite
An Easy-to-Use and High-Performance AI Deployment Framework
ArrayFire, a general-purpose GPU library
Lightning fast C++/CUDA neural network framework
QVAC Fabric: cross-platform LLM inference and fine-tuning
Easy-to-use deep learning framework with 3 key features
Run GGUF models easily with a UI or API. One File. Zero Install.