Fast inference engine for Transformer models
Port of OpenAI's Whisper model in C/C++
High-speed Large Language Model Serving for Local Deployment
Easy-to-use deep learning framework with 3 key features
LLM inference in C/C++
Official inference framework for 1-bit LLMs
C++ implementation of ChatGLM-6B & ChatGLM2-6B & ChatGLM3 & GLM4(V)
Low-latency AI inference engine optimized for mobile devices
High-performance neural network inference framework for mobile
A Python library for audio
QVAC Fabric: cross-platform LLM inference and fine-tuning
Gradient boosting framework based on decision tree algorithms
MNN is a blazing fast, lightweight deep learning framework
C++ library for high performance inference on NVIDIA GPUs
ArrayFire, a general purpose GPU library
Bolt is a deep learning library with high performance
Run GGUF models easily with a UI or API. One File. Zero Install.
Lightweight inference library for ONNX files, written in C++
A High Performance Library for Sequence Processing and Generation
Deep learning inference framework optimized for mobile platforms
Fast and user-friendly runtime for transformer inference
A RocksDB compatible KV storage engine with better performance
Open deep learning compiler stack for CPU, GPU
10x faster matrix and vector operations