Port of Facebook's LLaMA model in C/C++
Port of OpenAI's Whisper model in C/C++
User-friendly AI Interface
Run Local LLMs on Any Device. Open-source
High-performance neural network inference framework for mobile
ONNX Runtime: cross-platform, high-performance ML inferencing
FlashInfer: Kernel Library for LLM Serving
A high-throughput and memory-efficient inference and serving engine
MNN is a blazing fast, lightweight deep learning framework
C++ library for high-performance inference on NVIDIA GPUs
OpenVINO™ Toolkit repository
Protect and discover secrets using Gitleaks
Everything you need to build state-of-the-art foundation models
The free, Open Source alternative to OpenAI, Claude and others
A scalable inference server for models optimized with OpenVINO
A library for accelerating Transformer models on NVIDIA GPUs
Open standard for machine learning interoperability
Simplifies the local serving of AI models from any source
LLMs as Copilots for Theorem Proving in Lean
AIMET is a library that provides advanced quantization and compression
C#/.NET binding of llama.cpp, including LLaMA/GPT model inference
Optimizing inference proxy for LLMs
An RWKV management and startup tool with full automation, only 8 MB
Bayesian inference with probabilistic programming
Lightweight Python library for adding real-time multi-object tracking