XNNPACK is a highly optimized, low-level neural network inference library developed by Google for accelerating deep learning workloads across a variety of hardware architectures, including ARM, x86, WebAssembly, and RISC-V. Rather than serving as a standalone ML framework, XNNPACK provides high-performance computational primitives (convolutions, pooling, activation functions, arithmetic operations, and more) that are integrated into higher-level frameworks such as TensorFlow Lite, PyTorch Mobile, ONNX Runtime, TensorFlow.js, and MediaPipe. The library is written in C/C++ and designed for portability and performance, leveraging platform-specific instruction sets (e.g., ARM NEON, x86 AVX, WebAssembly SIMD) for optimized execution. It operates on NHWC tensors and allows flexible striding along the channel dimension, so channel-split and concatenation operations can be handled without extra copies.
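As a rough illustration of the operator-level API, the sketch below creates and runs a single sigmoid operator over a batch of channel vectors, using a channel stride wider than the channel count to mimic reading a channel-split slice of a larger tensor in place. It follows the create/setup/run/delete lifecycle of older public revisions of `xnnpack.h`; recent releases split setup into separate reshape/setup calls, so treat the exact signatures as version-dependent assumptions rather than the current API.

```c
#include <stdio.h>

#include <xnnpack.h>      /* XNNPACK operator API */
#include <pthreadpool.h>  /* XNNPACK's threading abstraction */

int main(void) {
  /* One-time library initialization (NULL = default allocator). */
  if (xnn_initialize(NULL) != xnn_status_success) {
    fprintf(stderr, "failed to initialize XNNPACK\n");
    return 1;
  }

  /* Thread pool for multi-threaded execution; 0 = one thread per core. */
  pthreadpool_t threadpool = pthreadpool_create(0);

  enum { BATCH = 4, CHANNELS = 8, STRIDE = 16 };
  /* STRIDE > CHANNELS demonstrates the flexible channel stride: each
   * row holds CHANNELS valid values followed by padding, as when
   * operating on a channel slice of a wider tensor without copying. */
  float input[BATCH * STRIDE];
  float output[BATCH * STRIDE];
  for (int i = 0; i < BATCH * STRIDE; i++) {
    input[i] = (float) i / (BATCH * STRIDE) - 0.5f;
  }

  /* Create a sigmoid operator over CHANNELS channels with explicit
   * input/output strides (signature from older xnnpack.h revisions). */
  xnn_operator_t sigmoid_op = NULL;
  if (xnn_create_sigmoid_nc_f32(
          CHANNELS, /* channels */
          STRIDE,   /* input stride */
          STRIDE,   /* output stride */
          0,        /* flags */
          &sigmoid_op) != xnn_status_success) {
    fprintf(stderr, "failed to create sigmoid operator\n");
    return 1;
  }

  /* Bind the operator to concrete buffers and a batch size ... */
  xnn_setup_sigmoid_nc_f32(sigmoid_op, BATCH, input, output, threadpool);
  /* ... then execute it on the thread pool. */
  xnn_run_operator(sigmoid_op, threadpool);

  printf("output[0] = %f\n", output[0]);

  xnn_delete_operator(sigmoid_op);
  pthreadpool_destroy(threadpool);
  return 0;
}
```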
Features
- Cross-platform neural network inference backend optimized for ARM, x86, WebAssembly, and RISC-V
- High-performance implementations of 2D convolution, pooling, activation, and quantization operators
- Supports both FP32 and INT8 inference with per-channel quantization (see the quantization sketch after this list)
- Efficient NHWC tensor layout with flexible channel stride
- Integrates with frameworks such as TensorFlow Lite, TensorFlow.js, PyTorch Mobile, ONNX Runtime, and MediaPipe (see the delegate sketch after this list)
- Multi-threaded and vectorized operator implementations
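To make the per-channel INT8 point concrete, here is a small framework-agnostic sketch (not XNNPACK API) of symmetric per-channel weight quantization: each output channel gets its own scale, so one channel with large weights does not degrade the precision of the others. Real backends also handle zero points, rounding modes, and bias requantization.

```c
#include <math.h>
#include <stddef.h>
#include <stdint.h>

/* Symmetric per-channel INT8 quantization of weights laid out as
 * [channels][per_channel]. Each channel c gets its own scale
 * scale[c] = max(|w|) / 127, and q = round(w / scale[c]) clamped to
 * [-127, 127]. Illustrative sketch only. */
static void quantize_per_channel(const float *weights, size_t channels,
                                 size_t per_channel, int8_t *q,
                                 float *scale) {
  for (size_t c = 0; c < channels; c++) {
    const float *w = weights + c * per_channel;
    float max_abs = 0.0f;
    for (size_t i = 0; i < per_channel; i++) {
      max_abs = fmaxf(max_abs, fabsf(w[i]));
    }
    scale[c] = max_abs > 0.0f ? max_abs / 127.0f : 1.0f;
    for (size_t i = 0; i < per_channel; i++) {
      const float v = roundf(w[i] / scale[c]);
      q[c * per_channel + i] = (int8_t) fmaxf(-127.0f, fminf(127.0f, v));
    }
  }
}
```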
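Most applications reach these kernels through a host framework rather than the raw operator API. Below is a minimal sketch of enabling the XNNPACK delegate via TensorFlow Lite's C API; `model.tflite` and the thread count are placeholders, and header paths and option fields may vary between TFLite releases.

```c
#include <stdio.h>

#include "tensorflow/lite/c/c_api.h"
#include "tensorflow/lite/delegates/xnnpack/xnnpack_delegate.h"

int main(void) {
  /* "model.tflite" is a placeholder path for any TFLite flatbuffer. */
  TfLiteModel *model = TfLiteModelCreateFromFile("model.tflite");
  if (model == NULL) {
    fprintf(stderr, "failed to load model\n");
    return 1;
  }

  /* Route supported operators to XNNPACK kernels. */
  TfLiteXNNPackDelegateOptions xnnpack_opts =
      TfLiteXNNPackDelegateOptionsDefault();
  xnnpack_opts.num_threads = 4; /* assumption: tune per device */
  TfLiteDelegate *delegate = TfLiteXNNPackDelegateCreate(&xnnpack_opts);

  TfLiteInterpreterOptions *options = TfLiteInterpreterOptionsCreate();
  TfLiteInterpreterOptionsAddDelegate(options, delegate);

  TfLiteInterpreter *interpreter = TfLiteInterpreterCreate(model, options);
  TfLiteInterpreterAllocateTensors(interpreter);

  /* ... fill input tensors, then run inference on XNNPACK kernels ... */
  TfLiteInterpreterInvoke(interpreter);

  TfLiteInterpreterDelete(interpreter);
  TfLiteInterpreterOptionsDelete(options);
  TfLiteXNNPackDelegateDelete(delegate);
  TfLiteModelDelete(model);
  return 0;
}
```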