Run Local LLMs on Any Device. Open-source
CUDA Templates for Linear Algebra Subroutines
A @ClickHouse fork that supports high-performance vector search
LLM inference in C/C++
Tensor library for machine learning
LiteRT, successor to TensorFlow Lite
TT-NN operator library, and TT-Metalium low-level kernel programming
The AI-Native Search Database
Clean and efficient FP8 GEMM kernels with fine-grained scaling
UCCL is an efficient communication library for GPUs
Fast Multimodal LLM on Mobile Devices
High-speed Large Language Model Serving for Local Deployment
Machine learning algorithms for advanced analytics