Run local LLMs on any device; open-source
Port of Facebook's LLaMA model in C/C++
Open standard for machine learning interoperability
Fast inference engine for Transformer models
AI interface for tinkerers (Ollama, Haystack RAG, Python)
Optimizing inference proxy for LLMs
Connect home devices into a powerful cluster to accelerate LLM inference
On-device Speech Recognition for Apple Silicon
A set of Docker images for training and serving models in TensorFlow
C#/.NET bindings for llama.cpp, including LLaMA/GPT model inference
Python Package for ML-Based Heterogeneous Treatment Effects Estimation
Trainable models and NN optimization tools
The AI-native (edge and LLM) proxy for agents
INT4/INT5/INT8 and FP16 inference on CPU for RWKV language model
An MLOps framework to package, deploy, monitor and manage models
OpenMMLab Model Deployment Framework
High-level Deep Learning Framework written in Kotlin
llama.go is like llama.cpp in pure Golang
OpenMMLab Video Perception Toolbox