Port of Facebook's LLaMA model in C/C++
A set of Docker images for training and serving models in TensorFlow
Official inference library for Mistral models
CPU/GPU inference server for Hugging Face transformer models
Deploy an ML inference service on a budget in 10 lines of code