A scalable inference server for models optimized with OpenVINO
A toolkit to optimize Keras & TensorFlow ML models for deployment
Port of Facebook's LLaMA model in C/C++
Port of OpenAI's Whisper model in C/C++
User-friendly AI Interface
AIMET is a library that provides advanced quantization and compression
Uncover insights, surface problems, monitor, and fine-tune your LLM
A high-performance ML model serving framework that offers dynamic batching
Everything you need to build state-of-the-art foundation models
INT4/INT5/INT8 and FP16 inference on CPU for the RWKV language model
Unified Model Serving Framework
Trainable models and NN optimization tools
The free, Open Source alternative to OpenAI, Claude and others
C#/.NET binding of llama.cpp, including LLaMA/GPT model inference
Private Open AI on Kubernetes
Unofficial Go (Golang) bindings for the Hugging Face Inference API
Neural Network Compression Framework for enhanced OpenVINO
Simplifies the local serving of AI models from any source
High-performance neural network inference framework for mobile
ONNX Runtime: cross-platform, high performance ML inferencing
Library for serving Transformers models on Amazon SageMaker
Official inference library for Mistral models
Run Local LLMs on Any Device. Open-source
A unified framework for scalable computing
An MLOps framework to package, deploy, monitor and manage models