The Triton Inference Server provides an optimized cloud and edge inferencing solution
Official inference library for Mistral models
Replace OpenAI GPT with another LLM in your app
Serve machine learning models within a Docker container
Library for serving Transformers models on Amazon SageMaker
A real-time inference engine for temporal logic specifications
Toolkit for inference and serving with MXNet in SageMaker
C++ library for high performance inference on NVIDIA GPUs
AIMET is a library that provides advanced quantization and compression techniques for trained neural network models
ONNX Runtime: cross-platform, high performance ML inferencing and training accelerator (a minimal usage sketch follows this list)
A high-throughput and memory-efficient inference and serving engine for LLMs
Video Frame Interpolation & Super Resolution using NVIDIA's TensorRT
Port of Facebook's LLaMA model in C/C++
Ready-to-use OCR with 80+ supported languages
OpenVINO™ Toolkit repository
Port of OpenAI's Whisper model in C/C++
The core OCaml system: compilers, runtime system, base libraries
Static type checker for Python
Formula recognition based on LaTeX-OCR and ONNXRuntime
Self-hosted, community-driven, local OpenAI-compatible API
High-performance neural network inference framework for mobile
Bring the notion of Model-as-a-Service to life
Standardized Serverless ML Inference Platform on Kubernetes
Open source code for AlphaFold
C#/.NET binding of llama.cpp, including LLaMa/GPT model inference
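As an illustration of the kind of API these inference engines expose, below is a minimal sketch that loads a model with ONNX Runtime's Python bindings and runs one forward pass. The file name model.onnx, the CPU execution provider, and the all-ones float32 input are assumptions chosen for illustration, not details taken from any project listed above.

    # Minimal ONNX Runtime inference sketch (assumes a local model.onnx file).
    import numpy as np
    import onnxruntime as ort

    # Load a serialized ONNX model; providers selects the execution backend.
    session = ort.InferenceSession("model.onnx", providers=["CPUExecutionProvider"])

    # Inspect the graph's declared input so we can feed a matching tensor.
    inp = session.get_inputs()[0]

    # Build a dummy batch in the expected shape (dynamic dims become 1).
    shape = [d if isinstance(d, int) else 1 for d in inp.shape]
    x = np.ones(shape, dtype=np.float32)

    # run(None, feeds) returns every declared output as a list of numpy arrays.
    outputs = session.run(None, {inp.name: x})
    print(outputs[0].shape)

Most of the serving engines above follow the same three-step pattern: load a model artifact, bind named inputs, and retrieve named outputs, differing mainly in the backends and deployment targets they support.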