A scalable inference server for models optimized with OpenVINO
The Triton Inference Server provides an optimized cloud and edge inferencing solution
Optimizing inference proxy for LLMs
The AI-native (edge and LLM) proxy for agents
The easiest and laziest way to build multi-agent LLM applications
Easy-to-use Speech Toolkit including Self-Supervised Learning models
Deep Learning API and Server in C++14 with support for Caffe and PyTorch
Run local LLMs like Llama, DeepSeek, Kokoro, etc. inside your browser
Large Language Model Text Generation Inference
Standardized Serverless ML Inference Platform on Kubernetes
OpenAI-style API for open large language models
Low-latency REST API for serving text-embeddings
Library for serving Transformers models on Amazon SageMaker
Open Source and Lightweight Local LLM Platform
LLM Chatbot Assistant for Openfire server
Visual Instruction Tuning: Large Language-and-Vision Assistant
Serve machine learning models within a Docker container
Toolkit for allowing inference and serving with MXNet in SageMaker
CPU/GPU inference server for Hugging Face transformer models
Deploy an ML inference service on a budget in 10 lines of code
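The last entry's promise of a tiny, budget inference service can be sketched in a few lines. This is a minimal stand-in, not code from any of the projects listed above: it uses only the Python standard library, and the `/predict` endpoint, the JSON payload shape, and the toy "model" (a sum of the inputs) are all assumptions for illustration. The listed projects use richer stacks (FastAPI, KServe, Triton, etc.) and real model weights.

```python
# Hypothetical minimal ML inference service using only the Python
# standard library; endpoint name and payload shape are assumptions.
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

def model(features):
    # Stand-in "model": sum of the input features. A real service
    # would load trained weights here instead.
    return sum(features)

class PredictHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        if self.path != "/predict":
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length))
        body = json.dumps({"prediction": model(payload["features"])}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo quiet

# To run a server:
#   HTTPServer(("127.0.0.1", 8080), PredictHandler).serve_forever()
```

A client would then POST `{"features": [1, 2, 3]}` to `/predict` and get back `{"prediction": 6}`. The point of the sketch is that the serving layer itself is small; everything the catalog above adds (batching, GPU scheduling, autoscaling, model management) is what separates a toy like this from production servers.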