The Triton Inference Server provides an optimized cloud and edge inferencing solution
OpenAI-style API for open large language models
The easiest and laziest way to build multi-agent LLM applications
Deep Learning API and server in C++14 with support for Caffe and PyTorch
Low-latency REST API for serving text embeddings
Run local LLMs such as Llama, DeepSeek, and Kokoro inside your browser
Large Language Model Text Generation Inference
Open-Source and Lightweight Local LLM Platform
LLM Chatbot Assistant for the Openfire server
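Several of the servers listed above expose an OpenAI-compatible chat completions endpoint, so a single client snippet works across them. The sketch below builds such a request with only the standard library; the base URL, port, and model id are illustrative assumptions, not values from any specific project above.

```python
import json
import urllib.request

# Hypothetical local endpoint; the real host, port, and path prefix
# depend on which server you run (many default to /v1/chat/completions).
URL = "http://localhost:8000/v1/chat/completions"

# Minimal OpenAI-style chat completion payload. "llama-3" is an
# illustrative model id; use whatever model your server has loaded.
payload = {
    "model": "llama-3",
    "messages": [{"role": "user", "content": "Hello!"}],
    "temperature": 0.7,
}

req = urllib.request.Request(
    URL,
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="POST",
)

# urllib.request.urlopen(req) would send the request; it is omitted here
# because no server is assumed to be running.
print(req.get_method(), req.get_full_url())
```

Because the wire format is shared, swapping between these backends is usually just a change of base URL and model name.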