Paddler
Open-source LLM load balancer and serving platform for hosting LLMs
...The system acts as a specialized load balancer and serving layer for language models, enabling organizations to run inference workloads without relying on external API providers. It supports running models locally through engines such as llama.cpp while distributing requests across multiple compute nodes to improve performance and reliability. The architecture is designed with privacy and cost control in mind, making it suitable for organizations that handle sensitive data or require predictable operational costs. Paddler also includes tools for monitoring, request buffering, and autoscaling integration so that deployments can adapt dynamically to changing workloads. ...