Paddler
Open-source load balancer and serving platform for hosting LLMs
...It supports running models locally through engines such as llama.cpp while distributing requests across multiple compute nodes to improve performance and reliability. The architecture is designed with privacy and cost control in mind, making it suitable for organizations that handle sensitive data or require predictable operational costs. Paddler also includes tools for monitoring, request buffering, and autoscaling integration so that deployments can adapt dynamically to changing workloads. A built-in administrative interface allows developers and operations teams to manage models, observe system performance, and test inference endpoints.
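To make the idea of distributing requests across nodes and buffering the overflow concrete, here is a minimal Python sketch. It is not Paddler's implementation or API: the `Node` and `Balancer` classes, the slot counts, and the `max_buffered` limit are all hypothetical names chosen for illustration, and the "pick the node with the most free slots" policy is an assumption, not a documented behavior.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Node:
    """Hypothetical compute node with a fixed number of inference slots."""
    name: str
    slots_total: int
    slots_busy: int = 0

    @property
    def slots_free(self) -> int:
        return self.slots_total - self.slots_busy


class Balancer:
    """Toy balancer: route to the node with the most free slots, else buffer."""

    def __init__(self, nodes, max_buffered=8):
        self.nodes = list(nodes)
        self.buffer = deque()             # requests waiting for a free slot
        self.max_buffered = max_buffered  # assumed cap on waiting requests

    def submit(self, request_id):
        """Return the name of the node taking the request, or None if buffered."""
        node = max(self.nodes, key=lambda n: n.slots_free)
        if node.slots_free > 0:
            node.slots_busy += 1
            return node.name
        if len(self.buffer) >= self.max_buffered:
            raise RuntimeError("buffer full, request rejected")
        self.buffer.append(request_id)
        return None

    def release(self, node_name):
        """Free one slot on a node; give it to the oldest buffered request, if any."""
        node = next(n for n in self.nodes if n.name == node_name)
        node.slots_busy -= 1
        if self.buffer:
            request_id = self.buffer.popleft()
            node.slots_busy += 1
            return request_id
        return None


if __name__ == "__main__":
    balancer = Balancer([Node("a", 1), Node("b", 2)])
    for rid in ["r1", "r2", "r3", "r4"]:
        print(rid, "->", balancer.submit(rid))
    print("resumed:", balancer.release("a"))
```

Buffering instead of rejecting a request when every slot is busy gives an external autoscaler time to add capacity. In this sketch the buffer length is the signal such an autoscaler could watch.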