...It acts as a central gateway that connects different inference backends such as Ollama, llama.cpp, vLLM, and other OpenAI-compatible services, allowing them to function as interchangeable compute nodes within a single system. The architecture is built around a hierarchical “master and slave” hub model, enabling distributed deployments in which multiple machines or clusters are managed through a single entry point. This design lets organizations scale horizontally, combining local hardware, cloud resources, and specialized inference servers into one unified infrastructure. ...
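To make the hub model concrete, here is a minimal sketch of how a master hub could discover models on its slave hubs and route requests to them. It assumes each slave hub exposes the standard OpenAI-compatible `/models` and `/chat/completions` endpoints; the hub URLs, ports, and model name below are illustrative placeholders, not values from the project.

```python
import requests

# Illustrative endpoints: each "slave" hub is itself an OpenAI-compatible
# gateway fronting its own engines (Ollama, llama.cpp, vLLM, ...).
SLAVE_HUBS = [
    "http://hub-local:9090/v1",   # hypothetical on-prem hub
    "http://hub-cloud:9090/v1",   # hypothetical cloud hub
]

def discover_models() -> dict[str, str]:
    """Ask each slave hub which models it serves (GET /models is part of
    the OpenAI-compatible API) and build a model -> hub routing table."""
    table: dict[str, str] = {}
    for hub in SLAVE_HUBS:
        for model in requests.get(f"{hub}/models", timeout=10).json()["data"]:
            table.setdefault(model["id"], hub)  # first hub listing a model wins
    return table

def route_chat_completion(payload: dict, table: dict[str, str]) -> dict:
    """Forward an OpenAI-style /chat/completions request to whichever hub
    serves the requested model, keeping the backends interchangeable."""
    hub = table.get(payload["model"])
    if hub is None:
        raise ValueError(f"no hub serves model {payload['model']!r}")
    resp = requests.post(f"{hub}/chat/completions", json=payload, timeout=120)
    resp.raise_for_status()
    return resp.json()

if __name__ == "__main__":
    table = discover_models()
    print(route_chat_completion(
        {"model": "llama3:8b",  # placeholder model name
         "messages": [{"role": "user", "content": "Hello"}]},
        table,
    ))
```

Because the routing table is keyed only by model name, a backend can be swapped for another engine serving the same model without clients noticing, which is what makes the nodes interchangeable in this scheme.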