From: Fred O. <fko...@gm...> - 2010-08-02 22:39:26
Bryan,

> You are positing a new service which handles the binding of the available physical services to the required logical services. How do you plan to make that logical to physical binding service highly available? It seems to me that this centralizes an aspect of the distributed decision making which is currently performed by the ServicesManagerServer. If you make this binding service highly available, will you have recreated a distributed decision making service? It would of course be a simpler service since it does not handle service start decisions, but only service bind decisions.

The new (simple, stateless) service I proposed in passing is useful only when new unbound (physical) data services are added to the HA cluster. Once new physical data services are bound into logical data services, this new service has no further useful function and can be shut down. It does not need to be highly available.

> Clients in this context refers to (a) ClientServices participating in a bulk load; (b) nodes providing SPARQL end points; and (c) the DataServices themselves, which use the same discovery mechanisms to locate other DataServices when moving shards or executing pipeline joins. The hosts which originate SPARQL requests are HTTP clients, but they are not bigdata federation clients and do not have any visibility into zookeeper, jini, or the bigdata services.

> Given that "clients" are themselves bigdata services and that Zookeeper scales nicely to 1000s of nodes, why have clients go directly to the services rather than watching the quorum state in zookeeper?

Removing one of two (mostly) redundant discovery mechanisms reduces code complexity. The River service discovery manager, already in use, scales even better to 1000s of nodes because persistence isn't needed and the redundant copies don't need to cooperate.

After discovery, why not go directly to the service to ask for state if necessary? It is the same cost as going to zookeeper, right? The benefit is that going direct to the service involves an easily documented, testable, maintainable interface, and any state being persisted is persisted by the service.

Other than general service startup and HA group startup, what information is passed through zookeeper?

Fred