From: Fred O. <fko...@gm...> - 2010-07-26 22:05:22
On Mon, Jul 26, 2010 at 2:37 PM, Bryan Thompson <br...@sy...> wrote:

> I've been out for a bit with my head wrapped around other things. Can you remind me how we are going to handle the assignment of physical service nodes to logical service nodes in this design?

What is a node (physical or logical) in this context? I think you mean that a physical node is a machine. If so, then what is a logical node? As I understand the services, all service instances are physical. The logical service construct exists only as an abstraction on which the rules in the rules based specification may operate. If I understand correctly, then with the instance level specification there are no logical services and no logical nodes. But still, what's a logical node?

> Concerning your points below, either scheme can be made fully deterministic. It is only a matter of specifying that a specific service must run on a specific host (a constraint on what services can run on a given host).

If you can make the rules based scheme deterministic, then please do! But I think you meant only that you can write rules that constrain the behavior such that the results in those particular cases are deterministic, which is an entirely different matter. The rules based scheme makes for much more code, locking and synchronization, very difficult testing, and a less maintainable environment.

> I see rules based specification as more flexible because you can administer the rule set rather than making a decision for each node that you bring online. I agree that it is more adaptive since the constraints are being specified at a level above the individual node. I see the rules as globally transparent because they are just data which could be edited using, for example, a web browser backed by an application looking at the data in zookeeper, whereas the instance level specification must be edited on each node.
> I think of rules as more scalable because you do not have to figure out what you are going to do with each node. The node will be put to a purpose for which it is fit and for which there is a need.

I think we're going to disagree about the merits of the instance vs. rules schemes, and I hope we can modularize the system so that the schemes are separate modules and independent of the core functionality (which wouldn't need zookeeper). My biggest concern about that last paragraph (or the whole message?) is that this use of zookeeper seems unnecessary and confusing. That is, why wouldn't the web app interact with the service instances directly to get/set configurations using well defined, testable public interfaces, rather than use zookeeper as a hub? (That's the secret messages in dead drops thing.)

> However, as long as we have a reasonable path for HA service allocation which respects the need to associate specific physical service instances with specific logical service instances then it seems reasonable that either approach could be used. It just becomes a matter of how we describe what services the node will start and whether or not we run the ServicesManagerService on that node.

Clearly HA needs a set of like service instances to work in active/active or active/passive arrangements. The term "logical service" seems overloaded in that (as far as I have figured out) it has different meanings in the pre-HA and post-HA discussions. I can see that an HA logical data service would refer to the group of data service instances which together host a single shard. But this definition is very specific and differs from the more general meaning in the rules based specification discussion, which is confusing. The instance-based scheme can be used for HA as well, as long as the service configurations are extended to indicate which "HA logical" group a service belongs to.
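
As a rough illustration of that last point, here is a minimal sketch of an instance-level configuration extended with an HA group. Everything in it is an assumption made for illustration: the `ServiceConfig` record, the `ha_group` field, the `group_by_ha` helper, and the host and group names are hypothetical and are not taken from the actual code base.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ServiceConfig:
    """Hypothetical instance-level configuration: one service on one host."""
    host: str
    service_type: str
    # Hypothetical extension: the "HA logical" group this instance belongs to.
    # None means the service does not take part in an HA arrangement.
    ha_group: Optional[str] = None


def group_by_ha(configs):
    """Collect the service instances that belong to each HA logical group."""
    groups = defaultdict(list)
    for cfg in configs:
        if cfg.ha_group is not None:
            groups[cfg.ha_group].append(cfg)
    return dict(groups)


# Illustrative data only: two data service instances share group "ds-0".
configs = [
    ServiceConfig("host1", "DataService", ha_group="ds-0"),
    ServiceConfig("host2", "DataService", ha_group="ds-0"),
    ServiceConfig("host3", "DataService", ha_group="ds-1"),
    ServiceConfig("host1", "LoadBalancer"),
]
ha_groups = group_by_ha(configs)
```

In this sketch the membership of each HA group is fixed by the per-instance configuration, so no rule evaluation is needed to decide which instances work together.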

Fred

> PS: Concerning "flex", the big leverage for flexing the cluster will come with a shared disk architecture (rather than the present shared nothing architecture). Using a shared disk architecture, the nodes can then be started or stopped without regard to the persistent state, which would be on managed storage. That would make it possible to trade off dynamically which nodes were assigned to which application, where the application might be bigdata, hadoop, etc. In that kind of scenario I find it difficult to imagine that an operator will be in the loop when a node is torn down and then repurposed to a different application. However, this kind of "flex" is outside the scope of the current effort.

OK. I see the primary benefit of this arrangement as making hot spares become operational much more quickly, but I don't see how this applies to the rules vs. instance based specifications discussion. Both schemes can handle this arrangement.

Fred