From: Steven W. <sw...@il...> - 2002-08-30 21:34:22
Federico Sacerdoti wrote:

> So, as Steven and others have mentioned, we have a problem with ganglia
> metrics. Metrics currently lie in a flat namespace, with no hierarchical
> groupings. I have talked with Matt and Mason (a ganglia developer and my
> boss) about this problem, and would like to state and define some of our
> ideas.

Hey, you guys WERE listening all those times I went on and on about this
subject. :)

> Another advantage of hierarchies comes from object-oriented design.
> Attributes in the Branch tag, such as DMAX (when metrics get deleted),
> become the default for all metrics below it. These can be overridden by
> the individual metrics, analogous to overriding baseclass methods in an
> OO class tree. This gives an easy way to assign attribute values to a
> group of metrics.

It seems to me this would also make the "DSO-ification" of the monitoring
core a smoother process, not to mention a cleaner one from the standpoint
of those developing the DSOs. :)

> A third advantage is cleaner namespaces. You can call 'cpu_num' simply
> 'num'. Similar naming simplifications are possible for the other
> metrics. The most significant advantage is that we only have to worry
> about name collisions among siblings in the tree. There can be a 'num'
> metric in another branch (for example, the 'num' of network interfaces).
>
> So how do we name metrics in the XDR packet if we adopt a metric
> hierarchy? This is a difficult problem, since we want to allow new
> metrics to appear at any time. Imagine an XDR packet comes in. We need
> to identify the metric, and update its value in our hash tables.

I was thinking of "yet another hash" whose key is a number hashed up from
the name or hierarchy position of the metric. The idea being, this number
is shorter than using the fully-qualified name of the metric all the time.
So instead of encoding "cpu.idle" we encode 0x03FA450A, and that field is
50% shorter (even better if we get to "processes.top.1.cpu_percentage"),
and we only have to multicast the real string name once. The hierarchical
information is stored (as a pointer, at the very least) in this hash.
What's really going to be key here is not so much the idea of making the
statically-#define'd metric hash dynamic, but keeping it up to date... If
we go far enough in this it'll look like SNMP, only more collaborative. :)

> I believe the answer is that new nodes get their branch hierarchy all at
> once from the oldest gmond in the cluster (which I will call the eldest
> node). Matt has been talking about this for some time, as it will solve
> some other problems as well. If we get an XDR metric packet that
> specifies an unknown branch, we discard it. However, we realize that we
> must have missed something, so we query the eldest node for their metric
> hierarchy. If we can't find the eldest node, we query the second eldest,
> etc. We also query the second eldest if we didn't learn anything new
> from the eldest himself. (This solves the problem of the eldest node
> having incomplete information.)

I would suggest a fallback method (at least an option) of consulting an
"authoritative host." Maybe even a host running gmetad could be used as a
fallback (after all, it's going to have to keep track of all this stuff
too), although I don't necessarily think I'd recommend that. At the very
least this will help us during development, and it's possible that some
users might want a particular gmond running on "more reliable" hardware
(this isn't a dig at any one platform; I was thinking along the lines of
redundant PSUs and such) to be responsible for keeping track of cluster
metric metadata.

> The assumption is that the eldest node has been listening to all the
> "create-branch" messages, and has a complete metric tree.

This is gonna sound like DNS.
If anyone doesn't know DNS, speak up now before I get too snug wearing my
hostmaster hat again...

The primary node (eldest) may actively send syncing messages to the
secondary node (second-eldest) in case of the primary's untimely death.
Since I assume all traffic mentioned here will be on the multicast
channel, a separate conduit between primary and secondary is probably
redundant: eldest and second-eldest will behave identically, except that
the second-eldest won't answer queries unless the eldest misses a
heartbeat or doesn't answer a query older than "query_timeout" seconds.

Individual nodes are always "authoritative" for branches of the metric
tree which they themselves have implemented. The query packet format
needs an optional destination field containing the multicast hostname/IP
of a member node. If a node receives a query addressed to it from the
elder server, it responds by resending its "create-branch/create-metric"
messages to the cluster. This should be the only time these messages are
*rebroadcast* by a node.

On joining the network, a new node will announce itself and wait for the
heartbeats to start flowing in before it sends any multicasts besides
heartbeat, hostname, gmond_started, and gmond_version. The elder gmond
should, upon receiving a new gmond's heartbeat, transmit the metric tree.
The new gmond, as it receives the tree, compares it to its internal
metrics and sends "create-branch/create-metric" messages for each metric
it supports that is not in the tree received from the elder.

Cripes, this is turning into an RFC. Should I just write this up as such?

> This email message is getting too long, but I would go on about how we
> could use the idea of database indexes to quickly locate any branch in
> the tree.

Heh, in that case that renders the first part of my message redundant. ;)

> I hope I have been relatively clear about these ideas. I realize this
> problem is pretty dense, and this solution is in its infancy.
> But the point I would like to drive home is that a naming hierarchy is
> helpful for specific reasons, and that its efficient implementation is
> possible in the ganglia framework.

Dense, yes, but the area of metrics is just about the only one in the
Ganglia design that *doesn't* scale well (kudos, Matt & co.). I'm sure
that we can work this out if we just keep banging those rocks together. :)