From: Robert E. <pa...@tu...> - 2004-02-10 06:46:44
|
Hi, all -

I've got some preliminary support checked in for "daughterships". It's incomplete (there are some tricky issues left), but it's there to provoke discussion. The changes shouldn't affect any other uses.

Recalling the observed problem, in very large-scale environments, it can take appreciably long for each server involved to contact the mothership and get its configuration.

A "daughtership" (perhaps a better analogy would be a "surrogate mothership") relieves some of the pressure by providing an additional "contact point". The daughtership contacts the mothership to collect the entire configuration; from that point, servers may contact the daughtership instead of the mothership to get their own configuration details. The daughtership can tell the servers how to configure and what to do just as well as the mothership can.

Multiple daughterships are supported; a hierarchy of daughterships is also possible (as each daughtership, after getting details from its mother, is a fully capable mothership).

To start a daughtership:

    % export CRMOTHER=host:port       # where is my mother?
    % export CRMOTHERSHIP=host:port   # what "mothership" should I look like?
    % python .../cr/mothership/server/daughtership.py

After that, any other server may be run with CRMOTHERSHIP pointing at the daughtership or at the original mothership, equivalently.

What doesn't work:

- Autostart. There are no node types (yet?) for daughterships, so they cannot be autostarted.

- Dynamic hosts. A daughtership cannot resolve a dynamic host itself; it must contact its mother (otherwise, multiple daughterships may all resolve the same dynamic host to different actual hostnames, which will cause them to not be equivalent any more).

  But a daughtership in the middle of a connection request (which can trigger dynamic host matching) can't easily ask the mothership for information; it can send the request easily enough, but it can't guarantee that the next information coming from its mother is the correct response (because the mother propagates some commands to its daughterships). The logic to handle this gets spotty after that; you end up having to suspend the connection request and return to the communications loop, waiting until the proper data comes in; further, we lose the advantage of having a daughtership if there are many dynamic hosts...

Providing nodes for daughterships and enforcing a strict hierarchy (i.e. servers may only contact their designated daughterships) would solve both problems, but limit flexibility, introduce complexity (especially with the graphical config tool), and possibly introduce confusion (if a dynamically-hosted crserver points to the wrong mothership by accident, it could still resolve, but ultimately leave one daughtership not knowing what to do with an extra connection, and another one waiting for a missing connection).

I think it would be easier to code for the former (the daughtership contacts the mothership for dynamic host matching) than for the latter (nodes & hierarchy). But the whole thing may be complex enough to not be worth the effort; perhaps these changes should be abandoned, and another mechanism for large clusters considered...

Thoughts?

Bob Ellison
Tungsten Graphics, Inc.
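
For readers who want the idea in miniature, here is a rough Python sketch of the "surrogate mothership" arrangement described in this message. It is not the code in daughtership.py: the class names, the `full_config`/`server_config` methods, and the `crserver0`/`tile` sample settings are all made up for illustration, and no networking is involved. It only shows that a daughtership copies the entire configuration from its mother once and can then answer servers (or further daughterships) just as the mother would.

```python
class Mothership:
    """Toy stand-in for a mothership: maps server names to their settings."""

    def __init__(self, config):
        # Copy the settings so each ship owns its own view of the configuration.
        self.config = {name: dict(settings) for name, settings in config.items()}

    def full_config(self):
        """Return a copy of the entire configuration (what a daughter collects)."""
        return {name: dict(settings) for name, settings in self.config.items()}

    def server_config(self, name):
        """Return the configuration details for a single server."""
        return dict(self.config[name])


class Daughtership(Mothership):
    """Collects the whole configuration from its mother once, then serves it."""

    def __init__(self, mother):
        super().__init__(mother.full_config())
        self.mother = mother


# Hypothetical two-server configuration, purely for illustration.
mothership = Mothership({
    "crserver0": {"tile": (0, 0)},
    "crserver1": {"tile": (1, 0)},
})
daughter = Daughtership(mothership)
granddaughter = Daughtership(daughter)  # a hierarchy of daughterships

# A server gets the same answer from any of the three contact points.
for contact_point in (mothership, daughter, granddaughter):
    print(contact_point.server_config("crserver1"))
```

Because a daughtership is itself a `Mothership` in this sketch, chaining them into a hierarchy needs no extra code, which mirrors the point that each daughtership is a fully capable mothership once it has its mother's details.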
From: Robert E. <pa...@tu...> - 2004-02-19 18:12:06
|
> Recalling the observed problem, in very large-scale environments, it can
> take appreciably long for each server involved to contact the mothership
> and get its configuration.
>
> A "daughtership" (perhaps a better analogy would be a "surrogate
> mothership") relieves some of the pressure by providing an additional
> "contact point". The daughtership contacts the mothership to collect
> the entire configuration; from that point, servers may contact the
> daughtership instead of the mothership to get their own configuration
> details. The daughtership can tell the servers how to configure and
> what to do just as well as the mothership can.
...
> What doesn't work:
>
> - Autostart. There are no node types (yet?) for daughterships, so they
> cannot be autostarted.
>
> - Dynamic hosts. A daughtership cannot resolve a dynamic host itself;
> it must contact its mother (otherwise, multiple daughterships may all
> resolve the same dynamic host to different actual hostnames, which will
> cause them to not be equivalent any more).

I've got code ready for check in (after just a few more tests) that supports dynamic hosts and daughterships. In the process of working with this and testing it, I've noted a few more things about daughterships I wanted to share.

- No "cousins" are supported. Daughterships cannot broker communication requests between each other. That is, if a server and a client are supposed to ultimately communicate with each other, they must both connect to the same mothership or daughtership. Otherwise, one daughtership can sit on a "connectrequest" waiting for the matching "acceptrequest", while the other daughtership has the "acceptrequest" and is waiting for the "connectrequest". Although we could have the daughterships exchange information when they get these requests, such would complicate and delay communication between the nodes; you'd do just as well or better to have a single mothership.

- Although dynamic hosts do work with daughterships now, I would hesitate to recommend their use together, as they tend to eliminate the scalable performance advantage daughterships can provide. (At least I believe they will; without a large cluster at my disposal, it's difficult to be certain.)

  When a configuration uses dynamic hosts (that is, unspecified host names that are resolved when appropriate servers contact the mothership), two things must happen: the mothership must track and uniquely resolve these dynamic names when necessary; and the mothership must suspend the application until all the dynamic hosts are resolved. (One of the first things that the stub OpenGL library will do upon contact is get a list of all servers; all the dynamic servers must be resolved before this can happen.)

  Both of these are managed by having the "grandmothership" (i.e., a single mothership that has no parent itself, only daughters) handle all the resolution. So whenever a daughtership needs to resolve a dynamic host, it suspends the request that caused the resolution, and asks its mother to resolve the dynamic host on its behalf. Whenever the grandmothership resolves a dynamic host (either to handle a dynamic host contacting it directly, or to handle a request from a daughter), it tells all its daughters that a match has been made, so they can all log the match and use it. When a daughtership receives a match communication, it looks to see if it has any suspended connections waiting for this match; and if it does, it resumes them.

  This all works very well to keep daughterships and motherships synchronized, but does introduce a round-trip to the mothership (or perhaps more than one, if there are multiple layers of matriarchal hierarchy) to resolve any dynamic host. And this will likely be slower than direct connections to the mothership (although again I cannot be certain, without a large cluster to test with).
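
The suspend, ask, broadcast, and resume cycle described above can be sketched in a few lines of Python. This is a simplified illustration, not the checked-in implementation: the `Ship` class, its method names, and the "first hostname offered wins" matching rule are assumptions made for the example, and ordinary method calls stand in for network messages.

```python
class Ship:
    """Toy mothership/daughtership; a ship with no mother is the grandmothership."""

    def __init__(self, mother=None):
        self.mother = mother
        self.daughters = []
        self.matches = {}    # dynamic name -> resolved hostname (the logged matches)
        self.suspended = {}  # dynamic name -> callbacks for suspended requests
        if mother is not None:
            mother.daughters.append(self)

    def request(self, dynamic_name, hostname, resume):
        """A server at `hostname` contacts this ship, triggering resolution."""
        if dynamic_name in self.matches:
            resume(self.matches[dynamic_name])
            return
        # Suspend the request until the matching "match" message arrives.
        self.suspended.setdefault(dynamic_name, []).append(resume)
        self.ask_upward(dynamic_name, hostname)

    def ask_upward(self, dynamic_name, hostname):
        """Pass the resolution request up until it reaches the grandmothership."""
        if self.mother is not None:
            self.mother.ask_upward(dynamic_name, hostname)
        else:
            # Assumed rule for this sketch: the first hostname offered wins.
            self.on_match(dynamic_name, self.matches.get(dynamic_name, hostname))

    def on_match(self, dynamic_name, hostname):
        """Log a match, tell all daughters, and resume any suspended requests."""
        self.matches[dynamic_name] = hostname
        for daughter in self.daughters:
            daughter.on_match(dynamic_name, hostname)
        for resume in self.suspended.pop(dynamic_name, []):
            resume(hostname)


grandmothership = Ship()
daughter_a = Ship(mother=grandmothership)
daughter_b = Ship(mother=grandmothership)

resolved = []
daughter_a.request("dynamic0", "node17", resolved.append)
daughter_b.request("dynamic0", "node42", resolved.append)
print(resolved)
```

Both requests end up with the same hostname because only the grandmothership decides, and every daughter logs the match before a later request can reach it. The cost is the trip (or trips) up the hierarchy for each unresolved dynamic host, which is the round-trip overhead mentioned above.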

I'll be making changes to the documentation to describe this, and then I'll be done with what I had planned to do along these lines.

Bob Ellison
Tungsten Graphics, Inc.