[SSI] Re: process tracking
From: Bruce W. <br...@ka...> - 2001-07-17 06:41:39
Bill, I added the SSI mailing list to this response since there is a lot of SSI in your note. You have, to a large extent, described what we already do in the SSI code, which we will release later this month (see www.opensource.compaq.com for the SSI project if you are interested), and, as you point out, there are some tricky aspects. In that system:

- All processes except the single init have a node-encoded pid (the node part is the node where the process was created).

- The node where a process is created is responsible for tracking where the process is currently executing but, unlike the Mosix implementation, does not execute any of the system code for the process; all system calls are executed locally.

- Having the creation node track the process is handy for making sure the pid is not reused.

- Any system call (like kill) executed by any process on any node can query the creation node to determine where the process currently is (dealing, of course, with a process that was migrating at that very instant).

- If the creation node fails (we actually call it the origin node), tracking of the process is taken over by the "surrogate origin" node. (Note that in Mosix it is necessary for all processes that started from the failed node to abort, since all, or almost all, of their system calls are executed back on the home node.)

- When a node reboots and rejoins the cluster, it regains the tracking of its old processes, so the "finding a process" algorithm stays simple and pids are not reused.

- The algorithm for finding a process is simply (see the sketch after this list):
  a. look locally;
  b. if the creation node is up, ask it whether the process exists and, if so, where;
  c. if the creation node is down, ask the well-known surrogate origin node the same question.

- By the way, if the surrogate origin node dies, its data is automatically rebuilt on the well-known surrogate takeover node.

- There are no single points of failure for cluster data or the cluster itself.
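To make the lookup concrete, here is a rough sketch in C. It is illustrative only, not the actual SSI source: the types, the 10/22 bit split of the pid, and the helpers node_is_up(), surrogate_origin(), local_lookup(), and remote_ask() are all made up for this example.

/*
 * Illustrative sketch only -- not the actual SSI source.  The bit
 * split and all helpers are assumptions made for this example.
 */

#include <stdint.h>

typedef uint32_t cpid_t;        /* cluster-wide pid                */
typedef uint16_t node_t;        /* cluster node number             */

#define PID_BITS 22             /* assumed split: 10 node / 22 pid */

/* Encode the creation (origin) node into the pid. */
static inline cpid_t cpid_make(node_t node, uint32_t local_pid)
{
    return ((cpid_t)node << PID_BITS) |
           (local_pid & ((1u << PID_BITS) - 1));
}

/* Recover the origin node from any cpid. */
static inline node_t cpid_node(cpid_t cpid)
{
    return (node_t)(cpid >> PID_BITS);
}

/* Hypothetical helpers this sketch assumes exist: */
extern int    node_is_up(node_t node);
extern node_t surrogate_origin(node_t origin); /* well-known mapping */
extern int    local_lookup(cpid_t cpid, node_t *where);
extern int    remote_ask(node_t tracker, cpid_t cpid, node_t *where);

/*
 * Find the node a process is currently executing on:
 *   a) look locally;
 *   b) if the origin node is up, ask it;
 *   c) otherwise ask the well-known surrogate origin node,
 *      which has rebuilt the failed origin's tracking data.
 * Returns 0 and fills in *where, or -1 if the process is gone.
 */
int cpid_locate(cpid_t cpid, node_t *where)
{
    node_t origin = cpid_node(cpid);

    if (local_lookup(cpid, where) == 0)     /* step a */
        return 0;

    if (node_is_up(origin))                 /* step b */
        return remote_ask(origin, cpid, where);

    return remote_ask(surrogate_origin(origin), cpid, where); /* step c */
}

One way to deal with the migration race mentioned above is for the caller to retry the lookup when the destination node reports that the process has already moved on.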
I'm sorry you can't all try it out yet and help us improve on the SSI capability. I'll send mail when the download is available.

bruce walker
Open SSI Cluster Architect
Linux Technology Office
Compaq Computers

> Bill Todd wrote:
>
> This sounds like a fairly standard distributed-system issue, and it
> should be amenable to a standard solution.
>
> If one assumes that process migrations are too common (and possibly
> that there are too many total processes in the cluster) to have all
> nodes track the locations of all (even just all migrated) processes,
> then some kind of process location directory mechanism is required.
> The home-node mechanism (assuming that the home node is implicit in
> the CPID) both avoids the need for creating a more elaborate
> directory service and promotes reasonable distribution of forwarding
> loads for migrated processes (it also saves messages, though exactly
> which messages are saved depends on the details of the mechanism
> it's competing with).
>
> When a node fails, one can select an alternate node as the surrogate
> home, broadcast this selection to the remaining members of the
> cluster, and update the selected node with the current locations of
> all still-extant processes migrated from the failed node (if the
> failed node recovers, it can then reassume responsibility for this
> location list).
>
> When any node needs to communicate with a process for the first
> time, it can target the process's home node (as determined from the
> CPID), and either reach the process there or be forwarded to its
> actual location. If the home node is known to be dead, a surrogate
> should already be known (if not, the appropriate mechanism should be
> started). And whenever a node is communicating with a process for
> *other than* the first time, it should usually be able to find the
> process's last-known location in a local cache it can maintain
> (assuming that communication occurs much more frequently than
> process migration, this will be a win - and if that assumption is
> false, then the occasional additional message to a stale location
> will pale in comparison to the migration overhead anyway).
>
> That's a moderate amount of mechanism to create (including some
> moderately tricky synchronization during state transitions), but
> that's the cost of creating a solution that's both high-performance
> and scalable (if those are goals).
>
> - bill
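On the last-known-location cache Bill describes: a minimal sketch along the same lines, reusing cpid_t, node_t, and cpid_locate() from the example above. try_send() is another made-up helper that fails if the process is no longer at the target node, so a stale cache entry costs exactly the one extra message Bill mentions before falling back to the full lookup.

#include <stddef.h>

/* Made-up transport helper: delivers msg to the process if it is
 * still executing on node, otherwise fails. */
extern int try_send(node_t node, cpid_t cpid, const void *msg, size_t len);

#define LOC_CACHE_SIZE 1024     /* assumed; must be a power of two */

static struct {
    cpid_t cpid;
    node_t node;                /* last-known location */
} loc_cache[LOC_CACHE_SIZE];

static unsigned loc_slot(cpid_t cpid)
{
    return (cpid * 2654435761u) & (LOC_CACHE_SIZE - 1); /* cheap hash */
}

/* Message a process, trying the cached location first.  A stale
 * entry just costs one failed send before the full lookup. */
int cpid_send(cpid_t cpid, const void *msg, size_t len)
{
    unsigned slot = loc_slot(cpid);
    node_t where;

    if (loc_cache[slot].cpid == cpid &&
        try_send(loc_cache[slot].node, cpid, msg, len) == 0)
        return 0;                       /* cache hit, still valid */

    if (cpid_locate(cpid, &where) != 0)
        return -1;                      /* process no longer exists */

    loc_cache[slot].cpid = cpid;        /* remember the new location */
    loc_cache[slot].node = where;
    return try_send(where, cpid, msg, len);
}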