Re: [SSI] process migration
From: Bruce W. <br...@ka...> - 2002-11-13 18:28:01
snip
> That means a.out ran on node1 even after rebooting node2. So for a
> simple application like above it will work.
>
snip
>
> > What kind of dependencies? Aneesh talked
> > about some pseudo processes.
>

As I responded to another message, there is no pseudo process on the
process's creation node. The creation node does keep track of the
existence of the process and the location where it is running, for two
reasons: first, so it doesn't reuse the pid; second, so it can help
other processes locate the process in order to send signals, etc. (all
done transparently in the kernel, of course).

If the process creation node (called the origin node in the code) goes
away, however, another node (the surrogate origin node) transparently
takes over the tracking of the process. Thus the loss of the creation
node is transparent. Note that when the creation node rejoins the
cluster, it resumes the tracking of old processes, for lots of good
reasons.

There are currently other object dependencies, however. Objects like
sockets, ptys, physical devices, pipes, System V IPC, etc. are always
created on the node where the process is running at the time it creates
them. If an object is created before a migrate, it is thus on the node
the process is migrating away from; if it is created after the migrate
(or rexec or rfork), it is created on the new node. The architectural
plan is that objects will be able to move, just like processes. When
that is complete, processes will be even more highly available.

Some notes about the various objects:

a. pipes, fifos, semaphores, message queues, shared memory, unix domain
sockets, ... (all IPC objects that are within the cluster) - these are
typically created to communicate between processes, so if the processes
themselves are distributed, the IPC object can at best be located on
one of the execution nodes. If that node crashes, the object and some
of the processes using that object would be lost.
Our goal, for availability, is to move all processes working together
and all the objects they are using to the same node, when appropriate.

b. internet sockets - connections which come into the cluster using the
LVS VIP (Virtual IP) can be moved (the ldirector would have to be told
the new location to redirect to). To the extent that the socket goes to
telnetd and on to a pty, telnetd and the pty master and slave would
also have to move. All this is possible but not done yet. Outgoing
connections currently use the IP address of the local card as part of
the name of the socket, so these cannot be successfully migrated.
Solving that would involve either using the VIP as the outgoing IP
address (I believe TruClusters does it this way) or doing local device
IP address failover (which can get out of hand and is mostly
unnecessary when you have a VIP).

c. ptys - have to move master and slave together.

bruce

>
>
> > What about network connections via TCP/IP or UDP after a migration of an
> > process? How SSIC handles open network connections while the process is
> > migrating?

see above.

>
> All network connections are routed to the node in which the connection
> originate, much or less the same thing as the mosix home node concept.
> But this is because the socket is bound on the IP that is local to the
> node and you can get it rebind on the node on which the process
> migrated( On that node there is no such IP). But once we have the full
> clusterwide IP support it should go away. ( With the full DNET code ).

I think I just stated this a different way above.

>
>
> Now the interesting part. ( I am not sure about the below details).
> What happens to SYSV IPC. I don't think when the application migrates
> the message queue is also migrating with the app. Because we have
> cluster wide message queue it is not needed. All message queue related
> activity is still happening on the node it was created.
> ( Here we could
> do a performance opt by making the queue migrate if the queue read/write
> is happening frequently from some other node. IBM DLM code does some
> such thing for locks . They move the lock to the node which is doing
> max lock unlock ).
>
> May be someone else can elaborate on how IPC are handled. ?

I think you covered it. I'm surprised to hear that lock management of
active locks moves in the DLM. I'll have to look into that.

bruce

>
> > Regards,
> > Mathias
> >
> >
>
> -aneesh
>
> _______________________________________________
> ssic-linux-devel mailing list
> ssi...@li...
> https://lists.sourceforge.net/lists/listinfo/ssic-linux-devel
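
For readers who want a concrete picture of the origin-node tracking described in this message, here is a minimal Python sketch. It is only a toy model, not the SSI kernel code: the class name ProcessTracker, its methods, and the rule for choosing a surrogate (the lowest-numbered node still up) are all assumptions made for illustration. The message does not say how the surrogate origin node is chosen.

```python
class ProcessTracker:
    """Toy model of origin-node process tracking (hypothetical, not SSI code)."""

    def __init__(self, nodes):
        self.up = set(nodes)   # nodes currently in the cluster
        self.origin = {}       # pid -> creation (origin) node
        self.location = {}     # pid -> node where the process runs now
        self.surrogate = {}    # down origin node -> node tracking on its behalf

    def create(self, pid, node):
        # Reason one for tracking: a pid still in use must not be reused.
        if pid in self.origin:
            raise ValueError("pid %d is still in use" % pid)
        self.origin[pid] = node
        self.location[pid] = node

    def migrate(self, pid, new_node):
        # The process moves; its origin node does not change.
        self.location[pid] = new_node

    def locate(self, pid):
        # Reason two for tracking: let other processes find this one
        # (for example, to deliver a signal).
        return self.location[pid]

    def tracker_for(self, pid):
        # The origin node tracks the process while it is up;
        # otherwise the surrogate origin node does.
        origin = self.origin[pid]
        return origin if origin in self.up else self.surrogate[origin]

    def node_down(self, node):
        self.up.discard(node)
        # Processes that were executing on the failed node are lost.
        for pid in [p for p, n in self.location.items() if n == node]:
            del self.location[pid]
            del self.origin[pid]
        if not self.up:
            return
        # Assumed policy: the lowest-numbered surviving node takes over.
        new_tracker = min(self.up)
        if node in self.origin.values():
            self.surrogate[node] = new_tracker
        # Also hand over any tracking the failed node did as a surrogate.
        for origin, tracker in list(self.surrogate.items()):
            if tracker == node:
                self.surrogate[origin] = new_tracker

    def node_up(self, node):
        # A rejoining origin node resumes tracking its old processes.
        self.up.add(node)
        self.surrogate.pop(node, None)
```

With three nodes, a process created on node 2 and migrated to node 1 keeps running when node 2 reboots: locate() still returns node 1, tracker_for() returns the surrogate while node 2 is down, and returns node 2 again after it rejoins.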