From: Kimitoshi T. <kt...@cl...> - 2004-04-04 19:49:38
er...@he... wrote:
>On Thu, Apr 01, 2004 at 03:04:29AM +0900, Kimitoshi Takahashi wrote:
>> Hi all,
>>
>> From what I read in the Bproc documents, process migration is voluntary,
>> meaning bproc_move() must be called from the process to be moved.
>>
>> The lovely bpsh seems to wrap a non-bproc program and cause the program to move involuntarily,
>> using bproc_vexecmove(), only at the beginning.
>>
>> I'm wondering if there is any way to cause a non-bproc process to move involuntarily at any time,
>> at the user's will.
>>
>> My colleague uses a heterogeneous cluster where the memory sizes of the nodes vary.
>> He sometimes wants to move a small process on a large-memory machine
>> before he starts an obviously huge process. He is only using bpsh to start processes.
>>
>> Is it technically feasible to write something like bpsh which always wraps a process on slave nodes
>> and handles a "move now to where" signal?
>>
>> How would you deal with the situation my colleague has?
>
>I think a "wrapper" could take the form of a shared library. You
>could manually LD_PRELOAD yourself or you could modify bpsh to
>automatically set LD_PRELOAD for the child processes.

I'm afraid I don't fully understand what you meant; I probably need to learn
more about the basics of C programming ....

My guess is that the signal handler is in libc, and that you suggested preloading
a signal handler which calls bproc_move() when it gets a certain signal.
Is that what you meant?

>A signal seems like a good way to get the process's attention but you
>still need another way to tell it where to move to. I can't think of
>anything easy for that off the top of my head.

How about making it a two-step process:

1. When a process gets a certain signal, it VMAdumps itself to the network stream,
   and bpmaster stores it in a file on the master.
2. You can then manually restart the process, explicitly specifying where to move it.

It's not cool in that the process migration is not peer to peer; rather, it is origin-master-target.
This could also be used as a general checkpoint/restart facility.

I have no talent in programming, though, :-(

Sincerely,
Kimitoshi Takahashi
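
The preloaded signal handler discussed above could look roughly like the sketch below. It is not tested against BProc and rests on several assumptions: the MOVE_TARGET_FILE environment variable, the mw_ names, and the USE_BPROC macro are all hypothetical and invented here for illustration. Only bproc_move() and <sys/bproc.h> come from BProc itself. The sketch answers the "where to move to" question by reading a node number from a file when SIGUSR1 arrives. Whether bproc_move() is safe to call from inside a signal handler is also an assumption.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <signal.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#ifdef USE_BPROC                 /* hypothetical build switch */
#include <sys/bproc.h>           /* BProc header declaring bproc_move() */
#endif

/* Path of the file naming the target node. It is copied at load time
 * because getenv() must not be called from a signal handler. */
static char target_path[256];

/* Last node requested; lets the sketch be exercised without BProc. */
volatile sig_atomic_t mw_last_node = 0;
volatile sig_atomic_t mw_have_request = 0;

/* Parse an optionally negative decimal integer that is followed by the
 * end of the buffer, a newline, a space, or a NUL byte.
 * Returns 0 on success and -1 on malformed input. */
int mw_parse_node(const char *buf, size_t len, int *node_out)
{
    size_t i = 0;
    int sign = 1, value = 0, digits = 0;

    if (i < len && buf[i] == '-') { sign = -1; i++; }
    for (; i < len && buf[i] >= '0' && buf[i] <= '9'; i++) {
        if (value > 100000) return -1;      /* crude overflow guard */
        value = value * 10 + (buf[i] - '0');
        digits++;
    }
    if (digits == 0) return -1;
    if (i < len && buf[i] != '\n' && buf[i] != ' ' && buf[i] != '\0')
        return -1;
    *node_out = sign * value;
    return 0;
}

/* SIGUSR1 handler: read the target node from the file, then migrate.
 * Only async-signal-safe calls (open, read, close) are used here. */
static void mw_on_signal(int signo)
{
    char buf[16];
    int node, fd, saved_errno = errno;
    ssize_t n;

    (void)signo;
    fd = open(target_path, O_RDONLY);
    if (fd >= 0) {
        n = read(fd, buf, sizeof buf);
        close(fd);
        if (n > 0 && mw_parse_node(buf, (size_t)n, &node) == 0) {
            mw_last_node = node;
            mw_have_request = 1;
#ifdef USE_BPROC
            bproc_move(node);   /* voluntary move, triggered from outside */
#endif
        }
    }
    errno = saved_errno;
}

/* Remember the path and install the handler. Returns 0 on success. */
int mw_install(const char *path)
{
    struct sigaction sa;

    if (path == NULL || strlen(path) >= sizeof target_path) return -1;
    strcpy(target_path, path);
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = mw_on_signal;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART;
    return sigaction(SIGUSR1, &sa, NULL);
}

/* Runs when the library is loaded (GCC/Clang extension), so the wrapped
 * program needs no source changes. */
__attribute__((constructor))
static void mw_init(void)
{
    const char *path = getenv("MOVE_TARGET_FILE");  /* hypothetical name */
    if (path != NULL) mw_install(path);
}
```

Built as a shared library and named in LD_PRELOAD, this would let someone write a node number into the file and then send SIGUSR1 to the process. The file must be readable from wherever the process is currently running, which may not hold on slave nodes without a shared filesystem. A program that installs its own SIGUSR1 handler would override this one.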