[SSI] Re: HA and process migration
From: Peter B. <tab...@ya...> - 2001-07-19 04:00:03
--- Alan Robertson <al...@un...> wrote:
<snip>
>
> What I really meant to say was that *automatic* process migration was
> anathema.  Sorry to have been imprecise.  It has too many people in charge.
> *Perfect* manual (or under HA system control) process migration could be
> nice...  As I understand it, process migration on failure is an oxymoron...
> You can only migrate a live process.  A dead process can only be restarted.

"Automatic" by whom?  If the HA cluster manager has a variety of
conditions under which it would move a process, it could choose to move
it.  Note that I cannot think of a good reason why it would, but there is
no reason that the HA decision making couldn't have this power.  At
least, conceptually speaking.  It is, however, one more thing to go wrong.

>
> TCP sockets have lots of kernel state to move across...  Serial ports and
> other local devices can't be migrated - but the corresponding service may be
> restartable...  I don't know how hard it is to move a TCP socket across the
> net, but it doesn't sound especially easy...

I apologise for not having references at hand, but I've seen work, done
or proposed, on live mirroring of processes.  This is orthogonal to SSI,
but it did touch on mirroring sockets, although it also required that the
application process take explicit checkpoint-like actions.  Mirroring the
sockets depends on the interconnect, and on being able to handle all of
the messages in duplicate.  I will attempt to search my archives.

>
> Other barriers would be moving the whole set of connected processes from one
> machine to another...  Right now, HA services have a
> start/stop/status/monitor kind of paradigm associated with them.  Adding a
> "migrate" option to them would be an interesting way of approaching it...

HACMP on AIX essentially has a migrate paradigm.  Well, it is a 'move',
where you tell HACMP to pick another node for a resource group.  This,
under the covers, becomes a stop + start of the resources.
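To make the "move becomes a stop + start" idea concrete, here is a minimal Python sketch.  It is not HACMP's interface or any real cluster manager's API: the `Node`, `Resource`, and `move_group` names are hypothetical, and the stop-in-reverse, start-in-forward ordering is an assumption that later resources in a group depend on earlier ones.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical cluster node; records which resources it is running."""
    name: str
    running: set = field(default_factory=set)


@dataclass
class Resource:
    """Hypothetical HA resource following the start/stop/status paradigm."""
    name: str

    def start(self, node: Node) -> None:
        node.running.add(self.name)

    def stop(self, node: Node) -> None:
        node.running.discard(self.name)

    def status(self, node: Node) -> bool:
        return self.name in node.running


def move_group(resources, source: Node, target: Node) -> bool:
    """Move a resource group: stop everything on source, start it on target.

    Assumption: resources are listed in dependency order, so they are
    stopped in reverse order and started in forward order.
    """
    for res in reversed(resources):
        res.stop(source)
    for res in resources:
        res.start(target)
    # The move succeeded if every resource now runs on the target only.
    return all(r.status(target) and not r.status(source) for r in resources)


# Usage: one admin-level call hides the individual stop and start actions.
node_a, node_b = Node("node_a"), Node("node_b")
group = [Resource("ip_alias"), Resource("filesystem"), Resource("database")]
for res in group:
    res.start(node_a)
moved = move_group(group, node_a, node_b)
```

From the admin's side this is a single call.  A true process-migration implementation would keep the same call but replace the stop and start loops with whatever the underlying kernel support provides.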
The move is usually used when a node is to be taken down gracefully, and
is sometimes also used for load balancing if a node has multiple
unrelated resource groups.  The admin initiates it via a single command,
but HACMP does all the work and figures out the specific actions to take
to get it to happen.

I don't see process migration being any different from the admin's view,
but the implementation would differ based on the thoroughness of the
underlying support.  One can imagine the stop + start actions being done
in deeper layers of the OS as you integrate seamless process migration
deeper and deeper into your kernel and associate it with full SSI...
Kind of like peeling an onion I guess :)

IMHO, you are also accepting that you are burying more complexity in the
lower layers of your system, as you transparently migrate memory,
sockets, swap space, etc.

>
> [snip]
>
> > If a process migration solution can be achieved that would allow an HA
> > cluster administrator to move the services with no external impact, we
> > will have made a major improvement in the operation and administration
> > of HA clusters.
>
> Provided that it is *extremely* reliable.  Diagnosing and recovering from
> failures caused by imperfect movement would not make the admin's life easier
> ;-)  This means recovering from errors which might occur during the
> movement.

See above.  More function, more deeply hidden, makes problem
determination more and more difficult.

>
> The most probable cause of system failure is human error.  Anything which
> makes the human less likely to err or perform manual recovery is good.
> Anything which might make them more likely to make a mistake or perform
> manual recovery is bad.
>
> One of the properties of fully transparent clusters is that it makes it less
> clear what is going on where.  I would speculate that this would make
> administrator errors more likely.
>
>         -- Alan Robertson
>            al...@un...

Peter

=====
These have been the opinions of:
Peter R. Badovinatz -- (503)578-5530 (TL 775)
wo...@us.../tab...@ya...
and in no way should be construed as official opinion of IBM, Corp.