Thread: [SSI-users] newbie reliability question -- openmosix user
From: Indraneel M. <ind...@sm...> - 2005-03-31 12:39:33
Hi,

I am having problems with an openmosix cluster and am thinking of shifting to openssi. In the current setup (OM), nodes hang and bring other nodes down if critical processes (init) have migrated. Does openssi have such problems?

What happens to parents when child processes get killed? What happens to children when parent processes get killed? (eg. the node stops responding due to overload or goes out of memory.)

Also I face a problem where a node responds to ping, and that's all it does. Is there any timeout that can be configured to consider that a node is dead (or is watchdog the best way)?

I believe jobs will get scheduled to the least loaded node. Is there any scheduling such that the most resource intensive tasks (CPU/memory/IO) are placed on different nodes? (eg. round robin scheduling of current jobs sorted by current resource use by each, for hopefully best use of resources)

Do IO intensive tasks get migrated? How much is the typical latency for migration?

Also, will the 2.6.x release take a long time / are the CVS sources stable enough for a production system?

Thanks in advance,
Indraneel
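The round-robin placement idea raised in the question above can be sketched roughly as follows. This is only an illustration of the idea as described, not how the OpenSSI or openMosix load balancer works; the function name `spread_jobs`, the job names, and the single "usage" number per job are all hypothetical.

```python
from typing import Dict, List


def spread_jobs(job_usage: Dict[str, float], nodes: List[str]) -> Dict[str, List[str]]:
    """Deal jobs out to nodes round-robin, heaviest job first.

    job_usage maps a (hypothetical) job name to one resource-use figure,
    e.g. CPU percent. Returns a mapping of node name to the jobs placed on it.
    """
    if not nodes:
        raise ValueError("at least one node is required")

    placement: Dict[str, List[str]] = {node: [] for node in nodes}

    # Heaviest first; ties are broken by job name so the result is deterministic.
    ranked = sorted(job_usage, key=lambda job: (-job_usage[job], job))

    # Hand the ranked jobs out in turn, so the N heaviest jobs
    # land on N different nodes.
    for i, job in enumerate(ranked):
        placement[nodes[i % len(nodes)]].append(job)

    return placement


if __name__ == "__main__":
    usage = {"render": 90.0, "backup": 80.0, "cron": 10.0, "mail": 5.0}
    print(spread_jobs(usage, ["node1", "node2"]))
```

With two nodes, the two heaviest jobs end up on different nodes and the lighter ones fill in behind them. A real balancer would also have to weigh memory and IO separately and account for jobs changing their resource use over time.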
From: Roger T. <rog...@gm...> - 2005-03-31 20:08:29
Hi,

> bring other nodes down if critical processes (init) have migrated. Does
> openssi have such problems?
>

OpenSSI load-balancing does not migrate init by default, but will fail over init when set up properly.

> What happens to parents when child processes get killed? What happens

Hopefully the same behavior as a child exiting.

> to children when parent processes get killed? (eg. the node stops
> responding due to overload or goes out of memory.) Also I face a problem

Either you get zombies or the parent takes care of destroying its children too.

> where a node responds to ping, and that's all it does. Is there any
> timeout that can be configured to consider that a node is dead (or is
> watchdog the best way)?
>

The default timeout is 10 seconds in OpenSSI and is adjustable. If you manage to hang the whole cluster within the timeout period, then a watchdog is a good idea.

Hope this helps.

-Roger
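To make the timeout idea concrete, here is a minimal sketch of timeout-based node failure detection. It is not OpenSSI's actual mechanism; the class `HeartbeatMonitor` and its methods are hypothetical, and only the 10-second default is taken from the reply above. The sketch assumes each heartbeat is sent by something on the node that proves it is still doing useful work, since a node that merely answers ping would otherwise look alive.

```python
from typing import Dict, List


class HeartbeatMonitor:
    """Track the last heartbeat per node and report nodes that went silent."""

    def __init__(self, timeout_s: float = 10.0) -> None:
        # 10 seconds mirrors the default mentioned in the thread.
        self.timeout_s = timeout_s
        self.last_seen: Dict[str, float] = {}

    def record(self, node: str, now: float) -> None:
        """Note that `node` sent a heartbeat at time `now` (in seconds)."""
        self.last_seen[node] = now

    def dead_nodes(self, now: float) -> List[str]:
        """Return nodes whose last heartbeat is older than the timeout."""
        return sorted(
            node
            for node, seen in self.last_seen.items()
            if now - seen > self.timeout_s
        )


if __name__ == "__main__":
    monitor = HeartbeatMonitor()
    monitor.record("node1", now=0.0)
    monitor.record("node2", now=5.0)
    print(monitor.dead_nodes(now=12.0))
```

Timestamps are passed in explicitly so the logic can be tested without waiting; in practice they would come from a monotonic clock such as `time.monotonic()`. A monitor like this cannot help if the machine running it hangs too, which is the case where a hardware watchdog is the better tool.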
From: Brian J. W. <Bri...@hp...> - 2005-04-01 03:22:37
Indraneel Majumdar wrote:
> Also, will the 2.6.x release take a long time / are the CVS sources stable
> enough for a production system?

OpenSSI 1.9.0 for Debian will be available in another week or two with a 2.6.10-based kernel. It won't be considered stable for production systems until OpenSSI 2.0.0 is released.

Regards,

Brian
From: Indraneel M. <ind...@sm...> - 2005-04-01 16:03:14
Thanks.

On Thu, Mar 31, 2005 at 07:22:01PM -0800, Brian J. Watson wrote:
> Indraneel Majumdar wrote:
> >Also, will the 2.6.x release take a long time / are the CVS sources stable
> >enough for a production system?
>
> OpenSSI 1.9.0 for Debian will be available in another week or two with a
> 2.6.10-based kernel. It won't be considered stable for production
> systems until OpenSSI 2.0.0 is released.
>
> Regards,
>
> Brian
From: Indraneel M. <ind...@sm...> - 2005-04-01 16:10:23
Thanks

On Thu, Mar 31, 2005 at 08:17:33PM -0800, Laura Ramirez wrote:
> >
> >Do IO intensive tasks get migrated? How much is the typical latency for
> >migration?
>
> It depends how big the process is, how long it takes to bring the pages
> over.
>

So IO intensive jobs might not get migrated even if the node they are on is overloaded?

-Indraneel
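As a rough back-of-the-envelope reading of "it depends how big the process is", the time to bring a process's pages over can be bounded from below by its size divided by the link speed. The sketch below assumes all pages are copied in one go over a link running at its full nominal rate; whether OpenSSI actually moves pages that way is not stated in the thread, and the function name and example figures are hypothetical.

```python
def estimate_transfer_seconds(process_bytes: int, link_bits_per_s: float) -> float:
    """Lower-bound time to copy `process_bytes` over a link of the given speed.

    Ignores protocol overhead, contention, and any on-demand paging, so a
    real migration would be expected to take longer than this figure.
    """
    if link_bits_per_s <= 0:
        raise ValueError("link speed must be positive")
    return (process_bytes * 8) / link_bits_per_s


if __name__ == "__main__":
    # Hypothetical example: a 256 MiB process over 100 Mbit/s Ethernet.
    seconds = estimate_transfer_seconds(256 * 2**20, 100e6)
    print(f"{seconds:.1f} s")
```

Under these assumptions the hypothetical 256 MiB process needs a bit over 21 seconds on 100 Mbit/s Ethernet, which shows why large processes are expensive to move compared with small ones.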