Thread: [SSI-users] frozen cluster when process migrating?
From: Maurice L. <Mau...@co...> - 2005-10-31 09:31:11
hi to all,

I have a small cluster in an early production phase (5 machines: 1 dual-processor Dell Xeon with 1 GB RAM, and 4 dual-processor Dell PIIIs with 512 MB RAM). Sure, the nodes are not very powerful, but I have a big performance problem. Here are the symptoms.

I tested migration with some small oggenc processes converting .mp3 files to .ogg files. With such a benchmark I don't have any problem: the small mp3-to-ogg processes are migrated and efficiently load-leveled.

BUT when running some big physical modelling code from my researchers (a single process of 350 MB in RAM, computing for many days), sometimes when this one process is migrated between nodes, the cluster as a whole is completely frozen. I mean that for a very long time we can't type commands such as "ps", "top" or "w" any more; sometimes for 10-15 minutes we can't do anything on the cluster. It seems that this migrating process spends its time migrating between nodes; then after 10, 15 or 20 minutes things get better and the process comes back to its initial home node, until the next migration phase, when things get worse again.

So, what's the problem? Is it, as I think, a migration problem? Is it a lack of RAM? Is it the size of the process in RAM? Is it uncontrolled migration conditions? I have the feeling that the process doesn't manage to stay on a better node, and migrates from node to node for minutes at a time.

Is there a solution to avoid a frozen cluster? Sure, I can turn off load-leveling on a node (loadlevel -n 1 off) and there will be no problem any more (but then it isn't a cluster any more). Has anybody noticed similar freeze problems? Any help or information is welcome.

ML

--
Maurice Libes                              Tel : +33 (04) 91 82 93 25
Centre d'Oceanologie de Marseille          Fax : +33 (04) 91 82 65 48
UMS2196 CNRS - Campus de Luminy, Case 901  mailto:mau...@co...
F-13288 Marseille cedex 9
Annuaire : http://annuaire.univ-aix.fr/showuser.php?uid=libes
From: John H. <john@Calva.COM> - 2005-10-31 09:42:24
Maurice Libes wrote:
> hi to all,
>
> I have a small cluster in an early production phase

What version of OpenSSI?

> (5 machines: 1 dual-processor Dell Xeon with 1 GB RAM, and 4
> dual-processor Dell PIIIs with 512 MB RAM). Sure, the nodes are not
> very powerful, but I have a big performance problem.

What are your cpu speeds?

> [...]
>
> BUT when running some big physical modelling code from my researchers
> (a single process of 350 MB in RAM, computing for many days), sometimes
> when this one process is migrated between nodes, the cluster as a whole
> is completely frozen. I mean that for a very long time we can't type
> commands such as "ps", "top" or "w" any more; sometimes for 10-15
> minutes we can't do anything on the cluster.

And what is your cluster interconnect?

[...]

> Any help or information is welcome

What do you have in /proc/cluster/loadlog?
From: Maurice L. <Mau...@co...> - 2005-11-02 15:39:10
John Hughes wrote:
> What version of OpenSSI?

Sorry, I omitted this information:
* it's OpenSSI 1.2 on Debian Sarge

> What are your cpu speeds?

* Dell Precision 260: dual-processor Xeon 2.4 GHz, 1 GB RAM
* 4 Pentium III machines: dual-processor Pentium III Coppermine 1 GHz, 512 MB RAM

Yes, I need to buy some better machines :-) But even if the cluster shows poor performance, I would hope that a process could find the best node and *stay* computing on it while the OS conditions are stable (even if it's a weak machine). What "seems" to be happening is that this process was changing nodes too many times (I saw it moving from node #2 to #3 to #n over the course of a few minutes).

Is there a way to get an "nmig" counter, in order to see whether the process spends its time migrating?

[...]

> And what is your cluster interconnect?

* a NetGear 8-port gigabit switch
* gigabit NICs (3Com 3C2000) on every node

> What do you have in /proc/cluster/loadlog?

Here is an example of my /proc/cluster/loadlog:

cat /proc/cluster/loadlog
1130941021 execll:pid 230432(/bin/chmod) <- node 3 mem 105236 my load 77 node3 load 175
1130941022 loadlb:pid 297390(symphonie) <- node 3 mem 77152 my load 97 node3 load 168
1130941022 loadbl:pid 230405(oggenc) -> node 1 mem 77140 my load 159 node1 load 77
1130941023 loadbl:pid 230412(oggenc) -> node 1 mem 78104 my load 154 node1 load 72
1130941084 loadbl:pid 297390(symphonie) -> node 1 mem 74868 my load 104 node1 load 54
1130941113 loadlb:pid 230405(oggenc) <- node 1 mem 215660 my load 16 node1 load 74
1130941121 loadlb:pid 297390(symphonie) <- node 1 mem 70756 my load 68 node1 load 56
1130941121 loadbl:pid 230405(oggenc) -> node 1 mem 70676 my load 102 node1 load 58
1130941123 loadbl:pid 230404(mpg321) -> node 4 mem 71704 my load 98 node4 load 87
1130941155 loadbl:pid 297390(symphonie) -> node 3 mem 71560 my load 95 node3 load 55
1130941798 rexec :pid 108794(/usr/bin/uptime) <- node 1 mem 216864 my load 0 node1 load 52
1130941798 rexec :pid 108797(/usr/bin/awk) <- node 1 mem 216852 my load 0 node1 load 51
1130941798 rexec :pid 108799(/usr/bin/free) <- node 1 mem 216848 my load 0 node1 load 51
1130941798 rexec :pid 108804(/usr/bin/free) <- node 1 mem 216848 my load 0 node1 load 51
1130941799 rexec :pid 108840(/sbin/ifconfig) <- node 1 mem 216848 my load 0 node1 load 51
1130941799 rexec :pid 108849(/sbin/ifconfig) <- node 1 mem 216848 my load 0 node1 load 51
1130941799 rexec :pid 108852(/sbin/ifconfig) <- node 1 mem 216848 my load 0 node1 load 51
1130941799 rexec :pid 108855(/sbin/ifconfig) <- node 1 mem 216844 my load 0 node1 load 51
1130941801 rexec :pid 164887(/usr/bin/uptime) -> node 1 mem 215032 my load 26 node1 load 48
1130941801 rexec :pid 164890(/usr/bin/awk) -> node 1 mem 215116 my load 26 node1 load 48
1130941801 rexec :pid 164892(/usr/bin/free) -> node 1 mem 214792 my load 33 node1 load 45
1130941801 rexec :pid 164897(/usr/bin/free) -> node 1 mem 214732 my load 33 node1 load 45
1130941802 rexec :pid 165047(/sbin/ifconfig) -> node 1 mem 214412 my load 33 node1 load 45
1130941802 rexec :pid 165055(/sbin/ifconfig) -> node 1 mem 215024 my load 33 node1 load 45
1130941802 rexec :pid 165058(/sbin/ifconfig) -> node 1 mem 214416 my load 50 node1 load 41
1130941802 rexec :pid 165061(/sbin/ifconfig) -> node 1 mem 212024 my load 50 node1 load 41
1130941803 rexec :pid 165079(/usr/bin/uptime) -> node 3 mem 203440 my load 50 node3 load 100
1130941812 loadlb:pid 297390(symphonie) <- node 3 mem 71208 my load 8 node3 load 117
1130941812 rexec :pid 165082(/usr/bin/awk) -> node 3 mem 71220 my load 40 node3 load 117
1130941812 rexec :pid 165084(/usr/bin/free) -> node 3 mem 70948 my load 40 node3 load 117
1130941812 rexec :pid 165089(/usr/bin/free) -> node 3 mem 70912 my load 40 node3 load 117
1130941813 rexec :pid 165147(/sbin/ifconfig) -> node 3 mem 70928 my load 64 node3 load 122
1130941813 rexec :pid 165155(/sbin/ifconfig) -> node 3 mem 71280 my load 64 node3 load 122
1130941813 rexec :pid 165158(/sbin/ifconfig) -> node 3 mem 71276 my load 64 node3 load 122
1130941813 rexec :pid 165161(/sbin/ifconfig) -> node 3 mem 71276 my load 64 node3 load 122
1130941813 rexec :pid 165179(/usr/bin/uptime) -> node 4 mem 70988 my load 98 node4 load 66
1130941813 rexec :pid 165182(/usr/bin/awk) -> node 4 mem 71120 my load 98 node4 load 66
1130941813 rexec :pid 165184(/usr/bin/free) -> node 4 mem 70840 my load 98 node4 load 66
1130941813 rexec :pid 165189(/usr/bin/free) -> node 4 mem 70804 my load 98 node4 load 66
1130941814 rexec :pid 165265(/sbin/ifconfig) -> node 4 mem 70612 my load 101 node4 load 65
1130941814 rexec :pid 165273(/sbin/ifconfig) -> node 4 mem 71080 my load 101 node4 load 65
1130941814 rexec :pid 165276(/sbin/ifconfig) -> node 4 mem 71080 my load 101 node4 load 65
1130941815 rexec :pid 165279(/sbin/ifconfig) -> node 4 mem 71072 my load 101 node4 load 65
1130941815 rexec :pid 230890(/usr/bin/uptime) <- node 3 mem 70932 my load 42 node3 load 114
1130941815 rexec :pid 230893(/usr/bin/awk) <- node 3 mem 70932 my load 42 node3 load 114
1130941815 rexec :pid 230895(/usr/bin/free) <- node 3 mem 70932 my load 42 node3 load 114
1130941815 rexec :pid 230900(/usr/bin/free) <- node 3 mem 70928 my load 51 node3 load 108
1130941816 rexec :pid 298841(/usr/bin/uptime) <- node 4 mem 71040 my load 51 node4 load 62
1130941816 rexec :pid 298844(/usr/bin/awk) <- node 4 mem 70280 my load 51 node4 load 62
1130941816 rexec :pid 298846(/usr/bin/free) <- node 4 mem 70216 my load 51 node4 load 62
1130941816 rexec :pid 298851(/usr/bin/free) <- node 4 mem 70840 my load 51 node4 load 62
1130941816 rexec :pid 230948(/sbin/ifconfig) <- node 3 mem 70756 my load 51 node3 load 108
1130941816 rexec :pid 230956(/sbin/ifconfig) <- node 3 mem 70684 my load 51 node3 load 108
1130941816 rexec :pid 230959(/sbin/ifconfig) <- node 3 mem 72488 my load 51 node3 load 108
1130941816 rexec :pid 230962(/sbin/ifconfig) <- node 3 mem 72488 my load 51 node3 load 108
1130941816 rexec :pid 298891(/sbin/ifconfig) <- node 4 mem 72488 my load 60 node4 load 61
1130941816 rexec :pid 298899(/sbin/ifconfig) <- node 4 mem 72488 my load 60 node4 load 61
1130941816 rexec :pid 298902(/sbin/ifconfig) <- node 4 mem 72488 my load 60 node4 load 61
1130941816 rexec :pid 298905(/sbin/ifconfig) <- node 4 mem 72488 my load 60 node4 load 61
1130941849 rexec :pid 108894(/bin/cat) <- node 1 mem 72244 my load 47 node1 load 40
1130941849 rexec :pid 108896(/usr/bin/free) <- node 1 mem 72244 my load 47 node1 load 49
1130941913 rexec :pid 108932(/bin/cat) <- node 1 mem 72084 my load 47 node1 load 41
1130941913 rexec :pid 108934(/usr/bin/free) <- node 1 mem 72084 my load 47 node1 load 41
1130941932 loadbl:pid 297390(symphonie) -> node 3 mem 62416 my load 105 node3 load 51

NB: on the director node (which should be the best node) I have plenty of processes called icssvr_daemon, together taking ~20% of the CPU. Is that normal? What is the role of these daemons?

69381 root -97 0 0 0 0 S 5.6 0.0 0:09.2  icssvr_daemon
69387 root -97 0 0 0 0 S 5.3 0.0 0:10.61 icssvr_daemon
...

NB2: when I type "loads" I sometimes see values up to 600; then things get worse and I can't log in or monitor the system. I really can't understand the cause of these problems: I can't type any command involving processes or memory, such as "ps", "w", "top" or "free". Could it be due to:
- swap? memory?
- network?

Many thanks for your help

ML
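[As far as I know there is no built-in "nmig" counter, but a loadlog dump like the one above contains everything needed to tally moves per process. A rough sketch in Python; the regex assumes the two entry shapes visible in the dump ("loadbl:pid 297390(symphonie) -> node 1 ..." and "rexec :pid 108794(/usr/bin/uptime) <- node 1 ..."), and it is a parsing aid written for this thread, not an OpenSSI tool.]

```python
import re
from collections import Counter

# One loadlog entry: timestamp, tag, pid(name), direction, peer node.
# Covers both "loadbl:pid" (no space) and "rexec :pid" (with space).
ENTRY = re.compile(
    r'^(\d+)\s+(\w+)\s*:pid\s+(\d+)\(([^)]*)\)\s+(<-|->)\s+node\s+(\d+)')

def count_migrations(log_text):
    """Return a Counter mapping (pid, name) -> number of migrations."""
    moves = Counter()
    for line in log_text.splitlines():
        m = ENTRY.match(line.strip())
        if not m:
            continue
        tag, pid, name = m.group(2), int(m.group(3)), m.group(4)
        # loadbl/loadlb are load-leveler moves and "mig" is a manual
        # migrate; entries tagged rexec appear to be remote executions
        # rather than migrations, so skip them.
        if tag in ('loadbl', 'loadlb', 'mig'):
            moves[(pid, name)] += 1
    return moves
```

Run it against a saved copy of the log, e.g. `count_migrations(open('/proc/cluster/loadlog').read())`, and a frequently migrating process like symphonie should stand out immediately.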
From: John H. <john@Calva.COM> - 2005-11-02 16:44:45
Maurice Libes wrote:
> sorry, I omitted this information:
> * it's OpenSSI 1.2 on Debian Sarge
>
> * Dell Precision 260: dual-processor Xeon 2.4 GHz, 1 GB RAM
> * 4 Pentium III machines: dual-processor Pentium III Coppermine 1 GHz, 512 MB RAM

Should be fast enough for testing. 512 MB RAM should be plenty if you're not running X, and just about enough if you are.

> * a NetGear 8-port gigabit switch
> * gigabit NICs (3Com 3C2000) on every node

Good.

> Here is an example of my /proc/cluster/loadlog:
>
> cat /proc/cluster/loadlog
> 1130941021 execll:pid 230432(/bin/chmod) <- node 3 mem 105236 my load 77 node3 load 175
> 1130941022 loadlb:pid 297390(symphonie) <- node 3 mem 77152 my load 97 node3 load 168
> 1130941022 loadbl:pid 230405(oggenc) -> node 1 mem 77140 my load 159 node1 load 77
> [...]
> 1130941155 loadbl:pid 297390(symphonie) -> node 3 mem 71560 my load 95 node3 load 55

So we see some ogg encoding stuff being moved around.

Where is the "physical modelling code"?

> NB: on the director node (which should be the best node) I have plenty
> of processes called icssvr_daemon, together taking ~20% of the CPU.
> Is that normal? What is the role of these daemons?
>
> 69381 root -97 0 0 0 0 S 5.6 0.0 0:09.2  icssvr_daemon
> 69387 root -97 0 0 0 0 S 5.3 0.0 0:10.61 icssvr_daemon
> ...

Inter-node communications.

> NB2: when I type "loads" I sometimes see values up to 600; then things
> get worse and I can't log in or monitor the system.
>
> I really can't understand the cause of these problems: I can't type any
> command involving processes or memory, such as "ps", "w", "top" or "free".
>
> Could it be due to:
> - swap? memory?
> - network?

Do you have any swap space on the non-initnodes?

By the way - what makes you think the problem is due to migration?
From: libes <li...@co...> - 2005-11-02 21:34:59
John Hughes wrote:
> So we see some ogg encoding stuff being moved around.
>
> Where is the "physical modelling code"?

It is the program called symphonie, but at the time I typed cat /proc/cluster/loadlog there were no problems yet. Should I send you /proc/cluster/loadlog when the problems occur?

> Do you have any swap space on the non-initnodes?

No swap on the other nodes! Do I have to create a swap space on each node? I believed the one on the init node was, or could be, sufficient.

> By the way - what makes you think the problem is due to migration?

Why migration? That's a feeling and I can't prove it; that's why I'm asking for help.

- when the cluster does nothing (no computing processes), there's no problem (it's better to say it)
- when I launch a little benchmark with 10 mp3 encoder processes, there's no problem (load-leveling + migration are OK)
- when users launch 1 or 2 physical models, everything begins well; then I notice some big latencies, which I correlate with processes changing nodes; then suddenly the problems begin: nobody can log in on any node, and we can't type any command any more.

The problem is that I really can't monitor anything on the system, since I can't type any monitoring commands such as ps, free, top, etc. (there's no answer from the system). In these cases I can't tell what state the system is in. But during this time the computing processes continue to run; sometimes after hours I can take back control of the system (for example by rebooting one node).

So, any help or ideas to test are welcome.

ML
From: John H. <john@Calva.COM> - 2005-11-03 09:26:09
libes wrote:
> it is the program called symphonie, but at the time I typed
> cat /proc/cluster/loadlog there were no problems yet.
> Should I send you /proc/cluster/loadlog when the problems occur?

What would be interesting is the /proc/cluster/loadlog from the nodes running the "symphonie" application.

> No swap on the other nodes!
> Do I have to create a swap space on each node?
> I believed the one on the init node was, or could be, sufficient.

Nope, you need swap on all nodes. Your non-initnodes have 512M; if the "symphonie" thing is big then it will have horrid problems.

> Why migration? That's a feeling and I can't prove it; that's why I'm
> asking for help.
>
> - when the cluster does nothing (no computing processes), there's no problem
> - when I launch a little benchmark with 10 mp3 encoder processes,
>   there's no problem (load-leveling + migration are OK)
> - when users launch 1 or 2 physical models, everything begins well; then
>   I notice some big latencies, which I correlate with processes changing
>   nodes; then suddenly the problems begin: nobody can log in on any node,
>   and we can't type any command any more.
>
> The problem is that I really can't monitor anything on the system, since
> I can't type any monitoring commands such as ps, free, top, etc.
>
> But during this time the computing processes continue to run; sometimes
> after hours I can take back control of the system (for example by
> rebooting one node).
>
> So, any help or ideas to test are welcome.

The first thing I'd try is getting some swap on all nodes. If that helps, look at getting some more memory.

AFAIK OpenSSI doesn't (yet) migrate processes based on memory usage, just CPU, so it could try migrating a HUGE process to a node without enough memory if that node wasn't doing much work.
From: Maurice L. <Mau...@co...> - 2005-11-03 17:03:41
John Hughes wrote:
> What would be interesting is the /proc/cluster/loadlog from the nodes
> running the "symphonie" application.

Here is an example:
- I can't log in on node 1 any more
- I can't type some commands any more (top, ps, w...)
- I can only type cat /proc/cluster/loadlog (see below)

I have added some swap space on each node. The loads seem to be normal, but the system is blocked (meaning I can't type any commands and I can't log in).

root@comclust2:~# loads
1: 52 * 2: 26 3: 26 4: 76

root@comclust2:~# onnode 1 cat /proc/cluster/loadlog
1131033604 rexec :pid 103402(/sbin/ifconfig) -> node 2 mem 14212 my load 48 node2 load 43
1131033604 rexec :pid 301490(/usr/bin/uptime) <- node 4 mem 14496 my load 27 node4 load 26
1131033604 rexec :pid 103414(/sbin/ifconfig) -> node 2 mem 14252 my load 48 node2 load 44
1131033604 rexec :pid 103417(/sbin/ifconfig) -> node 2 mem 14720 my load 48 node2 load 44
1131033604 rexec :pid 301493(/usr/bin/awk) <- node 4 mem 14688 my load 27 node4 load 26
1131033604 rexec :pid 103420(/sbin/ifconfig) -> node 2 mem 14768 my load 48 node2 load 44
1131033604 rexec :pid 301495(/usr/bin/free) <- node 4 mem 14764 my load 27 node4 load 26
1131033605 rexec :pid 301500(/usr/bin/free) <- node 4 mem 14092 my load 27 node4 load 26
1131033605 rexec :pid 301586(/sbin/ifconfig) <- node 4 mem 16080 my load 26 node4 load 32
1131033606 rexec :pid 301594(/sbin/ifconfig) <- node 4 mem 16080 my load 26 node4 load 32
1131033606 rexec :pid 301597(/sbin/ifconfig) <- node 4 mem 16080 my load 26 node4 load 32
1131033606 rexec :pid 301600(/sbin/ifconfig) <- node 4 mem 16080 my load 26 node4 load 32
1131034501 rexec :pid 103853(/usr/bin/uptime) -> node 3 mem 30216 my load 53 node3 load 26
1131034501 rexec :pid 103856(/usr/bin/awk) -> node 3 mem 30304 my load 53 node3 load 26
1131034501 rexec :pid 103858(/usr/bin/free) -> node 3 mem 30144 my load 53 node3 load 26
1131034501 rexec :pid 103863(/usr/bin/free) -> node 3 mem 30212 my load 53 node3 load 26
1131034502 rexec :pid 198466(/usr/bin/uptime) <- node 3 mem 30284 my load 25 node3 load 26
1131034502 rexec :pid 198469(/usr/bin/awk) <- node 3 mem 30232 my load 25 node3 load 26
1131034502 rexec :pid 198471(/usr/bin/free) <- node 3 mem 30224 my load 25 node3 load 26
1131034502 rexec :pid 198476(/usr/bin/free) <- node 3 mem 30296 my load 25 node3 load 26
1131034502 rexec :pid 103921(/sbin/ifconfig) -> node 3 mem 30200 my load 53 node3 load 26
1131034502 rexec :pid 103929(/sbin/ifconfig) -> node 3 mem 30228 my load 53 node3 load 26
1131034502 rexec :pid 103932(/sbin/ifconfig) -> node 3 mem 30228 my load 53 node3 load 26
1131034502 rexec :pid 103935(/sbin/ifconfig) -> node 3 mem 30232 my load 53 node3 load 26
1131034502 rexec :pid 103953(/usr/bin/uptime) -> node 4 mem 29820 my load 53 node4 load 26
1131034502 rexec :pid 103956(/usr/bin/awk) -> node 4 mem 29788 my load 53 node4 load 26
1131034502 rexec :pid 103958(/usr/bin/free) -> node 4 mem 29760 my load 53 node4 load 26
1131034502 rexec :pid 103963(/usr/bin/free) -> node 4 mem 29660 my load 53 node4 load 26
1131034503 rexec :pid 198576(/sbin/ifconfig) <- node 3 mem 29812 my load 26 node3 load 40
1131034503 rexec :pid 133162(/usr/bin/uptime) <- node 2 mem 29616 my load 26 node2 load 26
1131034503 rexec :pid 198584(/sbin/ifconfig) <- node 3 mem 29916 my load 26 node3 load 40
1131034503 rexec :pid 133165(/usr/bin/awk) <- node 2 mem 29952 my load 26 node2 load 26
1131034503 rexec :pid 104003(/sbin/ifconfig) -> node 4 mem 29828 my load 53 node4 load 26
1131034503 rexec :pid 198587(/sbin/ifconfig) <- node 3 mem 29560 my load 26 node3 load 40
1131034503 rexec :pid 133167(/usr/bin/free) <- node 2 mem 29340 my load 26 node2 load 26
1131034503 rexec :pid 104011(/sbin/ifconfig) -> node 4 mem 29856 my load 53 node4 load 26
1131034503 rexec :pid 198590(/sbin/ifconfig) <- node 3 mem 29808 my load 26 node3 load 40
1131034503 rexec :pid 133172(/usr/bin/free) <- node 2 mem 29888 my load 26 node2 load 26
1131034503 rexec :pid 104014(/sbin/ifconfig) -> node 4 mem 29988 my load 53 node4 load 26
1131034503 rexec :pid 104017(/sbin/ifconfig) -> node 4 mem 30028 my load 53 node4 load 26
1131034503 rexec :pid 104035(/usr/bin/uptime) -> node 2 mem 29692 my load 53 node2 load 26
1131034503 rexec :pid 104038(/usr/bin/awk) -> node 2 mem 29788 my load 53 node2 load 26
1131034503 rexec :pid 104040(/usr/bin/free) -> node 2 mem 29796 my load 53 node2 load 26
1131034503 rexec :pid 104045(/usr/bin/free) -> node 2 mem 29788 my load 53 node2 load 26
1131034504 rexec :pid 104089(/sbin/ifconfig) -> node 2 mem 29704 my load 50 node2 load 37
1131034504 rexec :pid 104097(/sbin/ifconfig) -> node 2 mem 29824 my load 50 node2 load 37
1131034504 rexec :pid 104100(/sbin/ifconfig) -> node 2 mem 30092 my load 50 node2 load 37
1131034504 rexec :pid 301898(/usr/bin/uptime) <- node 4 mem 30136 my load 27 node4 load 26
1131034504 rexec :pid 104103(/sbin/ifconfig) -> node 2 mem 30184 my load 50 node2 load 37
1131034504 rexec :pid 301901(/usr/bin/awk) <- node 4 mem 29708 my load 27 node4 load 27
1131034504 rexec :pid 133274(/sbin/ifconfig) <- node 2 mem 31272 my load 27 node2 load 37
1131034504 rexec :pid 301903(/usr/bin/free) <- node 4 mem 31268 my load 27 node4 load 27
1131034504 rexec :pid 133282(/sbin/ifconfig) <- node 2 mem 31236 my load 27 node2 load 37
1131034504 rexec :pid 301908(/usr/bin/free) <- node 4 mem 31344 my load 27 node4 load 27
1131034504 rexec :pid 133285(/sbin/ifconfig) <- node 2 mem 31336 my load 27 node2 load 37
1131034504 rexec :pid 133288(/sbin/ifconfig) <- node 2 mem 31340 my load 27 node2 load 37
1131034505 rexec :pid 301998(/sbin/ifconfig) <- node 4 mem 31356 my load 27 node4 load 27
1131034505 rexec :pid 302006(/sbin/ifconfig) <- node 4 mem 31356 my load 27 node4 load 27
1131034505 rexec :pid 302009(/sbin/ifconfig) <- node 4 mem 31356 my load 27 node4 load 27
1131034505 rexec :pid 302012(/sbin/ifconfig) <- node 4 mem 31344 my load 27 node4 load 27
1131035035 loadbl:pid 233907(bio_mars.exe) -> node 2 mem 13368 my load 48 node2 load 51
1131035402 rexec :pid 198903(/usr/bin/uptime) <- node 3 mem 13012 my load 21 node3 load 26
1131035402 rexec :pid 133603(/usr/bin/uptime) <- node 2 mem 12912 my load 24 node2 load 35
1131035725 rexec :pid 133617(/bin/cat) <- node 2 mem 13140 my load 41 node2 load 31

> Nope, you need swap on all nodes. Your non-initnodes have 512M; if the
> "symphonie" thing is big then it will have horrid problems.
>
> The first thing I'd try is getting some swap on all nodes.

Done: I created some swap space on all non-init nodes, and... it's the same.

onall swapon -s
(node 1)
Filename                                 Type       Size     Used   Priority
/dev/scsi/host1/bus0/target0/lun0/part3  partition  4096564  18696  -1
(node 2)
Filename                                 Type       Size     Used   Priority
/dev/scsi/host0/bus0/target0/lun0/part3  partition  522104   0      -1
(node 3)
Filename                                 Type       Size     Used   Priority
/dev/scsi/host0/bus0/target0/lun0/part5  partition  1469908  0      -1
(node 4)
Filename                                 Type       Size     Used   Priority
/dev/scsi/host0/bus0/target0/lun0/part5  partition  1469908  0      -1
root@comclust2:~#

=> Right now there has been a freeze of the system for 5 minutes. The program bio_mars.exe (below) was on node 1; when everything got back to normal (about 5 minutes later), I saw bio_mars.exe on node 3. That's why I think the problem could come from migration. (This time everything was frozen for roughly 4-5 minutes; sometimes it lasts 15-20 minutes.)

PID    NODE USER  PR NI VIRT RES  SHR S %CPU TIME+     COMMAND
233907 3    faure 16 0  335m 335m 516 R 93.2 120:20.04 bio_mars.exe

Here are some logs:

$ onnode 3 cat /proc/cluster/loadlog | grep bio
1131036010 loadlb:pid 233907(bio_mars.exe) <- node 2 mem 83256 my load 5 node2 load 34
1131036302 loadbl:pid 233907(bio_mars.exe) -> node 2 mem 82080 my load 125 node2 load 51
1131036667 loadlb:pid 233907(bio_mars.exe) <- node 2 mem 84532 my load 0 node2 load 26

What can we see in these logs? Can they explain my problem? Is there any possibility that there was a flip-flop between nodes 2 and 3?

> If that helps look at getting some more memory.
>
> AFAIK OpenSSI doesn't (yet) migrate processes based on memory usage,
> just CPU, so it could try migrating a HUGE process to a node without
> enough memory if that node wasn't doing much work.

Yes, that could explain it... but it is strange that I am alone in this situation, with such huge latency times. I am upset that I can't type any command or monitor the system, and that I can't find the solution! Don't you think I could have missed something or made some mistakes in my Debian installation?

ML
From: John H. <john@Calva.COM> - 2005-11-04 10:47:26
Attempting to put it all together (and assuming your clocks are in sync - you are running NTP on all nodes?):

   0 node2 loadbl:pid 233907(bio_mars.exe) -> node 3 mem 92328 my load 104 node3 load 75
 194 node3 loadlb:pid 233907(bio_mars.exe) <- node 2 mem 84532 my load 0 node2 load 26
 729 node3 loadbl:pid 233907(bio_mars.exe) -> node 2 mem 82428 my load 125 node2 load 51
 918 node2 loadlb:pid 233907(bio_mars.exe) <- node 3 mem 90684 my load 0 node3 load 51
 940 node2 loadbl:pid 233907(bio_mars.exe) -> node 1 mem 92496 my load 96 node1 load 11
 948 node1 loadlb:pid 233907(bio_mars.exe) <- node 2 mem 13640 my load 0 node2 load 68
1027 node1 loadbl:pid 233907(bio_mars.exe) -> node 2 mem 13432 my load 54 node2 load 72

First it migrates from node 2 to node 3, taking 194 seconds.
Then 8 minutes later it migrates from node 3 back to node 2, taking 189 seconds.
Later on it migrates from node 2 to node 1, taking 8 seconds.
And finally it migrates from node 1 to node 2, and everything seems to go to hell.

Seems like there is some problem between nodes 2 and 3 - why is migration so slow?
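[The by-hand pairing above - a departure ("-> node N") logged on the source node matched with the next arrival ("<-") for the same pid logged on the destination - can be mechanized. A sketch, assuming the merged loadlog entries have already been parsed into tuples (timestamp, logging_node, tag, pid, direction, peer_node); parsing the raw text into that shape is left out.]

```python
def migration_durations(entries):
    """Return [(pid, depart_ts, seconds_taken, dest_node), ...]
    by pairing each '->' departure with the matching '<-' arrival."""
    pending = {}        # pid -> (departure timestamp, destination node)
    durations = []
    for ts, node, _tag, pid, direction, peer in sorted(entries):
        if direction == '->':
            pending[pid] = (ts, peer)
        elif direction == '<-' and pid in pending:
            depart_ts, dest = pending.pop(pid)
            if node == dest:    # arrival logged on the expected node
                durations.append((pid, depart_ts, ts - depart_ts, dest))
    return durations
```

Fed the seven entries above, this reproduces the 194, 189 and 8 second figures.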
From: Maurice L. <Mau...@co...> - 2005-11-07 09:10:42
John Hughes wrote:
> Attempting to put it all together (and assuming your clocks are in sync
> - you are running NTP on all nodes?):
>
>    0 node2 loadbl:pid 233907(bio_mars.exe) -> node 3 mem 92328 my load 104 node3 load 75
>  194 node3 loadlb:pid 233907(bio_mars.exe) <- node 2 mem 84532 my load 0 node2 load 26
>  729 node3 loadbl:pid 233907(bio_mars.exe) -> node 2 mem 82428 my load 125 node2 load 51
>  918 node2 loadlb:pid 233907(bio_mars.exe) <- node 3 mem 90684 my load 0 node3 load 51
>  940 node2 loadbl:pid 233907(bio_mars.exe) -> node 1 mem 92496 my load 96 node1 load 11
>  948 node1 loadlb:pid 233907(bio_mars.exe) <- node 2 mem 13640 my load 0 node2 load 68
> 1027 node1 loadbl:pid 233907(bio_mars.exe) -> node 2 mem 13432 my load 54 node2 load 72
>
> First it migrates from node 2 to node 3, taking 194 seconds.
> Then 8 minutes later it migrates from node 3 back to node 2, taking 189 seconds.
> Later on it migrates from node 2 to node 1, taking 8 seconds.
> And finally it migrates from node 1 to node 2, and everything seems to go to hell.
>
> Seems like there is some problem between nodes 2 and 3 - why is migration
> so slow?

Thanks for your help and analysis. I really don't know why process migration takes such a long time between some of my nodes (from node 1 towards nodes 2, 3, 4).
(node 1 is a recent machine: Dell Precision, 2.8 GHz, 1 GB RAM)
(nodes 2, 3, 4 are old Dell PIIIs: 1 GHz, 512 MB RAM)

I can now reproduce the problem by hand. I launch a big process (bio_mars.exe) on node 1. If I migrate it to node 2, 3 or 4 (migrate 2 <pid>), it immediately freezes the cluster (I can't type any command from the procps package any more: ps, top, w, etc.). The loadlog shows the process going to node 2, 3 or 4, but on nodes 2, 3 and 4 there is no trace of the process arriving. If I stop and reboot node 2, 3 or 4, the process comes back to node 1 and continues, and I see in /proc/cluster/loadlog "failed to node 4 error -22" (surely because I rebooted the node):

1131286970 mig :pid 69224(bio_mars2.exe) -> node 4 mem 28536 my load 41 node4 load 26
1131287425 migrate pid 69224(bio_mars2.exe) failed to node 4 error -22

Note that there's no problem with little benchmarks: a big awk loop, or plenty of mp3-to-ogg processes.

So, what can the problem be?
- hardware incompatibilities? (SCSI?)
- a wrong installation of Debian Sarge?
- a bug in procps on Debian?
Why do some processes fail to migrate, freezing all the nodes?

Thanks for your help and advice

ML
From: Mulyadi S. <mul...@gm...> - 2005-11-08 04:26:05
|
Dear Maurice

> thanks for your help and analysis..
> i really don't know why there is such a long time for the process
> migration between some of my nodes (from node 1 towards nodes 2 3 4)

Previously, you said you were running a big application; how "big" is it? Can you tell us the size of the application, and how big the virtual size is (+ dynamic libraries + heap)? You can use "pmap" to see it.

> (node 1 is a recent machine Dell precision 2.8Ghz 1Gb RAM)
> (nodes 2 3 4 are old Dell PIII 1Ghz 512Mb RAM)

On node 1 itself, when you run the process alone (well, along with the necessary daemons and kernel threads, of course), how much RAM does it use? If the application swaps, how much swap space does it use?

> note that there's no problem with little benchmark a big loop with
> awk, or plenty of mp32ogg processes

OK, here comes my prediction. Page migration is still on the way, but since you said it is "big", the pages are still "in flight". Note that when pages arrive, they still need to be allocated first (possibly in blocking style, as alloc_pages() usually does).

Maurice, maybe you can compare it to your experience with oM (openMosix)? Then you will get a clearer picture of the difference between the two (openMosix and openSSI). Since openSSI implements full process image migration, be prepared to watch a longer interval during process migration. The term "longer" here is relative; it could be a bit, or waaayyyy, longer.

My suggestion for the openSSI developers is to implement differential page migration based on remote demand paging: a page is migrated on demand, and only those pages which were recently dirtied. I got lost when tracing the internal code of openSSI handling this, so any hints are welcome here.

regards

Mulyadi
|
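[Editor's side note: as a scriptable complement to the `pmap` check Mulyadi suggests, the same numbers can be read from /proc/<pid>/status — VmSize is the total virtual size including libraries and heap, VmRSS the resident portion. This is standard Linux procfs, nothing OpenSSI-specific; a minimal sketch:]

```python
import os

def proc_mem_kb(pid):
    """Return {'VmSize': kB, 'VmRSS': kB} for a pid, read from
    /proc/<pid>/status (values are reported by the kernel in kB)."""
    fields = {}
    with open(f"/proc/{pid}/status") as f:
        for line in f:
            key, _, rest = line.partition(":")
            if key in ("VmSize", "VmRSS"):
                fields[key] = int(rest.split()[0])
    return fields

# Inspect the current process as a demonstration; substitute the pid
# of the migrating job (e.g. bio_mars.exe) in practice.
mem = proc_mem_kb(os.getpid())
print(f"virtual: {mem['VmSize']} kB, resident: {mem['VmRSS']} kB")
```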
From: Maurice L. <Mau...@co...> - 2005-11-08 10:36:55
Attachments:
smime.p7s
|
Mulyadi Santosa wrote:
> Dear Maurice
>
>> thanks for your help and analysis..
>> i really don't know why there is such a long time for the process
>> migration between some of my nodes (from node 1 towards nodes 2 3 4)
>
> Previously, you said you were running a big application; how "big" is it?
> Can you tell us the size of the application, and how big the virtual
> size is (+ dynamic libraries + heap)? You can use "pmap" to see it.

Here are two of these computing processes (symphonie, bio_mars.exe) and their occupied RAM: ~350 MB.

Tasks: 130 total, 3 running, 127 sleeping, 0 stopped, 0 zombie

   PID NODE USER   PR NI VIRT RES  SHR S %CPU    TIME+ COMMAND
198028    3 gatti  25  0 328m 321m 13m R 99.4 1302:32 symphonie
264695    1 faure  25  0 353m 251m 23m R 98.7 1512:52 bio_mars.exe

>> (node 1 is a recent machine Dell precision 2.8Ghz 1Gb RAM)
>> (nodes 2 3 4 are old Dell PIII 1Ghz 512Mb RAM)
>
> On node 1 itself, when you run the process alone (well, along with the
> necessary daemons and kernel threads of course), how much RAM does it
> use? If the application swaps, how much swap space does it use?
Here is the "free" command output on node 1:

root@comclust5:~# free
             total       used       free     shared    buffers     cached
Mem:       1025816    1012828      12988          0       5660     400496
-/+ buffers/cache:     606672     419144
Swap:      4096564     184556    3912008

I created a swap space on each node, but swap is not much used on any node:

root@comclust5:~# onnode 2 free
             total       used       free     shared    buffers     cached
Mem:        509788     296508     213280          0        160      47588
-/+ buffers/cache:     248760     261028
Swap:       522104       1168     520936

root@comclust5:~# onnode 3 free
             total       used       free     shared    buffers     cached
Mem:        509788     504120       5668          0        160     108696
-/+ buffers/cache:     395264     114524
Swap:      1469908      19204    1450704

root@comclust5:~# onnode 4 free
             total       used       free     shared    buffers     cached
Mem:       1026356     434208     592148          0        160     231624
-/+ buffers/cache:     202424     823932
Swap:      1469908          0    1469908

>> note that there's no problem with little benchmark a big loop with
>> awk, or plenty of mp32ogg processes
>
> OK, here comes my prediction. Page migration is still on the way, but
> since you said it is "big", the pages are still "in flight". Note that
> when pages arrive, they still need to be allocated first (possibly in
> blocking style, as alloc_pages() usually does).
>
> Maurice, maybe you can compare it to your experience with oM? Then you
> will get a clearer picture of the difference between the two (oM and
> openSSI). Since openSSI implements full process image migration, be
> prepared to watch a longer interval during process migration. The term
> "longer" here is relative; it could be a bit, or waaayyyy, longer.

Yes, I noticed that in normal conditions the process migration time was longer than in oM. But in my case one cannot say it is "long" or "longer": it simply freezes all subsequent commands (no more top, ps or w) for 10, 20, 30 minutes... it is not "in flight" ;-)

I can now reproduce my problem, but I still don't know how to solve it:
i) when my big processes migrate from nodes 2, 3, 4 to node 1 (the init node), there is no problem;
ii) when processes migrate among nodes 2, 3, 4, there is no problem; but
iii) when one of these processes migrates from node 1 towards nodes 2, 3, 4, the problem occurs: the process seems to leave node 1 (I see a log entry in /proc/cluster/loadlog) but never reaches the destination node (no entry in the destination node's /proc/cluster/loadlog). I must reboot the destination node in order to get back to nominal conditions (the process comes back to, or stays on, node 1).

Maybe a network problem? But why? My 5 NICs are new (3Com 3C2000 gigabit, with the sk98lin driver on Debian), as are my Netgear 8-port gigabit switches.

Any ideas?

ML

> My suggestion for the openSSI developers is to implement differential
> page migration based on remote demand paging: a page is migrated on
> demand, and only those pages which were recently dirtied. I got lost
> when tracing the internal code of openSSI handling this, so any hints
> are welcome here.
>
> regards
>
> Mulyadi

-- 
Maurice Libes                          Tel : +33 (04) 91 82 93 25
Centre d'Oceanologie de Marseille      Fax : +33 (04) 91 82 65 48
UMS2196 CNRS - Campus de Luminy, Case 901
mailto:mau...@co...                    F-13288 Marseille cedex 9
Annuaire : http://annuaire.univ-aix.fr/showuser.php?uid=libes
|
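[Editor's side note: the `free` output above makes the memory pressure concrete. Counting buffers/cache as reclaimable, nodes 2 and 3 have roughly 255 MB and 112 MB available, while the processes shown in `top` are ~350 MB resident, so a destination node may have to swap or block in page allocation while receiving the image. A small sketch of that arithmetic, using only the numbers posted in the thread (all in kB):]

```python
# Available memory on each candidate destination node, from the
# '-/+ buffers/cache' free column posted above (kB). Static sample
# data from the thread, not live measurements.
available_kb = {"node2": 261028, "node3": 114524, "node4": 823932}

# ~350 MB resident, per the 'top' output for symphonie/bio_mars.exe.
process_rss_kb = 350 * 1024

for node in sorted(available_kb):
    avail = available_kb[node]
    verdict = "fits in RAM" if avail >= process_rss_kb else "would need swap"
    print(f"{node}: {avail} kB available vs {process_rss_kb} kB needed "
          f"-> {verdict}")
```

On these numbers only node 4 has comfortable headroom, which fits the observation that migrations toward the smaller nodes stall; it does not, however, explain the node 4 failures on its own.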
From: Mulyadi S. <mul...@gm...> - 2005-11-08 17:13:54
|
Dear Maurice

A 350 MB virtual size... on a gig-E card, transferring it should be fast, especially in NAPI and/or polling mode. Even coupled with reading swapped-out pages, it is still fast, given that the swap area isn't heavily utilized.

> may be a network problem? but why?
> since my 5 NICs are new (3Com 3C2000 gigabit with the sk98lin
> driver on Debian) as are my Netgear 8-port gigabit switches

So this is an init-to-non-init migration problem. Maybe you need to consult Laura Ramirez or John Byrne, asking whether there is a known problem or bug in migration when VM pressure/allocation is taken into account.

Personally, I suggest you enable kdb in your openSSI kernel, switch into the kdb console on the frozen node, and do a backtrace (type "bt" at the prompt). Then we will know where it freezes, and possibly why.

regards

Mulyadi
|
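[Editor's side note: before reaching for kdb, a raw TCP transfer between two nodes is a quick way to rule the network in or out — 350 MB over a healthy gigabit link should take a handful of seconds, not minutes. The sketch below demonstrates the measurement over loopback with a hypothetical port number; in practice, run the receiver half on the destination node and point the sender at its address.]

```python
import socket
import threading
import time

PAYLOAD_MB = 16        # small for the demo; use ~350 to mimic the process image
HOST, PORT = "127.0.0.1", 15222   # hypothetical free port, loopback for the demo

def receiver(server, counts):
    # Accept one connection and count every byte until the sender closes.
    conn, _ = server.accept()
    with conn:
        total = 0
        while chunk := conn.recv(65536):
            total += len(chunk)
        counts.append(total)

server = socket.socket()
server.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
server.bind((HOST, PORT))
server.listen(1)

counts = []
t = threading.Thread(target=receiver, args=(server, counts))
t.start()

start = time.monotonic()
with socket.create_connection((HOST, PORT)) as s:
    s.sendall(b"\0" * (PAYLOAD_MB * 1024 * 1024))
t.join()               # wait until the receiver has drained everything
elapsed = time.monotonic() - start
server.close()

print(f"received {counts[0]} bytes in {elapsed:.2f}s "
      f"({counts[0] / elapsed / 1e6:.0f} MB/s)")
```

If the same transfer between node 1 and node 2/3/4 crawls or stalls, the problem is below OpenSSI (driver, duplex mismatch, switch); if it is fast, the stall is in the migration path itself.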