Thread: [SSI-users] frozen cluster when process migrating?
From: Maurice L. <Mau...@co...> - 2005-10-31 09:31:11
hi to all,

I have a small cluster in an early production phase (5 machines: 1 dual-processor Dell Xeon with 1 GB RAM, and 4 dual-processor Dell PIIIs with 512 MB RAM). Sure, the nodes are not very powerful, but I have a big performance problem. Here are the symptoms.

I tested migration with some small oggenc processes converting .mp3 files to .ogg files. With such a benchmark I don't have any problem: the small mp3-to-ogg processes are migrated and efficiently load-leveled.

BUT when running some big physical modelling code from my researchers (a single process of 350 MB in RAM, computing for many days), sometimes when this one process is migrated between nodes, the cluster as a whole is completely frozen. I mean that for a very long time we can't type commands such as "ps", "top" or "w" any more; sometimes for 10-15 minutes we can't do anything on the cluster. It seems that this migrating process spends its time migrating between nodes; then after 10, 15 or 20 minutes things get better and the process comes back to its initial home node, until the next migration phase, when things get worse again.

So, what's the problem? Is it, as I think, a migration problem? Is it a lack of RAM? Is it the size of the process in RAM? Is it uncontrolled migration conditions? I have the feeling that the process doesn't manage to stay on a better node, and migrates from node to node for minutes at a time.

Is there a solution to avoid a frozen cluster? Sure, I can turn off load-leveling on a node (loadlevel -n 1 off) and there will be no problem any more (but then it isn't a cluster any more). Has anybody noticed similar freeze problems? Any help or information is welcome.

ML

--
Maurice Libes                              Tel : +33 (04) 91 82 93 25
Centre d'Oceanologie de Marseille          Fax : +33 (04) 91 82 65 48
UMS2196 CNRS - Campus de Luminy, Case 901  mailto:mau...@co...
F-13288 Marseille cedex 9
Annuaire : http://annuaire.univ-aix.fr/showuser.php?uid=libes
From: John H. <john@Calva.COM> - 2005-10-31 09:42:24
Maurice Libes wrote:
> hi to all,
>
> I have a small cluster in an early production phase

What version of OpenSSI?

> (5 machines: 1 dual-processor Dell Xeon with 1 GB RAM, and 4
> dual-processor Dell PIIIs with 512 MB RAM). Sure, the nodes are not
> very powerful, but I have a big performance problem.

What are your cpu speeds?

> [...]
>
> BUT when running some big physical modelling code from my researchers
> (a single process of 350 MB in RAM, computing for many days), sometimes
> when this one process is migrated between nodes, the cluster as a whole
> is completely frozen. I mean that for a very long time we can't type
> commands such as "ps", "top" or "w" any more; sometimes for 10-15
> minutes we can't do anything on the cluster.

And what is your cluster interconnect?

[...]

> Any help or information is welcome

What do you have in /proc/cluster/loadlog?
From: Maurice L. <Mau...@co...> - 2005-11-02 15:39:10
John Hughes wrote:
> What version of OpenSSI?

Sorry, I omitted this information:
* it's OpenSSI 1.2 on Debian Sarge

> What are your cpu speeds?

* Dell Precision 260: dual-processor Xeon 2.4 GHz, 1 GB RAM
* 4 Pentium III machines: dual-processor Pentium III Coppermine 1 GHz, 512 MB RAM

Yes, I need to buy some better machines :-) But even if the cluster shows poor performance, I would hope that a process could find the best node and *stay* computing on it while the OS conditions are stable (even if it's a weak machine). What "seems" to be happening is that this process was changing nodes too many times (I saw it moving from node #2 to #3 to #n over the course of a few minutes).

Is there a way to get an "nmig" counter, in order to see whether the process spends its time migrating?

[...]

> And what is your cluster interconnect?

* a NetGear 8-port gigabit switch
* gigabit NICs (3Com 3C2000) on every node

> What do you have in /proc/cluster/loadlog?

Here is an example of my /proc/cluster/loadlog:

cat /proc/cluster/loadlog
1130941021 execll:pid 230432(/bin/chmod) <- node 3 mem 105236 my load 77 node3 load 175
1130941022 loadlb:pid 297390(symphonie) <- node 3 mem 77152 my load 97 node3 load 168
1130941022 loadbl:pid 230405(oggenc) -> node 1 mem 77140 my load 159 node1 load 77
1130941023 loadbl:pid 230412(oggenc) -> node 1 mem 78104 my load 154 node1 load 72
1130941084 loadbl:pid 297390(symphonie) -> node 1 mem 74868 my load 104 node1 load 54
1130941113 loadlb:pid 230405(oggenc) <- node 1 mem 215660 my load 16 node1 load 74
1130941121 loadlb:pid 297390(symphonie) <- node 1 mem 70756 my load 68 node1 load 56
1130941121 loadbl:pid 230405(oggenc) -> node 1 mem 70676 my load 102 node1 load 58
1130941123 loadbl:pid 230404(mpg321) -> node 4 mem 71704 my load 98 node4 load 87
1130941155 loadbl:pid 297390(symphonie) -> node 3 mem 71560 my load 95 node3 load 55
1130941798 rexec :pid 108794(/usr/bin/uptime) <- node 1 mem 216864 my load 0 node1 load 52
1130941798 rexec :pid 108797(/usr/bin/awk) <- node 1 mem 216852 my load 0 node1 load 51
1130941798 rexec :pid 108799(/usr/bin/free) <- node 1 mem 216848 my load 0 node1 load 51
1130941798 rexec :pid 108804(/usr/bin/free) <- node 1 mem 216848 my load 0 node1 load 51
1130941799 rexec :pid 108840(/sbin/ifconfig) <- node 1 mem 216848 my load 0 node1 load 51
1130941799 rexec :pid 108849(/sbin/ifconfig) <- node 1 mem 216848 my load 0 node1 load 51
1130941799 rexec :pid 108852(/sbin/ifconfig) <- node 1 mem 216848 my load 0 node1 load 51
1130941799 rexec :pid 108855(/sbin/ifconfig) <- node 1 mem 216844 my load 0 node1 load 51
1130941801 rexec :pid 164887(/usr/bin/uptime) -> node 1 mem 215032 my load 26 node1 load 48
1130941801 rexec :pid 164890(/usr/bin/awk) -> node 1 mem 215116 my load 26 node1 load 48
1130941801 rexec :pid 164892(/usr/bin/free) -> node 1 mem 214792 my load 33 node1 load 45
1130941801 rexec :pid 164897(/usr/bin/free) -> node 1 mem 214732 my load 33 node1 load 45
1130941802 rexec :pid 165047(/sbin/ifconfig) -> node 1 mem 214412 my load 33 node1 load 45
1130941802 rexec :pid 165055(/sbin/ifconfig) -> node 1 mem 215024 my load 33 node1 load 45
1130941802 rexec :pid 165058(/sbin/ifconfig) -> node 1 mem 214416 my load 50 node1 load 41
1130941802 rexec :pid 165061(/sbin/ifconfig) -> node 1 mem 212024 my load 50 node1 load 41
1130941803 rexec :pid 165079(/usr/bin/uptime) -> node 3 mem 203440 my load 50 node3 load 100
1130941812 loadlb:pid 297390(symphonie) <- node 3 mem 71208 my load 8 node3 load 117
1130941812 rexec :pid 165082(/usr/bin/awk) -> node 3 mem 71220 my load 40 node3 load 117
1130941812 rexec :pid 165084(/usr/bin/free) -> node 3 mem 70948 my load 40 node3 load 117
1130941812 rexec :pid 165089(/usr/bin/free) -> node 3 mem 70912 my load 40 node3 load 117
1130941813 rexec :pid 165147(/sbin/ifconfig) -> node 3 mem 70928 my load 64 node3 load 122
1130941813 rexec :pid 165155(/sbin/ifconfig) -> node 3 mem 71280 my load 64 node3 load 122
1130941813 rexec :pid 165158(/sbin/ifconfig) -> node 3 mem 71276 my load 64 node3 load 122
1130941813 rexec :pid 165161(/sbin/ifconfig) -> node 3 mem 71276 my load 64 node3 load 122
1130941813 rexec :pid 165179(/usr/bin/uptime) -> node 4 mem 70988 my load 98 node4 load 66
1130941813 rexec :pid 165182(/usr/bin/awk) -> node 4 mem 71120 my load 98 node4 load 66
1130941813 rexec :pid 165184(/usr/bin/free) -> node 4 mem 70840 my load 98 node4 load 66
1130941813 rexec :pid 165189(/usr/bin/free) -> node 4 mem 70804 my load 98 node4 load 66
1130941814 rexec :pid 165265(/sbin/ifconfig) -> node 4 mem 70612 my load 101 node4 load 65
1130941814 rexec :pid 165273(/sbin/ifconfig) -> node 4 mem 71080 my load 101 node4 load 65
1130941814 rexec :pid 165276(/sbin/ifconfig) -> node 4 mem 71080 my load 101 node4 load 65
1130941815 rexec :pid 165279(/sbin/ifconfig) -> node 4 mem 71072 my load 101 node4 load 65
1130941815 rexec :pid 230890(/usr/bin/uptime) <- node 3 mem 70932 my load 42 node3 load 114
1130941815 rexec :pid 230893(/usr/bin/awk) <- node 3 mem 70932 my load 42 node3 load 114
1130941815 rexec :pid 230895(/usr/bin/free) <- node 3 mem 70932 my load 42 node3 load 114
1130941815 rexec :pid 230900(/usr/bin/free) <- node 3 mem 70928 my load 51 node3 load 108
1130941816 rexec :pid 298841(/usr/bin/uptime) <- node 4 mem 71040 my load 51 node4 load 62
1130941816 rexec :pid 298844(/usr/bin/awk) <- node 4 mem 70280 my load 51 node4 load 62
1130941816 rexec :pid 298846(/usr/bin/free) <- node 4 mem 70216 my load 51 node4 load 62
1130941816 rexec :pid 298851(/usr/bin/free) <- node 4 mem 70840 my load 51 node4 load 62
1130941816 rexec :pid 230948(/sbin/ifconfig) <- node 3 mem 70756 my load 51 node3 load 108
1130941816 rexec :pid 230956(/sbin/ifconfig) <- node 3 mem 70684 my load 51 node3 load 108
1130941816 rexec :pid 230959(/sbin/ifconfig) <- node 3 mem 72488 my load 51 node3 load 108
1130941816 rexec :pid 230962(/sbin/ifconfig) <- node 3 mem 72488 my load 51 node3 load 108
1130941816 rexec :pid 298891(/sbin/ifconfig) <- node 4 mem 72488 my load 60 node4 load 61
1130941816 rexec :pid 298899(/sbin/ifconfig) <- node 4 mem 72488 my load 60 node4 load 61
1130941816 rexec :pid 298902(/sbin/ifconfig) <- node 4 mem 72488 my load 60 node4 load 61
1130941816 rexec :pid 298905(/sbin/ifconfig) <- node 4 mem 72488 my load 60 node4 load 61
1130941849 rexec :pid 108894(/bin/cat) <- node 1 mem 72244 my load 47 node1 load 40
1130941849 rexec :pid 108896(/usr/bin/free) <- node 1 mem 72244 my load 47 node1 load 49
1130941913 rexec :pid 108932(/bin/cat) <- node 1 mem 72084 my load 47 node1 load 41
1130941913 rexec :pid 108934(/usr/bin/free) <- node 1 mem 72084 my load 47 node1 load 41
1130941932 loadbl:pid 297390(symphonie) -> node 3 mem 62416 my load 105 node3 load 51

NB: on the director node (which should be the best node) I have plenty of processes called icssvr_daemon, together taking ~20% of the CPU. Is that normal? What is the role of these daemons?

69381 root -97 0 0 0 0 S 5.6 0.0 0:09.2  icssvr_daemon
69387 root -97 0 0 0 0 S 5.3 0.0 0:10.61 icssvr_daemon
...

NB2: when I type "loads" I sometimes see values up to 600; then things get worse and I can't log in or monitor the system. I really can't understand the cause of these problems: I can't type any command involving processes or memory, such as "ps", "w", "top" or "free". Could it be due to:
- swap? memory?
- network?

Many thanks for your help

ML
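[As far as I know there is no built-in "nmig" counter, but a loadlog dump like the one above contains everything needed to tally moves per process. A rough sketch in Python; the regex assumes the two entry shapes visible in the dump ("loadbl:pid 297390(symphonie) -> node 1 ..." and "rexec :pid 108794(/usr/bin/uptime) <- node 1 ..."), and it is a parsing aid written for this thread, not an OpenSSI tool.]

```python
import re
from collections import Counter

# One loadlog entry: timestamp, tag, pid(name), direction, peer node.
# Covers both "loadbl:pid" (no space) and "rexec :pid" (with space).
ENTRY = re.compile(
    r'^(\d+)\s+(\w+)\s*:pid\s+(\d+)\(([^)]*)\)\s+(<-|->)\s+node\s+(\d+)')

def count_migrations(log_text):
    """Return a Counter mapping (pid, name) -> number of migrations."""
    moves = Counter()
    for line in log_text.splitlines():
        m = ENTRY.match(line.strip())
        if not m:
            continue
        tag, pid, name = m.group(2), int(m.group(3)), m.group(4)
        # loadbl/loadlb are load-leveler moves and "mig" is a manual
        # migrate; entries tagged rexec appear to be remote executions
        # rather than migrations, so skip them.
        if tag in ('loadbl', 'loadlb', 'mig'):
            moves[(pid, name)] += 1
    return moves
```

Run it against a saved copy of the log, e.g. `count_migrations(open('/proc/cluster/loadlog').read())`, and a frequently migrating process like symphonie should stand out immediately.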
From: John H. <john@Calva.COM> - 2005-11-02 16:44:45
Maurice Libes wrote:
> sorry, I omitted this information:
> * it's OpenSSI 1.2 on Debian Sarge
>
> * Dell Precision 260: dual-processor Xeon 2.4 GHz, 1 GB RAM
> * 4 Pentium III machines: dual-processor Pentium III Coppermine 1 GHz, 512 MB RAM

Should be fast enough for testing. 512 MB RAM should be plenty if you're not running X, and just about enough if you are.

> * a NetGear 8-port gigabit switch
> * gigabit NICs (3Com 3C2000) on every node

Good.

> Here is an example of my /proc/cluster/loadlog:
>
> cat /proc/cluster/loadlog
> 1130941021 execll:pid 230432(/bin/chmod) <- node 3 mem 105236 my load 77 node3 load 175
> 1130941022 loadlb:pid 297390(symphonie) <- node 3 mem 77152 my load 97 node3 load 168
> 1130941022 loadbl:pid 230405(oggenc) -> node 1 mem 77140 my load 159 node1 load 77
> [...]
> 1130941155 loadbl:pid 297390(symphonie) -> node 3 mem 71560 my load 95 node3 load 55

So we see some ogg encoding stuff being moved around.

Where is the "physical modelling code"?

> NB: on the director node (which should be the best node) I have plenty
> of processes called icssvr_daemon, together taking ~20% of the CPU.
> Is that normal? What is the role of these daemons?
>
> 69381 root -97 0 0 0 0 S 5.6 0.0 0:09.2  icssvr_daemon
> 69387 root -97 0 0 0 0 S 5.3 0.0 0:10.61 icssvr_daemon
> ...

Inter-node communications.

> NB2: when I type "loads" I sometimes see values up to 600; then things
> get worse and I can't log in or monitor the system.
>
> I really can't understand the cause of these problems: I can't type any
> command involving processes or memory, such as "ps", "w", "top" or "free".
>
> Could it be due to:
> - swap? memory?
> - network?

Do you have any swap space on the non-initnodes?

By the way - what makes you think the problem is due to migration?
From: libes <li...@co...> - 2005-11-02 21:34:59
John Hughes wrote:
> So we see some ogg encoding stuff being moved around.
>
> Where is the "physical modelling code"?

It is the program called symphonie, but at the time I typed cat /proc/cluster/loadlog there were no problems yet. Should I send you /proc/cluster/loadlog when the problems occur?

> Do you have any swap space on the non-initnodes?

No swap on the other nodes! Do I have to create a swap space on each node? I believed the one on the init node was, or could be, sufficient.

> By the way - what makes you think the problem is due to migration?

Why migration? That's a feeling and I can't prove it; that's why I'm asking for help.

- when the cluster does nothing (no computing processes), there's no problem (it's better to say it)
- when I launch a little benchmark with 10 mp3 encoder processes, there's no problem (load-leveling + migration are OK)
- when users launch 1 or 2 physical models, everything begins well; then I notice some big latencies, which I correlate with processes changing nodes; then suddenly the problems begin: nobody can log in on any node, and we can't type any command any more.

The problem is that I really can't monitor anything on the system, since I can't type any monitoring commands such as ps, free, top, etc. (there's no answer from the system). In these cases I can't tell what state the system is in. But during this time the computing processes continue to run; sometimes after hours I can take back control of the system (for example by rebooting one node).

So, any help or ideas to test are welcome.

ML
From: John H. <john@Calva.COM> - 2005-11-03 09:26:09
libes wrote:
> it is the program called symphonie, but at the time I typed
> cat /proc/cluster/loadlog there were no problems yet.
> Should I send you /proc/cluster/loadlog when the problems occur?

What would be interesting is the /proc/cluster/loadlog from the nodes running the "symphonie" application.

> No swap on the other nodes!
> Do I have to create a swap space on each node?
> I believed the one on the init node was, or could be, sufficient.

Nope, you need swap on all nodes. Your non-initnodes have 512M; if the "symphonie" thing is big then it will have horrid problems.

> Why migration? That's a feeling and I can't prove it; that's why I'm
> asking for help.
>
> - when the cluster does nothing (no computing processes), there's no problem
> - when I launch a little benchmark with 10 mp3 encoder processes,
>   there's no problem (load-leveling + migration are OK)
> - when users launch 1 or 2 physical models, everything begins well; then
>   I notice some big latencies, which I correlate with processes changing
>   nodes; then suddenly the problems begin: nobody can log in on any node,
>   and we can't type any command any more.
>
> The problem is that I really can't monitor anything on the system, since
> I can't type any monitoring commands such as ps, free, top, etc.
>
> But during this time the computing processes continue to run; sometimes
> after hours I can take back control of the system (for example by
> rebooting one node).
>
> So, any help or ideas to test are welcome.

The first thing I'd try is getting some swap on all nodes. If that helps, look at getting some more memory.

AFAIK OpenSSI doesn't (yet) migrate processes based on memory usage, just CPU, so it could try migrating a HUGE process to a node without enough memory if that node wasn't doing much work.
From: Maurice L. <Mau...@co...> - 2005-11-03 17:03:41
John Hughes wrote:
> What would be interesting is the /proc/cluster/loadlog from the nodes
> running the "symphonie" application.

Here is an example:
- I can't log in on node 1 any more
- I can't type some commands any more (top, ps, w...)
- I can only type cat /proc/cluster/loadlog (see below)

I have added some swap space on each node. The loads seem to be normal, but the system is blocked (meaning I can't type any commands and I can't log in).

root@comclust2:~# loads
1: 52 * 2: 26 3: 26 4: 76

root@comclust2:~# onnode 1 cat /proc/cluster/loadlog
1131033604 rexec :pid 103402(/sbin/ifconfig) -> node 2 mem 14212 my load 48 node2 load 43
1131033604 rexec :pid 301490(/usr/bin/uptime) <- node 4 mem 14496 my load 27 node4 load 26
1131033604 rexec :pid 103414(/sbin/ifconfig) -> node 2 mem 14252 my load 48 node2 load 44
1131033604 rexec :pid 103417(/sbin/ifconfig) -> node 2 mem 14720 my load 48 node2 load 44
1131033604 rexec :pid 301493(/usr/bin/awk) <- node 4 mem 14688 my load 27 node4 load 26
1131033604 rexec :pid 103420(/sbin/ifconfig) -> node 2 mem 14768 my load 48 node2 load 44
1131033604 rexec :pid 301495(/usr/bin/free) <- node 4 mem 14764 my load 27 node4 load 26
1131033605 rexec :pid 301500(/usr/bin/free) <- node 4 mem 14092 my load 27 node4 load 26
1131033605 rexec :pid 301586(/sbin/ifconfig) <- node 4 mem 16080 my load 26 node4 load 32
1131033606 rexec :pid 301594(/sbin/ifconfig) <- node 4 mem 16080 my load 26 node4 load 32
1131033606 rexec :pid 301597(/sbin/ifconfig) <- node 4 mem 16080 my load 26 node4 load 32
1131033606 rexec :pid 301600(/sbin/ifconfig) <- node 4 mem 16080 my load 26 node4 load 32
1131034501 rexec :pid 103853(/usr/bin/uptime) -> node 3 mem 30216 my load 53 node3 load 26
1131034501 rexec :pid 103856(/usr/bin/awk) -> node 3 mem 30304 my load 53 node3 load 26
1131034501 rexec :pid 103858(/usr/bin/free) -> node 3 mem 30144 my load 53 node3 load 26
1131034501 rexec :pid 103863(/usr/bin/free) -> node 3 mem 30212 my load 53 node3 load 26
1131034502 rexec :pid 198466(/usr/bin/uptime) <- node 3 mem 30284 my load 25 node3 load 26
1131034502 rexec :pid 198469(/usr/bin/awk) <- node 3 mem 30232 my load 25 node3 load 26
1131034502 rexec :pid 198471(/usr/bin/free) <- node 3 mem 30224 my load 25 node3 load 26
1131034502 rexec :pid 198476(/usr/bin/free) <- node 3 mem 30296 my load 25 node3 load 26
1131034502 rexec :pid 103921(/sbin/ifconfig) -> node 3 mem 30200 my load 53 node3 load 26
1131034502 rexec :pid 103929(/sbin/ifconfig) -> node 3 mem 30228 my load 53 node3 load 26
1131034502 rexec :pid 103932(/sbin/ifconfig) -> node 3 mem 30228 my load 53 node3 load 26
1131034502 rexec :pid 103935(/sbin/ifconfig) -> node 3 mem 30232 my load 53 node3 load 26
1131034502 rexec :pid 103953(/usr/bin/uptime) -> node 4 mem 29820 my load 53 node4 load 26
1131034502 rexec :pid 103956(/usr/bin/awk) -> node 4 mem 29788 my load 53 node4 load 26
1131034502 rexec :pid 103958(/usr/bin/free) -> node 4 mem 29760 my load 53 node4 load 26
1131034502 rexec :pid 103963(/usr/bin/free) -> node 4 mem 29660 my load 53 node4 load 26
1131034503 rexec :pid 198576(/sbin/ifconfig) <- node 3 mem 29812 my load 26 node3 load 40
1131034503 rexec :pid 133162(/usr/bin/uptime) <- node 2 mem 29616 my load 26 node2 load 26
1131034503 rexec :pid 198584(/sbin/ifconfig) <- node 3 mem 29916 my load 26 node3 load 40
1131034503 rexec :pid 133165(/usr/bin/awk) <- node 2 mem 29952 my load 26 node2 load 26
1131034503 rexec :pid 104003(/sbin/ifconfig) -> node 4 mem 29828 my load 53 node4 load 26
1131034503 rexec :pid 198587(/sbin/ifconfig) <- node 3 mem 29560 my load 26 node3 load 40
1131034503 rexec :pid 133167(/usr/bin/free) <- node 2 mem 29340 my load 26 node2 load 26
1131034503 rexec :pid 104011(/sbin/ifconfig) -> node 4 mem 29856 my load 53 node4 load 26
1131034503 rexec :pid 198590(/sbin/ifconfig) <- node 3 mem 29808 my load 26 node3 load 40
1131034503 rexec :pid 133172(/usr/bin/free) <- node 2 mem 29888 my load 26 node2 load 26
1131034503 rexec :pid 104014(/sbin/ifconfig) -> node 4 mem 29988 my load 53 node4 load 26
1131034503 rexec :pid 104017(/sbin/ifconfig) -> node 4 mem 30028 my load 53 node4 load 26
1131034503 rexec :pid 104035(/usr/bin/uptime) -> node 2 mem 29692 my load 53 node2 load 26
1131034503 rexec :pid 104038(/usr/bin/awk) -> node 2 mem 29788 my load 53 node2 load 26
1131034503 rexec :pid 104040(/usr/bin/free) -> node 2 mem 29796 my load 53 node2 load 26
1131034503 rexec :pid 104045(/usr/bin/free) -> node 2 mem 29788 my load 53 node2 load 26
1131034504 rexec :pid 104089(/sbin/ifconfig) -> node 2 mem 29704 my load 50 node2 load 37
1131034504 rexec :pid 104097(/sbin/ifconfig) -> node 2 mem 29824 my load 50 node2 load 37
1131034504 rexec :pid 104100(/sbin/ifconfig) -> node 2 mem 30092 my load 50 node2 load 37
1131034504 rexec :pid 301898(/usr/bin/uptime) <- node 4 mem 30136 my load 27 node4 load 26
1131034504 rexec :pid 104103(/sbin/ifconfig) -> node 2 mem 30184 my load 50 node2 load 37
1131034504 rexec :pid 301901(/usr/bin/awk) <- node 4 mem 29708 my load 27 node4 load 27
1131034504 rexec :pid 133274(/sbin/ifconfig) <- node 2 mem 31272 my load 27 node2 load 37
1131034504 rexec :pid 301903(/usr/bin/free) <- node 4 mem 31268 my load 27 node4 load 27
1131034504 rexec :pid 133282(/sbin/ifconfig) <- node 2 mem 31236 my load 27 node2 load 37
1131034504 rexec :pid 301908(/usr/bin/free) <- node 4 mem 31344 my load 27 node4 load 27
1131034504 rexec :pid 133285(/sbin/ifconfig) <- node 2 mem 31336 my load 27 node2 load 37
1131034504 rexec :pid 133288(/sbin/ifconfig) <- node 2 mem 31340 my load 27 node2 load 37
1131034505 rexec :pid 301998(/sbin/ifconfig) <- node 4 mem 31356 my load 27 node4 load 27
1131034505 rexec :pid 302006(/sbin/ifconfig) <- node 4 mem 31356 my load 27 node4 load 27
1131034505 rexec :pid 302009(/sbin/ifconfig) <- node 4 mem 31356 my load 27 node4 load 27
1131034505 rexec :pid 302012(/sbin/ifconfig) <- node 4 mem 31344 my load 27 node4 load 27
1131035035 loadbl:pid 233907(bio_mars.exe) -> node 2 mem 13368 my load 48 node2 load 51
1131035402 rexec :pid 198903(/usr/bin/uptime) <- node 3 mem 13012 my load 21 node3 load 26
1131035402 rexec :pid 133603(/usr/bin/uptime) <- node 2 mem 12912 my load 24 node2 load 35
1131035725 rexec :pid 133617(/bin/cat) <- node 2 mem 13140 my load 41 node2 load 31

> Nope, you need swap on all nodes. Your non-initnodes have 512M; if the
> "symphonie" thing is big then it will have horrid problems.
>
> The first thing I'd try is getting some swap on all nodes.

Done: I created some swap space on all non-init nodes, and... it's the same.

onall swapon -s
(node 1)
Filename                                 Type       Size     Used   Priority
/dev/scsi/host1/bus0/target0/lun0/part3  partition  4096564  18696  -1
(node 2)
Filename                                 Type       Size     Used   Priority
/dev/scsi/host0/bus0/target0/lun0/part3  partition  522104   0      -1
(node 3)
Filename                                 Type       Size     Used   Priority
/dev/scsi/host0/bus0/target0/lun0/part5  partition  1469908  0      -1
(node 4)
Filename                                 Type       Size     Used   Priority
/dev/scsi/host0/bus0/target0/lun0/part5  partition  1469908  0      -1
root@comclust2:~#

=> Right now there has been a freeze of the system for 5 minutes. The program bio_mars.exe (below) was on node 1; when everything got back to normal (about 5 minutes later), I saw bio_mars.exe on node 3. That's why I think the problem could come from migration. (This time everything was frozen for roughly 4-5 minutes; sometimes it lasts 15-20 minutes.)

PID    NODE USER  PR NI VIRT RES  SHR S %CPU TIME+     COMMAND
233907 3    faure 16 0  335m 335m 516 R 93.2 120:20.04 bio_mars.exe

Here are some logs:

$ onnode 3 cat /proc/cluster/loadlog | grep bio
1131036010 loadlb:pid 233907(bio_mars.exe) <- node 2 mem 83256 my load 5 node2 load 34
1131036302 loadbl:pid 233907(bio_mars.exe) -> node 2 mem 82080 my load 125 node2 load 51
1131036667 loadlb:pid 233907(bio_mars.exe) <- node 2 mem 84532 my load 0 node2 load 26

What can we see in these logs? Can they explain my problem? Is there any possibility that there was a flip-flop between nodes 2 and 3?

> If that helps look at getting some more memory.
>
> AFAIK OpenSSI doesn't (yet) migrate processes based on memory usage,
> just CPU, so it could try migrating a HUGE process to a node without
> enough memory if that node wasn't doing much work.

Yes, that could explain it... but it is strange that I am alone in this situation, with such huge latency times. I am upset that I can't type any command or monitor the system, and that I can't find the solution! Don't you think I could have missed something or made some mistakes in my Debian installation?

ML
From: John H. <john@Calva.COM> - 2005-11-04 10:47:26
Attempting to put it all together (and assuming your clocks are in sync - you are running NTP on all nodes?):

   0 node2 loadbl:pid 233907(bio_mars.exe) -> node 3 mem 92328 my load 104 node3 load 75
 194 node3 loadlb:pid 233907(bio_mars.exe) <- node 2 mem 84532 my load 0 node2 load 26
 729 node3 loadbl:pid 233907(bio_mars.exe) -> node 2 mem 82428 my load 125 node2 load 51
 918 node2 loadlb:pid 233907(bio_mars.exe) <- node 3 mem 90684 my load 0 node3 load 51
 940 node2 loadbl:pid 233907(bio_mars.exe) -> node 1 mem 92496 my load 96 node1 load 11
 948 node1 loadlb:pid 233907(bio_mars.exe) <- node 2 mem 13640 my load 0 node2 load 68
1027 node1 loadbl:pid 233907(bio_mars.exe) -> node 2 mem 13432 my load 54 node2 load 72

First it migrates from node 2 to node 3, taking 194 seconds.
Then 8 minutes later it migrates from node 3 back to node 2, taking 189 seconds.
Later on it migrates from node 2 to node 1, taking 8 seconds.
And finally it migrates from node 1 to node 2, and everything seems to go to hell.

Seems like there is some problem between nodes 2 and 3 - why is migration so slow?
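[The by-hand pairing above - a departure ("-> node N") logged on the source node matched with the next arrival ("<-") for the same pid logged on the destination - can be mechanized. A sketch, assuming the merged loadlog entries have already been parsed into tuples (timestamp, logging_node, tag, pid, direction, peer_node); parsing the raw text into that shape is left out.]

```python
def migration_durations(entries):
    """Return [(pid, depart_ts, seconds_taken, dest_node), ...]
    by pairing each '->' departure with the matching '<-' arrival."""
    pending = {}        # pid -> (departure timestamp, destination node)
    durations = []
    for ts, node, _tag, pid, direction, peer in sorted(entries):
        if direction == '->':
            pending[pid] = (ts, peer)
        elif direction == '<-' and pid in pending:
            depart_ts, dest = pending.pop(pid)
            if node == dest:    # arrival logged on the expected node
                durations.append((pid, depart_ts, ts - depart_ts, dest))
    return durations
```

Fed the seven entries above, this reproduces the 194, 189 and 8 second figures.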
From: Maurice L. <Mau...@co...> - 2005-11-07 09:10:42
John Hughes wrote:
> Attempting to put it all together (and assuming your clocks are in sync
> - you are running NTP on all nodes?):
>
>    0 node2 loadbl:pid 233907(bio_mars.exe) -> node 3 mem 92328 my load 104 node3 load 75
>  194 node3 loadlb:pid 233907(bio_mars.exe) <- node 2 mem 84532 my load 0 node2 load 26
>  729 node3 loadbl:pid 233907(bio_mars.exe) -> node 2 mem 82428 my load 125 node2 load 51
>  918 node2 loadlb:pid 233907(bio_mars.exe) <- node 3 mem 90684 my load 0 node3 load 51
>  940 node2 loadbl:pid 233907(bio_mars.exe) -> node 1 mem 92496 my load 96 node1 load 11
>  948 node1 loadlb:pid 233907(bio_mars.exe) <- node 2 mem 13640 my load 0 node2 load 68
> 1027 node1 loadbl:pid 233907(bio_mars.exe) -> node 2 mem 13432 my load 54 node2 load 72
>
> First it migrates from node 2 to node 3, taking 194 seconds.
> Then 8 minutes later it migrates from node 3 back to node 2, taking 189 seconds.
> Later on it migrates from node 2 to node 1, taking 8 seconds.
> And finally it migrates from node 1 to node 2, and everything seems to go to hell.
>
> Seems like there is some problem between nodes 2 and 3 - why is migration
> so slow?

Thanks for your help and analysis. I really don't know why process migration takes such a long time between some of my nodes (from node 1 towards nodes 2, 3, 4).
(node 1 is a recent machine: Dell Precision, 2.8 GHz, 1 GB RAM)
(nodes 2, 3, 4 are old Dell PIIIs: 1 GHz, 512 MB RAM)

I can now reproduce the problem by hand. I launch a big process (bio_mars.exe) on node 1. If I migrate it to node 2, 3 or 4 (migrate 2 <pid>), it immediately freezes the cluster (I can't type any command from the procps package any more: ps, top, w, etc.). The loadlog shows the process going to node 2, 3 or 4, but on nodes 2, 3 and 4 there is no trace of the process arriving. If I stop and reboot node 2, 3 or 4, the process comes back to node 1 and continues, and I see in /proc/cluster/loadlog "failed to node 4 error -22" (surely because I rebooted the node):

1131286970 mig :pid 69224(bio_mars2.exe) -> node 4 mem 28536 my load 41 node4 load 26
1131287425 migrate pid 69224(bio_mars2.exe) failed to node 4 error -22

Note that there's no problem with little benchmarks: a big awk loop, or plenty of mp3-to-ogg processes.

So, what can the problem be?
- hardware incompatibilities? (SCSI?)
- a wrong installation of Debian Sarge?
- a bug in procps on Debian?
Why do some processes fail to migrate, freezing all the nodes?

Thanks for your help and advice

ML
From: Mulyadi S. <mul...@gm...> - 2005-11-08 04:26:05
|
Dear Maurice

> thanks for your help and analysis..
> i really don't know why there is such a long time for the process
> migration between some of my nodes (from node 1 towards nodes 2 3 4)

Previously, you said you were running a big application; how "big" is it? Can you tell us the size of the application, and how big the virtual size is (+ dynamic libraries + heap)? You can use "pmap" to see it.

> (node 1 is a recent machine Dell precision 2.8Ghz 1Gb RAM)
> (nodes 2 3 4 are old Dell PIII 1Ghz 512Mb RAM)

On node 1 itself, when you run the process alone (well, along with the necessary daemons and kernel threads, of course), how much RAM does it use? If the application swaps, how much swap space does it use?

> note that there's no problem with little benchmark a big loop with
> awk, or plenty of mp32ogg processes

OK, here comes my prediction. Page migration is still on the way, but since you said it is "big", the pages are still "in flight". Note that when pages arrive, they still need to be allocated first (possibly in blocking style, as alloc_pages() usually does).

Maurice, maybe you can compare it to your experience with oM (openMosix)? Then you will get a clearer picture of the difference between the two (openMosix and openSSI). Since openSSI implements full process image migration, be prepared to watch a longer interval during process migration. The term "longer" here is relative; it could be a bit, or waaayyyy, longer.

My suggestion for the openSSI developers is to implement differential page migration based on remote demand paging: a page is migrated on demand, and only those pages which were recently dirtied. I got lost when tracing the internal code of openSSI handling this, so any hints are welcome here.

regards

Mulyadi
|
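[Editor's side note: as a scriptable complement to the `pmap` check Mulyadi suggests, the same numbers can be read from /proc/<pid>/status — VmSize is the total virtual size including libraries and heap, VmRSS the resident portion. This is standard Linux procfs, nothing OpenSSI-specific; a minimal sketch:]

```python
import os

def proc_mem_kb(pid):
    """Return {'VmSize': kB, 'VmRSS': kB} for a pid, read from
    /proc/<pid>/status (values are reported by the kernel in kB)."""
    fields = {}
    with open(f"/proc/{pid}/status") as f:
        for line in f:
            key, _, rest = line.partition(":")
            if key in ("VmSize", "VmRSS"):
                fields[key] = int(rest.split()[0])
    return fields

# Inspect the current process as a demonstration; substitute the pid
# of the migrating job (e.g. bio_mars.exe) in practice.
mem = proc_mem_kb(os.getpid())
print(f"virtual: {mem['VmSize']} kB, resident: {mem['VmRSS']} kB")
```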
From: Maurice L. <Mau...@co...> - 2005-11-08 10:36:55
Attachments:
smime.p7s
|
Mulyadi Santosa wrote:
> Dear Maurice
>
>> thanks for your help and analysis..
>> i really don't know why there is such a long time for the process
>> migration between some of my nodes (from node 1 towards nodes 2 3 4)
>
> Previously, you said you were running a big application; how "big" is it?
> Can you tell us the size of the application, and how big the virtual
> size is (+ dynamic libraries + heap)? You can use "pmap" to see it.

Here are two of these computing processes (symphonie, bio_mars.exe) and their occupied RAM: ~350 MB.

Tasks: 130 total, 3 running, 127 sleeping, 0 stopped, 0 zombie

   PID NODE USER   PR NI VIRT RES  SHR S %CPU    TIME+ COMMAND
198028    3 gatti  25  0 328m 321m 13m R 99.4 1302:32 symphonie
264695    1 faure  25  0 353m 251m 23m R 98.7 1512:52 bio_mars.exe

>> (node 1 is a recent machine Dell precision 2.8Ghz 1Gb RAM)
>> (nodes 2 3 4 are old Dell PIII 1Ghz 512Mb RAM)
>
> On node 1 itself, when you run the process alone (well, along with the
> necessary daemons and kernel threads of course), how much RAM does it
> use? If the application swaps, how much swap space does it use?
Here is the "free" command output on node 1:

root@comclust5:~# free
             total       used       free     shared    buffers     cached
Mem:       1025816    1012828      12988          0       5660     400496
-/+ buffers/cache:     606672     419144
Swap:      4096564     184556    3912008

I created a swap space on each node, but swap is not much used on any node:

root@comclust5:~# onnode 2 free
             total       used       free     shared    buffers     cached
Mem:        509788     296508     213280          0        160      47588
-/+ buffers/cache:     248760     261028
Swap:       522104       1168     520936

root@comclust5:~# onnode 3 free
             total       used       free     shared    buffers     cached
Mem:        509788     504120       5668          0        160     108696
-/+ buffers/cache:     395264     114524
Swap:      1469908      19204    1450704

root@comclust5:~# onnode 4 free
             total       used       free     shared    buffers     cached
Mem:       1026356     434208     592148          0        160     231624
-/+ buffers/cache:     202424     823932
Swap:      1469908          0    1469908

>> note that there's no problem with little benchmark a big loop with
>> awk, or plenty of mp32ogg processes
>
> OK, here comes my prediction. Page migration is still on the way, but
> since you said it is "big", the pages are still "in flight". Note that
> when pages arrive, they still need to be allocated first (possibly in
> blocking style, as alloc_pages() usually does).
>
> Maurice, maybe you can compare it to your experience with oM? Then you
> will get a clearer picture of the difference between the two (oM and
> openSSI). Since openSSI implements full process image migration, be
> prepared to watch a longer interval during process migration. The term
> "longer" here is relative; it could be a bit, or waaayyyy, longer.

Yes, I noticed that in normal conditions the process migration time was longer than in oM. But in my case one cannot say it is "long" or "longer": it simply freezes all subsequent commands (no more top, ps or w) for 10, 20, 30 minutes... it is not "in flight" ;-)

I can now reproduce my problem, but I still don't know how to solve it:
i) when my big processes migrate from nodes 2, 3, 4 to node 1 (the init node), there is no problem;
ii) when processes migrate among nodes 2, 3, 4, there is no problem; but
iii) when one of these processes migrates from node 1 towards nodes 2, 3, 4, the problem occurs: the process seems to leave node 1 (I see a log entry in /proc/cluster/loadlog) but never reaches the destination node (no entry in the destination node's /proc/cluster/loadlog). I must reboot the destination node in order to get back to nominal conditions (the process comes back to, or stays on, node 1).

Maybe a network problem? But why? My 5 NICs are new (3Com 3C2000 gigabit, with the sk98lin driver on Debian), as are my Netgear 8-port gigabit switches.

Any ideas?

ML

> My suggestion for the openSSI developers is to implement differential
> page migration based on remote demand paging: a page is migrated on
> demand, and only those pages which were recently dirtied. I got lost
> when tracing the internal code of openSSI handling this, so any hints
> are welcome here.
>
> regards
>
> Mulyadi

-- 
Maurice Libes                          Tel : +33 (04) 91 82 93 25
Centre d'Oceanologie de Marseille      Fax : +33 (04) 91 82 65 48
UMS2196 CNRS - Campus de Luminy, Case 901
mailto:mau...@co...                    F-13288 Marseille cedex 9
Annuaire : http://annuaire.univ-aix.fr/showuser.php?uid=libes
|
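[Editor's side note: the `free` output above makes the memory pressure concrete. Counting buffers/cache as reclaimable, nodes 2 and 3 have roughly 255 MB and 112 MB available, while the processes shown in `top` are ~350 MB resident, so a destination node may have to swap or block in page allocation while receiving the image. A small sketch of that arithmetic, using only the numbers posted in the thread (all in kB):]

```python
# Available memory on each candidate destination node, from the
# '-/+ buffers/cache' free column posted above (kB). Static sample
# data from the thread, not live measurements.
available_kb = {"node2": 261028, "node3": 114524, "node4": 823932}

# ~350 MB resident, per the 'top' output for symphonie/bio_mars.exe.
process_rss_kb = 350 * 1024

for node in sorted(available_kb):
    avail = available_kb[node]
    verdict = "fits in RAM" if avail >= process_rss_kb else "would need swap"
    print(f"{node}: {avail} kB available vs {process_rss_kb} kB needed "
          f"-> {verdict}")
```

On these numbers only node 4 has comfortable headroom, which fits the observation that migrations toward the smaller nodes stall; it does not, however, explain the node 4 failures on its own.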
From: Mulyadi S. <mul...@gm...> - 2005-11-08 17:13:54
|
Dear Maurice

A 350 MB virtual size... on a gig-E card, transferring it should be fast, especially in NAPI and/or polling mode. Even coupled with reading swapped-out pages, it is still fast, given that the swap area isn't heavily utilized.

> may be a network problem? but why?
> since my 5 NICs are new (3Com 3C2000 gigabit with the sk98lin
> driver on Debian) as are my Netgear 8-port gigabit switches

So this is an init-to-non-init migration problem. Maybe you need to consult Laura Ramirez or John Byrne, asking whether there is a known problem or bug in migration when VM pressure/allocation is taken into account.

Personally, I suggest you enable kdb in your openSSI kernel, switch into the kdb console on the frozen node, and do a backtrace (type "bt" at the prompt). Then we will know where it freezes, and possibly why.

regards

Mulyadi
|
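[Editor's side note: before reaching for kdb, a raw TCP transfer between two nodes is a quick way to rule the network in or out — 350 MB over a healthy gigabit link should take a handful of seconds, not minutes. The sketch below demonstrates the measurement over loopback with a hypothetical port number; in practice, run the receiver half on the destination node and point the sender at its address.]

```python
import socket
import threading
import time

PAYLOAD_MB = 16        # small for the demo; use ~350 to mimic the process image
HOST, PORT = "127.0.0.1", 15222   # hypothetical free port, loopback for the demo

def receiver(server, counts):
    # Accept one connection and count every byte until the sender closes.
    conn, _ = server.accept()
    with conn:
        total = 0
        while chunk := conn.recv(65536):
            total += len(chunk)
        counts.append(total)

server = socket.socket()
server.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
server.bind((HOST, PORT))
server.listen(1)

counts = []
t = threading.Thread(target=receiver, args=(server, counts))
t.start()

start = time.monotonic()
with socket.create_connection((HOST, PORT)) as s:
    s.sendall(b"\0" * (PAYLOAD_MB * 1024 * 1024))
t.join()               # wait until the receiver has drained everything
elapsed = time.monotonic() - start
server.close()

print(f"received {counts[0]} bytes in {elapsed:.2f}s "
      f"({counts[0] / elapsed / 1e6:.0f} MB/s)")
```

If the same transfer between node 1 and node 2/3/4 crawls or stalls, the problem is below OpenSSI (driver, duplex mismatch, switch); if it is fast, the stall is in the migration path itself.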