Thread: [SSI-devel] [ ssic-linux-Bugs-1019663 ] 'ps --node' in parallel with tar via bash-ll causes nodedown
[SSI-devel] [ ssic-linux-Bugs-1019663 ] 'ps --node' in parallel with tar via bash-ll causes nodedown
From: SourceForge.net <no...@so...> - 2004-08-31 13:29:41
Bugs item #1019663, was opened at 2004-08-31 13:29
Message generated for change (Tracker Item Submitted) made by Item Submitter
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=405834&aid=1019663&group_id=32541

Category: Process Management
Group: None
Status: Open
Resolution: None
Priority: 8
Submitted By: Kishore Sampathkumar (kishoreks)
Assigned to: Nobody/Anonymous (nobody)
Summary: 'ps --node' in parallel with tar via bash-ll causes nodedown

Initial Comment:
On a RH9 system with OpenSSI 1.0.0, running "ps --node <nodenum>" while multiple processes started via bash-ll are executing causes <nodenum> to be shut down.

Strangely, "ping <nodenum>" shows that <nodenum> is still reachable over the network. However, the "cluster" command reports that the additional node is down: node 2 transitions to the DOWN state, and the node down event is handled by all surviving nodes!

The following reliably reproduces the problem. On a 2-node OpenSSI cluster, assuming the node numbers are 1 and 2, run the following on the init node (node 1):

$ /tmp/trytar.sh &
$ strace -f -ff -o ps1 ps --node 2

Where /tmp/trytar.sh contains the following:

----- BEGIN: /tmp/trytar.sh -----
for i in 0 1 2 3 4 5 6 7 8 9
do
    strace -f -ff -o tar$$ bash-ll -c "tar cf tar$$.out /tmp/*.out" > /dev/null 2>&1 &
    sleep 2;
done
----- END: /tmp/trytar.sh -----
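One detail of the reproduction script worth noting: `$$` appears inside a double-quoted string, so it is expanded by the shell running trytar.sh rather than by the child bash-ll. All ten iterations therefore use the same `tar$$` names. The sketch below only illustrates that quoting behaviour with plain `sh`; the variable names are made up for the illustration and are not part of the original report.

```shell
#!/bin/sh
# Illustration only: shows which shell expands $$ depending on quoting.
# Variable names here are hypothetical, not taken from trytar.sh.

parent_pid=$$

# Double quotes: the invoking shell substitutes its own PID before
# the child sh ever sees the command string.
child_view=$(sh -c "echo $$")

# Single quotes: the string reaches the child unexpanded, so the
# child substitutes its own PID.
child_own=$(sh -c 'echo $$')

echo "parent PID:             $parent_pid"
echo "double-quoted in child: $child_view"
echo "single-quoted in child: $child_own"
```

With `strace -ff`, each traced process still gets its own output file, because strace appends the traced PID to the name given with `-o`.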
[SSI-devel] [ ssic-linux-Bugs-1019663 ] 'ps --node' in parallel with tar via bash-ll causes nodedown
From: SourceForge.net <no...@so...> - 2004-09-08 04:20:13
Bugs item #1019663, was opened at 2004-08-31 13:29
Message generated for change (Comment added) made by kishoreks

Category: Process Management
Group: None
>Status: Closed
Resolution: None
Priority: 8
Submitted By: Kishore Sampathkumar (kishoreks)
>Assigned to: Aneesh Kumar K.V (kvaneesh)
Summary: 'ps --node' in parallel with tar via bash-ll causes nodedown

----------------------------------------------------------------------

>Comment By: Kishore Sampathkumar (kishoreks)
Date: 2004-09-08 04:20

Message:
Logged In: YES
user_id=857156

The check-in that Laura made on the OPENSSI-RH-1-0-STABLE branch in kernel/cluster/ssi/vproc/procfs_subr.c, as a fix for some other problem, now fixes this problem as well.

I checked out the above file from the OPENSSI-RH-1-0-STABLE branch, built a new kernel, and after booting on it, ran the test from the initial comment. The test now succeeds.

Closing this bug.
[SSI-devel] [ ssic-linux-Bugs-1019663 ] 'ps --node' in parallel with tar via bash-ll causes nodedown
From: SourceForge.net <no...@so...> - 2004-09-08 06:17:01
Bugs item #1019663, was opened at 2004-08-31 13:29
Message generated for change (Settings changed) made by kishoreks

Category: Process Management
Group: None
Status: Closed
Resolution: None
Priority: 8
Submitted By: Kishore Sampathkumar (kishoreks)
>Assigned to: Kishore Sampathkumar (kishoreks)
Summary: 'ps --node' in parallel with tar via bash-ll causes nodedown