From: Andrew P. <ap...@ro...> - 2005-09-27 14:35:15
I'm having trouble getting MPI programs to run for 1 second or longer. I have a simple program that loops, prints its execution time, then quits. When the loop count is raised so that the execution time reaches about 1 second or more, the program fails with the following messages:

  mpirun: error: child process (rank=0; node=0) exited abnormally.
  mpirun: error: aborting.

Replacing the loop with a sleep() call has a similar effect: processes can sleep for any amount of time under 1 second (e.g. sleep(.999999) is OK), but if sleep(1) is called the program fails with the above error.

I've tried setting pingtimeout to 30, 3000, and 30000, without success. The environment is Clustermatic 5 with a custom-compiled 2.6.9 kernel and the bproc4.0.0pre8 module on Opteron processors. The problem does not appear on a LAM-based non-bproc cluster with the same source code.

Any help with this will be greatly appreciated.

- Andrew