From: Greg W. <gw...@la...> - 2005-09-27 15:50:49
Are you using bjs to allocate the nodes? The default allocation time is 1 second, I think.

Greg

On Sep 27, 2005, at 8:34 AM, Andrew Pitre wrote:

> I'm having trouble getting mpi programs to execute for >= 1 sec. I
> have a simple program that loops, prints the execution time then
> quits.
>
> When the loop count is increased to where the execution time is
> greater than or about 1 sec, the program fails with the following
> messages:
> "mpirun: error: child process (rank=0; node=0) exited abnormally.
> mpirun: error: aborting."
>
> Replacing the loop with a sleep() statement has a similar effect,
> processes can sleep for any amount of time < 1 sec, e.g.
> sleep(.999999) is ok, but if sleep(1) is called the program fails
> with the above error.
>
> I've tried adjusting the pingtimeout with settings 30, 3000, and
> 30000, without success.
>
> The environment is Clustermatic 5 with a custom compiled 2.6.9
> kernel and bproc4.0.0pre8 module on Opteron processors. This
> problem does not appear on a LAM based non-bproc cluster with the
> same source code.
>
> Any help with this will be greatly appreciated.
>
> - Andrew
>
> _______________________________________________
> BProc-users mailing list
> BPr...@li...
> https://lists.sourceforge.net/lists/listinfo/bproc-users