From: Roy S. <ro...@st...> - 2008-04-12 16:39:46
I'm taking another look at making our assert()/error()/abort()/throw() behavior more consistent and more useful, but with the levels of code below us it's not easy. If one process fails an assert(), for instance, it eventually dies with an abort() and returns an error code to the OS, but for some reason the other processes give errors like this to stderr:

p1_22444: p4_error: interrupt SIGx: 6
rm_l_1_22503: (180.959010) net_send: could not write to fd=5, errno = 32

And non-errors like this to the shell:

prandtl (82)$ echo $?
0

Any idea where those p4_error and net_send messages are coming from and how I can intercept them? (Signal handlers?) In addition to giving accurate return codes to invokers such as make, I'd like to make stack trace dumps and core dumps (from all processors) into an optional feature.
---
Roy
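On the signal-handler question, here is a minimal sketch of the per-process half of the problem. It assumes Linux/glibc (for backtrace() from <execinfo.h>) and POSIX fork()/waitpid(). It installs a SIGABRT handler that optionally prints a stack trace, allows or suppresses core files through RLIMIT_CORE, and then re-raises the signal with its default action so the invoker still sees an abnormal termination (128 + 6 = 134 in a POSIX shell) rather than 0. The names install_abort_handler, shell_status_of and g_print_trace are hypothetical, not libMesh or MPICH API. The sketch only covers the process that actually calls abort(); it does not by itself explain or silence the p4_error/net_send messages printed by the other processes.

```cpp
#include <cassert>
#include <csignal>
#include <cstdlib>
#include <execinfo.h>     // glibc backtrace(); assumes Linux/glibc
#include <sys/resource.h> // getrlimit/setrlimit(RLIMIT_CORE, ...)
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical runtime switch, e.g. set from a command-line option.
static volatile std::sig_atomic_t g_print_trace = 0;

// SIGABRT handler: optionally dump a stack trace, then die by the same
// signal so the parent still sees an abnormal termination.
static void abort_handler(int sig)
{
  if (g_print_trace)
    {
      void* frames[64];
      // backtrace() is not formally async-signal-safe; acceptable for an
      // opt-in debugging aid.
      int n = backtrace(frames, 64);
      backtrace_symbols_fd(frames, n, STDERR_FILENO); // writes directly to the fd
    }
  std::signal(sig, SIG_DFL); // restore the default disposition...
  std::raise(sig);           // ...and re-raise: the process ends with SIGABRT
}

// Hypothetical helper: choose trace/core behavior, then install the handler.
void install_abort_handler(bool print_trace, bool allow_core)
{
  g_print_trace = print_trace ? 1 : 0;
  rlimit lim;
  if (getrlimit(RLIMIT_CORE, &lim) == 0)
    {
      // A soft limit of 0 suppresses core files; the hard limit is the cap.
      lim.rlim_cur = allow_core ? lim.rlim_max : 0;
      setrlimit(RLIMIT_CORE, &lim);
    }
  std::signal(SIGABRT, abort_handler);
}

// Run fn in a child process and report what a shell would put in $?:
// the exit code, or 128 + signal number if the child was killed.
int shell_status_of(void (*fn)())
{
  pid_t pid = fork();
  if (pid < 0)
    return -1;
  if (pid == 0)
    {
      fn();
      _exit(0);
    }
  int status = 0;
  if (waitpid(pid, &status, 0) < 0)
    return -1;
  if (WIFEXITED(status))
    return WEXITSTATUS(status);
  if (WIFSIGNALED(status))
    return 128 + WTERMSIG(status);
  return -1;
}
```

Calling install_abort_handler(true, false) early in main() would give traces without core files. Note that under mpirun it is the launcher, not the failing process, that decides which status the shell finally sees.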
From: John P. <jwp...@gm...> - 2008-04-12 16:58:26
This might be a good place to start? It describes the MPI_Errhandler_create and MPI_Errhandler_set interfaces.

-J

http://www.mpi-forum.org/docs/mpi-11-html/node148.html
From: Roy S. <ro...@st...> - 2008-04-12 17:55:43
On Sat, 12 Apr 2008, John Peterson wrote:

> This might be a good place to start? It describes the
> MPI_Errhandler_create and MPI_Errhandler_set interfaces.

Thanks, but according to this, MPI's default behavior should already be what I want: to do an MPI_Abort on any errors.

Of course, even when I do an MPI_Abort(COMM_WORLD, myerrorcode) manually, "myerrorcode" only gets printed to the screen, and the code "1" gets returned to the OS instead. So perhaps my problem isn't with the MPI spec but with the particular MPICH 1.2.7 implementation...
---
Roy
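The observation that the shell gets 1 after MPI_Abort (and 0 in the earlier assert() case) regardless of the rank's own error code suggests the fix has to live in whatever collects the ranks' exit statuses. As a toy illustration only, and not a description of how MPICH 1.2.7's mpirun works, here is a hypothetical launch_and_collect() that forks one process per "rank", waits for all of them, and returns the first nonzero status in rank order, using the shell's 128 + signal convention for ranks killed by a signal. It assumes POSIX fork()/waitpid().

```cpp
#include <cassert>
#include <cstdlib>
#include <sys/wait.h>
#include <unistd.h>
#include <vector>

// Toy "launcher": run rank_main(r) in nranks child processes and return a
// single status for the shell. Hypothetical sketch, not MPICH's launcher.
int launch_and_collect(int nranks, void (*rank_main)(int))
{
  std::vector<pid_t> pids;
  for (int r = 0; r < nranks; ++r)
    {
      pid_t pid = fork();
      if (pid < 0)
        return -1; // a real launcher would also kill children already started
      if (pid == 0)
        {
          rank_main(r);
          _exit(0); // a rank that returns normally reports success
        }
      pids.push_back(pid);
    }

  int result = 0;
  for (pid_t pid : pids)
    {
      int status = 0;
      if (waitpid(pid, &status, 0) < 0)
        continue;
      int code = 0;
      if (WIFEXITED(status))
        code = WEXITSTATUS(status);
      else if (WIFSIGNALED(status))
        code = 128 + WTERMSIG(status);
      // Keep the first nonzero status, so make sees the failure.
      if (result == 0 && code != 0)
        result = code;
    }
  return result;
}
```

Waiting for every child before returning avoids leaving zombies behind; a real launcher would additionally tear down the surviving ranks as soon as one fails, which is the part Roy's p4_error messages suggest MPICH is attempting.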
From: Benjamin K. <ben...@na...> - 2008-04-12 19:44:44
>> This might be a good place to start? It describes the
>> MPI_Errhandler_create and MPI_Errhandler_set interfaces.
>
> Thanks, but according to this, MPI's default behavior should already
> be what I want: to do an MPI_Abort on any errors.
>
> Of course, even when I do an MPI_Abort(COMM_WORLD, myerrorcode)
> manually, "myerrorcode" only gets printed to the screen, and the code
> "1" gets returned to the OS instead. So perhaps my problem isn't with
> the MPI spec but with the particular MPICH 1.2.7 implementation...

For what it's worth, I've found both MPICH2 and OpenMPI to be much better at properly killing tasks and at handling aborts in general. Of course, they are both MPI-2 implementations, and in general I don't want to require MPI-2 for library support, but it may be worth using one of them during development nonetheless.

-Ben