On Thu, 28 Feb 2008, Benjamin Kirk wrote:
>> Seems like a paradox: we can't call MPI_Abort from within error(),
>> because error() can't be sure there isn't some enclosing code waiting
>> to catch its exception, but we do have to be able to call MPI_Abort
>> when error() is called if there is no catch waiting.
> I like what you propose.
I did too, until I actually tested it out. :-(
Apparently the C++ standard says:
| 15.3 - Handling an exception [except.handle]
| -9- If no matching handler is found in a program, the function terminate() is called;
In other words, unless there's actually a try block waiting to catch
any error()-thrown exceptions, C++ isn't required to bother with all
that stack-unwinding, destructor-calling rigmarole; in my tests it just
kills the whole program. That's probably an efficient way to do things
with one
process. Why bother freeing memory bit by bit when the operating
system is just going to recover it en masse anyway? But this seems
pretty stupid when the resources your destructors are responsible for
freeing might include network requests, not just local resources. In
our case, if the LibMeshInit object never gets destroyed, it never
gets the chance to call MPI_Abort and alert the rest of the network.
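To make the behavior above concrete, here is a minimal sketch. It assumes a hypothetical InitGuard class standing in for LibMeshInit, and a counter standing in for the real MPI_Abort call; neither name comes from libMesh. With a matching handler in place, the stack is unwound and the destructor runs before the catch body is entered. Without one, the standard leaves it implementation-defined whether any unwinding happens before terminate() is called.

```cpp
#include <exception>
#include <stdexcept>

namespace unwind_sketch {

// Counts how many times the cleanup code ran (stand-in for MPI_Abort).
int cleanups_run = 0;

// Hypothetical stand-in for LibMeshInit: its destructor is the only
// place where the "alert the other processes" step would happen.
struct InitGuard {
  ~InitGuard() { ++cleanups_run; }
};

// Simulates library code that hits error() while an InitGuard is alive.
void failing_work()
{
  InitGuard guard;
  throw std::runtime_error("error() was called");
}

// Because a matching handler exists, the stack is unwound and
// ~InitGuard() runs before control reaches the catch body.
bool run_with_handler()
{
  try {
    failing_work();
  } catch (const std::exception &) {
    return true;
  }
  return false;
}

} // namespace unwind_sketch
```

Calling unwind_sketch::failing_work() directly from a main() with no try block is the problem case: terminate() is called, and the destructor may never run.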
So, our options here:
1. Use std::set_terminate(). We could register our own replacement
   terminate() function, which would call MPI_Abort. This would make
   sure that other MPI processes are never left hanging, but it seems
   like a rude thing to do from a library - what if user code already
   changed the terminate() function? I don't see any easy way to figure
   out what it was changed to and respect that.
2. Revert everything. I'd rather not do this. Even if the
   libMesh::init() -> LibMeshInit change was a little awkward, we're
   still supporting the old initialization function, and I think the
   ability for new programs to catch error()-thrown exceptions is worth
   the annoyance of deprecated() messages from old code.
3. Do nothing. My MPI library is pretty good about figuring out that
   when one process dies, the rest can't write to it over the network
   anymore and should exit. I'll bet other MPI libraries are just as
   good. This is basically what happens when there's a segfault, after
   all. The "do nothing" plan also appeals to my sense of laziness, so
   it's what I'll do (or not do?) unless anyone objects.
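On the std::set_terminate() option: the function returns whichever handler was installed before the call, so a library could in principle save that handler and chain to it. The sketch below is only an illustration under assumptions. The names terminate_sketch, previous_handler, notify_other_processes and install_chained_terminate are hypothetical, and the MPI_Abort call is replaced by a message on stderr so the code builds without MPI. It also does nothing about user code that installs its own handler after the library does.

```cpp
#include <cstdlib>
#include <exception>
#include <iostream>

namespace terminate_sketch {

// Handler that was active before ours was installed (hypothetical
// bookkeeping; not part of libMesh).
std::terminate_handler previous_handler = nullptr;

// Stand-in for the real notification. In an MPI build this is where a
// call such as MPI_Abort(MPI_COMM_WORLD, 1) would go.
void notify_other_processes()
{
  std::cerr << "terminate: would call MPI_Abort here\n";
}

// Replacement terminate handler: notify first, then defer to whatever
// handler the user (or the runtime) had installed before us.
void chained_terminate()
{
  notify_other_processes();
  if (previous_handler)
    previous_handler();
  std::abort(); // a terminate handler must not return
}

// Installs our handler and remembers the one it replaced.
void install_chained_terminate()
{
  previous_handler = std::set_terminate(chained_terminate);
}

} // namespace terminate_sketch
```

A library initialization routine could call terminate_sketch::install_chained_terminate() once at startup. Whether that is less rude than silently replacing the user's handler is still a judgment call.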