On Wed, Aug 21, 2013 at 2:25 PM, Roy Stogner <roystgnr@ices.utexas.edu> wrote:

On Wed, 21 Aug 2013, John Peterson wrote:

On Wed, Aug 21, 2013 at 12:31 PM, Roy Stogner <roystgnr@ices.utexas.edu> wrote:
> To fix the "no MPI Abort when the stack isn't unwound" case: Check
> MPI_Initialized() in our terminate handler, call MPI_Abort() from
> there if it's true?

Looks like LibMeshInit's destructor currently only checks #if defined(LIBMESH_HAVE_MPI), but I agree that checking MPI_Initialized() would be more thorough.

No, we also test "libmesh_initialized_mpi" - the theory there is that
if MPI was already initialized before the LibMeshInit was created then
whoever started it should be the one to handle ending it too.

Which is a pretty solid theory on lines 602--603.  I could imagine a
more "every man for himself" policy on lines 564--572.
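
A minimal sketch of the "whoever started it should end it" policy described above. The fake_mpi namespace and the InitGuard class are hypothetical stand-ins so the example compiles without MPI; in real code the three stand-in functions would correspond to MPI_Initialized(), MPI_Init() and MPI_Finalize(), and this is not the actual LibMeshInit implementation.

```cpp
#include <cassert>

// Hypothetical stand-ins for the MPI runtime (assumption: real code would
// call MPI_Initialized(), MPI_Init() and MPI_Finalize() instead).
namespace fake_mpi {
  bool running = false;
  bool initialized() { return running; }
  void init()        { running = true; }
  void finalize()    { running = false; }
}

// Hypothetical RAII guard: it only shuts the runtime down if it was the
// one that started it, mirroring the role of a "libmesh_initialized_mpi" flag.
class InitGuard {
public:
  InitGuard() : we_initialized(false) {
    if (!fake_mpi::initialized()) {
      fake_mpi::init();
      we_initialized = true;   // we started it, so we own the shutdown
    }
  }

  ~InitGuard() {
    if (we_initialized)
      fake_mpi::finalize();    // leave a pre-existing runtime untouched
  }

  bool owns_runtime() const { return we_initialized; }

  InitGuard(const InitGuard &) = delete;
  InitGuard & operator=(const InitGuard &) = delete;

private:
  bool we_initialized;
};

// Usage: the guard cleans up only what it created.
void example_usage() {
  {
    InitGuard g;
    assert(g.owns_runtime());
  }
  assert(!fake_mpi::initialized());
}
```

An "every man for himself" abort path would skip the ownership flag and consult only the runtime's own "is it initialized" query.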

> To fix the "no stack trace when the stack is unwound" case: move the
> print_trace back from our terminate handler to the libmesh_error()

The libmesh_error() macro already calls libMesh::print_trace() on
single-processor jobs.  It doesn't print anything for parallel jobs,
but we could make it do so; this was my idea in the patch I sent you
off-list a few days back (attached).  So I'd remove
libmesh_write_traceout() from our error handler as well if we apply
this patch...

Yeah, otherwise the second libmesh_write_traceout() would potentially
just overwrite the output of the first, with a much-less-useful
unwound stack.

> other thrown exceptions.  Perhaps we could somehow keep our terminate
> handler printing traces in cases where the uncaught exception isn't
> from one of our macros?

You mean exceptions thrown from standard library routines?  

Or from third party libraries, yes.

If std::uncaught_exception() == true, can you get access to the
uncaught exception in any way?  If yes I guess you could try
dynamic_cast'ing it to one of ours...

Looks like std::exception_ptr and std::current_exception() are
C++11-specific, and even in that case the mechanism for getting at the
exception (rethrow_exception wrapped in a try/catch) is uglier than a
dynamic cast.
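
For reference, a rough sketch of the C++11 mechanism mentioned above: grab the in-flight exception with std::current_exception() and rethrow it inside a try/catch to find out what it is. OurError and describe_current_exception() are hypothetical names used only for illustration, and whether a terminate handler can see the exception this way is assumed to follow the usual behaviour of terminate being entered because of a throw.

```cpp
#include <cassert>
#include <exception>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for an exception type thrown by our own macros.
struct OurError : std::runtime_error {
  explicit OurError(const std::string & msg) : std::runtime_error(msg) {}
};

// Classify the exception currently being handled, if any.
// Intended to be callable from a catch block or a terminate handler.
std::string describe_current_exception() {
  std::exception_ptr p = std::current_exception();
  if (!p)
    return "none";                 // no exception is being handled

  try {
    std::rethrow_exception(p);     // the only portable way to inspect it
  }
  catch (const OurError &) {
    return "ours";                 // one of our macros threw it
  }
  catch (const std::exception &) {
    return "std";                  // standard or third-party library
  }
  catch (...) {
    return "unknown";              // not derived from std::exception
  }
}

// Usage: classify from inside a handler.
void example_usage() {
  try {
    throw OurError("from a macro");
  }
  catch (...) {
    assert(describe_current_exception() == "ours");
  }
}
```

A terminate handler could use the "ours" result to skip a second trace, at the cost of requiring C++11.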

I think I'd rather make libmesh_write_traceout() append instead of
overwrite, stick some separator line at the beginning of each
traceout, and accept that lots of people are going to get redundancy
in their error reporting.
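
A minimal sketch of the append-with-separator idea. append_traceout() and read_file() are hypothetical helpers written for illustration, not the existing libmesh_write_traceout(), and the separator format is an assumption.

```cpp
#include <cassert>
#include <cstdio>
#include <fstream>
#include <sstream>
#include <string>

// Hypothetical helper: append one trace to the file instead of
// overwriting it, preceded by a separator line naming its origin.
void append_traceout(const std::string & filename,
                     const std::string & label,
                     const std::string & trace) {
  std::ofstream out(filename.c_str(), std::ios::app);  // append mode
  out << "---- " << label << " ----\n" << trace << '\n';
}

// Read the whole file back as a string (used only for the usage check).
std::string read_file(const std::string & filename) {
  std::ifstream in(filename.c_str());
  std::ostringstream contents;
  contents << in.rdbuf();
  return contents.str();
}

// Usage: two writers, and both traces survive in order.
void example_usage() {
  const std::string f = "traceout_example.txt";
  std::remove(f.c_str());                       // start from a clean file
  append_traceout(f, "libmesh_error", "full stack");
  append_traceout(f, "terminate handler", "unwound stack");
  const std::string s = read_file(f);
  assert(s.find("full stack") < s.find("unwound stack"));
  std::remove(f.c_str());
}
```

With this approach the later, less useful trace is simply appended after the earlier one rather than replacing it.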

OK, I will work on some patches that do all this, test them, and share a branch for others to try.