Re: [Plplot-devel] Third proof of concept for the "exception handling" topic -- Further discussion

SourceForge Headquarters 1320 Columbia Street Suite 310 San Diego, CA 92101 +1 (858) 422-6466

Hi Phil:

On 2017-07-23 22:00+0100 Phil Rosenberg wrote:

> On 23 July 2017 at 12:27, Alan W. Irwin <ir...@be...> wrote:
>> I suggested in my big e-mail to you early this week (Mon, 17 Jul 2017
>> 14:59:00 -0700 (PDT)) concerning your second iteration a method of doing
>> this using a list of plstream heap pointers.  Please respond to that
>> suggestion.
>
> When you said a list I'm not sure I understood. Do you mean a linked
> list rather than an array?

No, I was using list in a generic sense so it you store that
information (and also access it) in an array form that would be fine
with me. So basically the idea is whatever you are doing now in
plfreeall for handling plstream heap pointers correctly should also be
done in plmalloc, plcalloc, plrealloc, and plfree so we can 
replace all calls of malloc, calloc, realloc, and free in our
code with calls to the pl* variants of those.

>
>>
>> Also please respond to my suggestion (with appropriate code changes
>> while I handle the CMake part) in that e-mail concerning
>>
>> #define PL_IMPLEMENT_EXCEPTION_HANDLING and
>> #ifdef PL_IMPLEMENT_EXCEPTION_HANDLING
>>

Please go back to look at that original e-mail for this sub-topic, but
to summarize what I meant is please implement this preprocessor
instruction idea for iteration 4, and I will
subsequently implement a CMake option to control whether
PL_IMPLEMENT_EXCEPTION_HANDLING is #defined or not.

>> On 2017-07-22 22:47+0100 Phil Rosenberg wrote:
>>
>>> [in that same e-mail] You asked why every API call required a PLTRY block.
>>> You are correct
>>> that actually they don't. Every API call that may result in a call to
>>> plexit needs a PLTRY block.
>>
>>
>> I initially didn't understand this explanation, but after _much_ too
>> long a time considering various scenarios to explain my point of view
>> the light finally dawned. Exception handling does not allow you to
>> transfer control to arbitrary places in the code (a stupid assumption
>> I was making). Instead, it merely allows you to unwind the call graph
>> from the plexit call (in this case that is the only current use of
>> PLTHROW in our source code) to return control to a routine potentially
>> far up the call stack from there such as _any_ of the public API
>> routines (as you said) that have a possible call to plexit somewhere
>> lower down on their call graph.
>
> Yes correct, we just unwind the stack to a certain point in the call
> sequence (or whatever terminology you choose as you mentioned in your
> ps and pps).  This is good, because the user application is still sat
> waiting for our API function to return. We don't want to jump to some
> specific execution point, we need that function to return and for the
> user to find out somehow that there has been an error.

Thanks for that confirmation.  It is a big relief to me to finally move
beyond my previous misconception.

>
>>
>> So sorry for my previous noise on this sub-topic, but
>> now we are finally on the same page in this regard, I have some
>> further questions and comments about your implementation of
>> exception handling and how you are using that exception handling
>> to deal with bad errors.
>>
>> * If you miss putting a PLTRY and PLCATCH block in one of those public
>>   API cases where a potential call to plexit is somewhere lower on the
>>   call graph, then what does the current code do?  Does it simply
>>   crash (i.e., is that the potential cause of the serious issues I
>>   have reported to you in my previous post) or do you handle that
>>   internal error case by, e.g., emitting an "internal error: unhandled
>>   exception" message followed by an exit?
>
> I think that currently something very odd. The value of
> pls->currentjumpbuffer used in PLTHROW would be either out of date or
> NULL, so probably some sort of crash. This is why it is important that
> every API entry function includes a PLTRY, PLCATCH, PLENDTRY sequence.
> Of course we could set things up to check for this scenario, but then
> what would we do? Call exit()? I had been trying to think if there was
> a C method to check at compile time, but I cannot think of one (there
> would be ways in C++ I think). But the API changes so infrequently
> that I think just ensuring all API functions include the PLTRY,
> PLCATCH, PLENDTRY code is probably okay - open to suggestions though.

Assuming we develop this proof-of-concept into the real deal and merge
this topic to master (and so far I haven't seen any issues in this
proof-of-concept that would preclude reaching that eventual result),
there will be lots of functions where we need to add PLTRY and
PLCATCH blocks.

So given that situation where unhandled exceptions may be a fact of
life for us for some time (and possibly any time in the future when
new topics with e.g., additional externally accessible API are merged
to master) I think we should handle this situation with a specific
error message and calling exit rather than crashing with, e.g., a
segfault.  That conclusion is also based on my impression that is
exactly how C++ deals with the unhandled exception issue. Furthermore,
I think this is fundamentally a more satisfactory result than a crash
whenever an unhandled exception occurs (especially if we copy this
approach for our other non-plplot C libraries and our C device drivers,
see further discussion below of this possibility).

I mentioned "lots of functions" above.  Note those will not just be
limited to the official "c_*" API functions.  Instead, our
long-standing visibility work has determined all functions that are
externally accessed in our tests, and we have effectively labelled all
of those with the PLDLLIMPEXP visibility macro that is #defined in
include/pldll.h.in.  So that means identification of all the functions
that need PLTRY/PLCATCH blocks has already been done for us.  However,
there are a maximum of 326 of them spread over 6 files as revealed by

software@raven> grep -l 'PLDLLIMPEXP  *' include/*.h* |grep -v pldll
include/drivers.h
include/ltdl_win32.h
include/pdf.h
include/plplot.h
include/plplotP.h
include/plstrm.h

software@raven> grep 'PLDLLIMPEXP  *' include/*.h* |grep -v pldll |wc -l
326

(Note, I have included drivers.h and pdf.h in the above list because
in some configurations all driver code is absorbed into the core
PLplot library.)

I emphasize you don't have to be too concerned about inserting many
more PLTRY/PLCATCH blocks for your proof-of-concept. But assuming the
time comes in the relative near future when your C exception handling
proof-of-concept has matured sufficiently so you at least want to
merge it to the master branch as an option (see discussion above of
the PL_IMPLEMENT_EXCEPTION_HANDLING macro) for PLplot users, then
potentially there is a lot of editing work here with the prospect of
dealing with typographical bugs in that work.

Thus, would it be possible to deal with this situation automatically
with appropriate macro to generate the code (consisting of a PLTRY
block containing an appropriate call to the original renamed function,
and empty PLCATCH block and all visibility issues taken care of)?

However, if you don't think this automated approach is possible or it
is too much work or too error prone, then you can count on me to help
you with the required manual editing work for those 300+ functions
when the time comes.

>> * I also suggest you should implement a PLplot library state variable
>>   (let's call it plplot_library_status) that is initialized to 0 (good
>>   status) when the library is initialized but set to 1 (bad status) by
>>   plexit. Then follow up by implementing a c_plstatus() public API
>>   that returns the value of library_status and which emits a message
>>   such as ("library had bad error so it had to be shut down.  To use
>>   it again you need to initialize the library by calling one of the
>>   plinit family of routines".) Then an external application could call
>>   c_plstatus to check on the status of the library after any call to
>>   our public API and act accordingly.
>>
>>   Or it could be sloppy about that (i.e., our standard C examples
>>   might only call plstatus in one example to make sure it works). In
>>   which case after the library is shutdown (see above) by an internal
>>   call to plexit there would be a blizzard of further plabort (due to
>>   plsc->level being 0) and plexit messages after the first one because
>>   of that library shutdown.  But there would be no call to the dreaded
>>   exit routine so there is at least the chance that the external
>>   application will survive that sloppiness.  And if plexit does not
>>   call plend, I think the chances of an external application
>>   encountering a showstopper error are much worse.
>
> Hmm, this is quite interesting actually. As I said above I would like
> to not totally kill the PLStream - a good example could be that maybe
> plexit gets called due to a bad path for a shapefile for a map. This
> maybe could be fixed if the calling program checks for an error state.
> Of course people are lazy :-) so it might not get checked. In my
> initial thinking, the map plot would fail but all other plotting would
> carry on. maybe it would be better to set the PLStream to an error
> state and have all further API calls fail. Maybe the user could clear
> the error state to permit plotting to continue. I welcome thoughts on
> this.
>
> In terms of an error flag that you suggested. I am thinking perhaps
> all calls to plerr result in an error message being logged in the
> PLStream, plus when we reach the top level PLTRY/PLCATCH/PLENDTRY we
> call a user provided error callback. Then we have an API function to
> get the list of errors, another to clear the list of errors and one to
> check if there are any errors (which would just just check if the
> number of error messages in the log is greater than 0). So it is like
> your flag, but with some extra detail.

I have now thought a bit more about this issue with propagating
library error conditions to other C libraries and executables.
Because of the #includes that are normally used by all such libraries
that call the core PLplot library, these "other libraries and
executables" have access to the same PLTRY, PLCATCH, and PLTHROW
macros that the core PLplot library does.  So I am thinking along
the lines of all instances of PLplot calls occurring in our
standard examples, e.g.,

pladv(0);

being replaced by

pladv(0);
if(plstatus())
   PLTHROW(NULL);

The point is, this is a bit cleaner than

pladv(0);
if(plstatus())
{
   fprintf(stderr "There was an error in one of the PLplot library calls\n");
   exit(1);
}

So I hope that first form using PLTHROW is possible in the standard
examples.

In any case, no matter how you decide to propagate errors across C
library boundaries, I think our ultimate goal should be to use
exception handling (with at least a PLTRY and PLCATCH block in each of
all our C libraries and standard C executables and interface our C
exception handling with native language exception handling or C++
device drivers, bindings, and standard examples, and for bindings and
examples for all non-C++ languages (e.g., Ada, D, Python, OCaml) which
support native exception handling.  I do realize I am speculating
quite a bit here, but I hope you take it in the intended spirit which
is to generate discussion about the absolute best way to propagate library
errors across library boundaries considering these future needs.

To summarize, I think to help finish this proof-of-concept you need to

* fix the bugs I found for iteration 3,

* implement the "exit" idea for the case of unhandled exceptions to
   avoid crashing in that case,

* implement the idea concerning the PL_IMPLEMENT_EXCEPTION_HANDLING macro,

* finalize how you are going to propagate errors across library
   boundaries, and

* demonstrate a completely unlazy version of one of the C examples
   which itself has a PLTRY block and PLCATCH block in its main routine
   and PLTHROWs associated with each PLplot call executed if plstatus()
   (or whatever you decide to replace it with) indicates a library error
   condition.

And I need to

* Test all your remaining iterations until this proof of concept is
   finished including devising some additional tests with a local patch
   (obviously not to be committed) to generate lots of calls to plexit through many different potential
   call stacks to test propagation of errors for most PLplot commands
   of most examples including the non-lazy one and all the lazy ones.

* Implement a commit that removes trailing blanks if you don't do
   that yourself.

* Implement a commit that adds a CMake option to control whether
   the PL_IMPLEMENT_EXCEPTION_HANDLING C macro is #defined or not.

Once all tests are passed with the finished proof-of-concept then we
can safely merge this to master (with PL_IMPLEMENT_EXCEPTION_HANDLING
defaulting to undefined), and then afterward expand to full PLTRY
coverage (automatically or manually implemented) for all externally
visible PLplot functions.  And at some point introduce completely
consistent error handling for all our memory allocation calls
whether they correspond to a plstream pointer or not.

Once we are completely happy with the exception handling in libplplot
and our C examples, we should turn the default option ON so
PL_IMPLEMENT_EXCEPTION_HANDLING is #defined by default for most of our
users, and as time permits extend exception handling to our other C
libraries, our C and C++ device drivers and our bindings and examples
for all supported languages that implement their own native exception
handling.

So it is going to be quite a lot of work to get from here to there.
But this is a lofty goal (to eliminate exit completely from our entire
code base).  And I think the results as we gradually eliminate the
exit call from our entire code base (except for internal errors due to
exceptions which are not handled by accident). will be well worth it.
Also, as we phase in each component of this work, that phase is not
and all or nothing proposition since each component can be protected
by its own non-default CMake option (just like I am proposing with the
PL_IMPLEMENT_EXCEPTION_HANDLING macro) for the proof-of-concept once
it is finished.  So the work can be considerably spread out over a
long time and remain on our master branch during all of that time.

> Thanks for the thoughts and comments so far. They are very welcome.

And the same goes for your thoughts and comments as well!

Furthermore, since the lack of error handling has long been one of the
most important fundamental PLplot library issues, it is great to see
all your C exception-handler ideas fleshed out and implemented in your
current proof-of-concept rather than us having a rather speculative
discussion with absolutely no proof-of-concept code to help keep the
discussion focussed.

Best wishes,

Alan
__________________________
Alan W. Irwin

Astronomical research affiliation with Department of Physics and Astronomy,
University of Victoria (astrowww.phys.uvic.ca).

Programming affiliations with the FreeEOS equation-of-state
implementation for stellar interiors (freeeos.sf.net); the Time
Ephemerides project (timeephem.sf.net); PLplot scientific plotting
software package (plplot.sf.net); the libLASi project
(unifont.org/lasi); the Loads of Linux Links project (loll.sf.net);
and the Linux Brochure Project (lbproject.sf.net).
__________________________

Linux-powered Science
__________________________

Re: [Plplot-devel] Third proof of concept for the "exception handling" topic -- Further discussion

Cross-platform, scientific graphics plotting library

Re: [Plplot-devel] Third proof of concept for the "exception handling" topic -- Further discussion