From: Alan W. I. <ir...@be...> - 2017-07-15 20:50:07
|
On 2017-07-15 11:49+0200 Ole Streicher wrote: > Another (probable longer-standing) topic would be the use of exit() in > several libraries in case of error. That usually has the disadvantage > that it makes debugging more complicated, since you don't get a > stacktrace; abort() may be the better solution here. Any thoughts about > that? To Phil and Ole (with CC to Hazen and Andrew): @Phil: There is a key question for you at the end. @Ole: Here is some background on the issue you brought up above. An even more important issue with exit that concerns PLplot developers is, for example, some poor guy has been working on some interactive session for 4 hours, (e.g., octave) and then attempts to finish up by plotting his results, but he makes a wrong PLplot call ==> exit, loses all those 4 hours of work! Not pretty! The obvious solution is to implement a proper error handling system for the PLplot libraries, bindings libraries, and device drivers that avoids the use of exit and simply propagates an error condition to whatever library or user routine calls PLplot routines if something goes wrong anywhere in any of the PLplot-related source code. We have discussed here fairly recently the best way to implement such an error handling system. There have been two schools of thought about how we implement that which I will attempt to summarize here. 1. Implement our error handling using normal C methods. That is, change our API to return an error code for each PLplot routine that can possibly have an error condition or which calls a PLplot routine that itself could have an error condition. (This turns out to be most PLplot-related routines). Then those error conditions codes must be checked and propagated further if necessary after virtually all returns from PLplot routines that occur within the PLplot libraries, bindings libraries, and device drivers. 2. Implement our error handling system using longjmp and setjmp See <http://www.di.unipi.it/~nids/docs/longjump_try_trow_catch.html> for an example of how to use those two C library routines (both available since c89 and also supported by Windows) to implement exception handling in C that is quite similar to exception handling in C++. I believe when Phil Rosenberg suggested a longjmp/setjmp based error handling system he was unaware of this example so his proposal likely varied in many details from this example. But the overall principle was the same which is you can use these two functions to avoid the laborious process of having to propagate error codes via function return values. Andrew Ross and Hazen Babcock liked the first idea because it is the standard C way to implement error handling. Phil and I liked the second idea because it is a lot less work (the PLplot call graphs are extraordinarily complex so there are potentially huge numbers of Plplot routine return values to check within the PLplot libraries, bindings libraries, and device drivers). The above reference on C exception handling mentioned an issue many C programmers have with longjmp/setjmp which is they are completely unfamiliar with their use. And I think PLplot developers (including me) do indeed have such unfamiliarity concerns. So to help us get more familiar with their use, I asked Phil the last time we discussed this to implement a simple proof of concept of his idea (e.g., a modification of plexit that unwinds the stack and delivers an error message to a single convenient place in our PLplot core library (likely our library initialization) where setjmp is called to set everything up.) I suspect Phil just plain forgot about that request due to other time pressures on him so I am taking this opportunity to make that request again. :-) @Phil: I am still just as unfamiliar (except for what I read in man pages) as the rest of our developers with longjmp/setjmp. So are you game to implement such a simple proof-of-concept to help us become more familiar with those two routines? If so, how about dealing with it right away (after all, I think we are only dealing with something like ~20 lines of code) to avoid losing it again in your stack of PLplot e-mail? This variant of plexit should only be compiled if the user specifies the -DHAVE_ERROR_HANDLING=ON experimental CMake option which should default to NO to provide sufficient safety that if you act fast you can safely merge this with the master branch in time for the 5.13.0 release. The goal of such a simple proof of concept would merely be to confirm the stack was properly unwound by longjmp with a message right after the single place in the PLplot library that you call setjmp during library initialization. If that call returns 0 (i.e., it is the initial call made during library initialization) then no message would be generated, but it it returns non-zero (i.e, the result of longjmp) then generate a "this proof-of-concept is working" message before (for now) exiting just like ordinary plexit does. So this simple proof of concept would be restricted to inside the PLplot core library, and the gory details of how to propagate error reporting across the boundaries of our PLplot libraries, bindings library, and device drivers would be ignored for now. Of course, this proof of concept would be far from a full solution for our much needed error handling system, but it could be a very important first step in achieving that important PLplot goal. So how about taking that first step now? Alan __________________________ Alan W. Irwin Astronomical research affiliation with Department of Physics and Astronomy, University of Victoria (astrowww.phys.uvic.ca). Programming affiliations with the FreeEOS equation-of-state implementation for stellar interiors (freeeos.sf.net); the Time Ephemerides project (timeephem.sf.net); PLplot scientific plotting software package (plplot.sf.net); the libLASi project (unifont.org/lasi); the Loads of Linux Links project (loll.sf.net); and the Linux Brochure Project (lbproject.sf.net). __________________________ Linux-powered Science __________________________ |
From: <p.d...@gm...> - 2017-07-15 23:24:21
|
Hi Alan Proof of concept was a little more than 20 or so lines 😉 I'll pull out the code I started and check where is stands. Phil Sent from my Windows 10 phone From: Alan W. Irwin Sent: 15 July 2017 21:50 To: Phil Rosenberg; Ole Streicher Cc: Hazen Babcock; and...@us...; PLplot development list Subject: First proof-of-concept step requested to implement PLplot errorhandling On 2017-07-15 11:49+0200 Ole Streicher wrote: > Another (probable longer-standing) topic would be the use of exit() in > several libraries in case of error. That usually has the disadvantage > that it makes debugging more complicated, since you don't get a > stacktrace; abort() may be the better solution here. Any thoughts about > that? To Phil and Ole (with CC to Hazen and Andrew): @Phil: There is a key question for you at the end. @Ole: Here is some background on the issue you brought up above. An even more important issue with exit that concerns PLplot developers is, for example, some poor guy has been working on some interactive session for 4 hours, (e.g., octave) and then attempts to finish up by plotting his results, but he makes a wrong PLplot call ==> exit, loses all those 4 hours of work! Not pretty! The obvious solution is to implement a proper error handling system for the PLplot libraries, bindings libraries, and device drivers that avoids the use of exit and simply propagates an error condition to whatever library or user routine calls PLplot routines if something goes wrong anywhere in any of the PLplot-related source code. We have discussed here fairly recently the best way to implement such an error handling system. There have been two schools of thought about how we implement that which I will attempt to summarize here. 1. Implement our error handling using normal C methods. That is, change our API to return an error code for each PLplot routine that can possibly have an error condition or which calls a PLplot routine that itself could have an error condition. (This turns out to be most PLplot-related routines). Then those error conditions codes must be checked and propagated further if necessary after virtually all returns from PLplot routines that occur within the PLplot libraries, bindings libraries, and device drivers. 2. Implement our error handling system using longjmp and setjmp See <http://www.di.unipi.it/~nids/docs/longjump_try_trow_catch.html> for an example of how to use those two C library routines (both available since c89 and also supported by Windows) to implement exception handling in C that is quite similar to exception handling in C++. I believe when Phil Rosenberg suggested a longjmp/setjmp based error handling system he was unaware of this example so his proposal likely varied in many details from this example. But the overall principle was the same which is you can use these two functions to avoid the laborious process of having to propagate error codes via function return values. Andrew Ross and Hazen Babcock liked the first idea because it is the standard C way to implement error handling. Phil and I liked the second idea because it is a lot less work (the PLplot call graphs are extraordinarily complex so there are potentially huge numbers of Plplot routine return values to check within the PLplot libraries, bindings libraries, and device drivers). The above reference on C exception handling mentioned an issue many C programmers have with longjmp/setjmp which is they are completely unfamiliar with their use. And I think PLplot developers (including me) do indeed have such unfamiliarity concerns. So to help us get more familiar with their use, I asked Phil the last time we discussed this to implement a simple proof of concept of his idea (e.g., a modification of plexit that unwinds the stack and delivers an error message to a single convenient place in our PLplot core library (likely our library initialization) where setjmp is called to set everything up.) I suspect Phil just plain forgot about that request due to other time pressures on him so I am taking this opportunity to make that request again. :-) @Phil: I am still just as unfamiliar (except for what I read in man pages) as the rest of our developers with longjmp/setjmp. So are you game to implement such a simple proof-of-concept to help us become more familiar with those two routines? If so, how about dealing with it right away (after all, I think we are only dealing with something like ~20 lines of code) to avoid losing it again in your stack of PLplot e-mail? This variant of plexit should only be compiled if the user specifies the -DHAVE_ERROR_HANDLING=ON experimental CMake option which should default to NO to provide sufficient safety that if you act fast you can safely merge this with the master branch in time for the 5.13.0 release. The goal of such a simple proof of concept would merely be to confirm the stack was properly unwound by longjmp with a message right after the single place in the PLplot library that you call setjmp during library initialization. If that call returns 0 (i.e., it is the initial call made during library initialization) then no message would be generated, but it it returns non-zero (i.e, the result of longjmp) then generate a "this proof-of-concept is working" message before (for now) exiting just like ordinary plexit does. So this simple proof of concept would be restricted to inside the PLplot core library, and the gory details of how to propagate error reporting across the boundaries of our PLplot libraries, bindings library, and device drivers would be ignored for now. Of course, this proof of concept would be far from a full solution for our much needed error handling system, but it could be a very important first step in achieving that important PLplot goal. So how about taking that first step now? Alan __________________________ Alan W. Irwin Astronomical research affiliation with Department of Physics and Astronomy, University of Victoria (astrowww.phys.uvic.ca). Programming affiliations with the FreeEOS equation-of-state implementation for stellar interiors (freeeos.sf.net); the Time Ephemerides project (timeephem.sf.net); PLplot scientific plotting software package (plplot.sf.net); the libLASi project (unifont.org/lasi); the Loads of Linux Links project (loll.sf.net); and the Linux Brochure Project (lbproject.sf.net). __________________________ Linux-powered Science __________________________ |
From: Alan W. I. <ir...@be...> - 2017-07-16 10:35:20
|
On 2017-07-16 00:24+0100 p.d...@gm... wrote: > Hi Alan > Proof of concept was a little more than 20 or so lines 😉 > > I'll pull out the code I started and check where is stands. Sounds good. Alan __________________________ Alan W. Irwin Astronomical research affiliation with Department of Physics and Astronomy, University of Victoria (astrowww.phys.uvic.ca). Programming affiliations with the FreeEOS equation-of-state implementation for stellar interiors (freeeos.sf.net); the Time Ephemerides project (timeephem.sf.net); PLplot scientific plotting software package (plplot.sf.net); the libLASi project (unifont.org/lasi); the Loads of Linux Links project (loll.sf.net); and the Linux Brochure Project (lbproject.sf.net). __________________________ Linux-powered Science __________________________ |
From: Phil R. <p.d...@gm...> - 2017-07-16 10:43:37
|
Hi Alan et al Please find attached some patches that show some bare bones start to using setjmp and longjmp exception style error handling in plplot. Please have a look at the log for a detailed description of what is going on and how it works. What this is: A first start with enough code to show the basic principles of how this could work. What this isn't: A fully working version that should go anywhere near the actual repo. I have only tested to see if it compiles, builds and that x00c runs. In fact I get some tcl related exception on exit, which I'm not sure if I just caused or if it was there before. I am sending out this super early untested code that may still have major bugs and is based on whatever (quite old) plplot source I happened to have on my machine so that you can see the PRINCIPLES of how it works. And because I have other stuff on today so I might not get chance to look at this for a while and I know Alan was keen to see something asap. So feel free to be critical, but please not too critical :-) Alan, have a look, have a play. Thoughts welcome. Phil On 16 July 2017 at 00:24, <p.d...@gm...> wrote: > Hi Alan > > Proof of concept was a little more than 20 or so lines 😉 > > > > I'll pull out the code I started and check where is stands. > > > > Phil > > > > Sent from my Windows 10 phone > > > > *From: *Alan W. Irwin <ir...@be...> > *Sent: *15 July 2017 21:50 > *To: *Phil Rosenberg <p.d...@gm...>; Ole Streicher > <deb...@li...> > *Cc: *Hazen Babcock <hba...@ma...>; and...@us...; PLplot > development list <plp...@li...> > *Subject: *First proof-of-concept step requested to implement PLplot > errorhandling > > > > On 2017-07-15 11:49+0200 Ole Streicher wrote: > > > > > Another (probable longer-standing) topic would be the use of exit() in > > > several libraries in case of error. That usually has the disadvantage > > > that it makes debugging more complicated, since you don't get a > > > stacktrace; abort() may be the better solution here. Any thoughts about > > > that? > > > > To Phil and Ole (with CC to Hazen and Andrew): > > > > @Phil: > > There is a key question for you at the end. > > > > @Ole: > > > > Here is some background on the issue you brought up above. > > > > An even more important issue with exit that concerns PLplot developers > > is, for example, some poor guy has been working on some interactive > > session for 4 hours, (e.g., octave) and then attempts to finish up by > > plotting his results, but he makes a wrong PLplot call ==> exit, loses > > all those 4 hours of work! Not pretty! > > > > The obvious solution is to implement a proper error handling system > > for the PLplot libraries, bindings libraries, and device drivers that > > avoids the use of exit and simply propagates an error condition to > > whatever library or user routine calls PLplot routines if something > > goes wrong anywhere in any of the PLplot-related source code. > > > > We have discussed here fairly recently the best way to implement such > > an error handling system. There have been two schools of thought about > > how we implement that which I will attempt to summarize here. > > > > 1. Implement our error handling using normal C methods. That is, > > change our API to return an error code for each PLplot routine that > > can possibly have an error condition or which calls a PLplot routine > > that itself could have an error condition. (This turns out to be most > > PLplot-related routines). Then those error conditions codes must be > > checked and propagated further if necessary after virtually all > > returns from PLplot routines that occur within the PLplot libraries, > > bindings libraries, and device drivers. > > > > 2. Implement our error handling system using longjmp and setjmp > > > > See <http://www.di.unipi.it/~nids/docs/longjump_try_trow_catch.html> > > for an example of how to use those two C library routines (both > > available since c89 and also supported by Windows) to implement > > exception handling in C that is quite similar to exception handling in > > C++. I believe when Phil Rosenberg suggested a longjmp/setjmp based > > error handling system he was unaware of this example so his proposal > > likely varied in many details from this example. But the overall > > principle was the same which is you can use these two functions to > > avoid the laborious process of having to propagate error codes via > > function return values. > > > > Andrew Ross and Hazen Babcock liked the first idea because it is the > > standard C way to implement error handling. Phil and I liked the > > second idea because it is a lot less work (the PLplot call graphs are > > extraordinarily complex so there are potentially huge numbers of > > Plplot routine return values to check within the PLplot libraries, > > bindings libraries, and device drivers). > > > > The above reference on C exception handling mentioned an issue many C > > programmers have with longjmp/setjmp which is they are completely > > unfamiliar with their use. And I think PLplot developers (including > > me) do indeed have such unfamiliarity concerns. So to help us get > > more familiar with their use, I asked Phil the last time we discussed > > this to implement a simple proof of concept of his idea (e.g., a > > modification of plexit that unwinds the stack and delivers an error > > message to a single convenient place in our PLplot core library > > (likely our library initialization) where setjmp is called to set > > everything up.) I suspect Phil just plain forgot about that request > > due to other time pressures on him so I am taking this opportunity to > > make that request again. :-) > > > > @Phil: > > > > I am still just as unfamiliar (except for what I read in man pages) as > > the rest of our developers with longjmp/setjmp. So are you game to > > implement such a simple proof-of-concept to help us become more > > familiar with those two routines? > > > > If so, how about dealing with it right away (after all, I think we are > > only dealing with something like ~20 lines of code) to avoid losing it > > again in your stack of PLplot e-mail? This variant of plexit should > > only be compiled if the user specifies the -DHAVE_ERROR_HANDLING=ON > > experimental CMake option which should default to NO to provide > > sufficient safety that if you act fast you can safely merge this with > > the master branch in time for the 5.13.0 release. > > > > The goal of such a simple proof of concept would merely be to confirm > > the stack was properly unwound by longjmp with a message right after > > the single place in the PLplot library that you call setjmp during > > library initialization. If that call returns 0 (i.e., it is the > > initial call made during library initialization) then no message would > > be generated, but it it returns non-zero (i.e, the result of longjmp) > > then generate a "this proof-of-concept is working" message before (for > > now) exiting just like ordinary plexit does. So this simple proof of > > concept would be restricted to inside the PLplot core library, and the > > gory details of how to propagate error reporting across the boundaries > > of our PLplot libraries, bindings library, and device drivers would be > > ignored for now. Of course, this proof of concept would be far from a > > full solution for our much needed error handling system, but it could > > be a very important first step in achieving that important PLplot > > goal. So how about taking that first step now? > > > > Alan > > __________________________ > > Alan W. Irwin > > > > Astronomical research affiliation with Department of Physics and Astronomy, > > University of Victoria (astrowww.phys.uvic.ca). > > > > Programming affiliations with the FreeEOS equation-of-state > > implementation for stellar interiors (freeeos.sf.net); the Time > > Ephemerides project (timeephem.sf.net); PLplot scientific plotting > > software package (plplot.sf.net); the libLASi project > > (unifont.org/lasi); the Loads of Linux Links project (loll.sf.net); > > and the Linux Brochure Project (lbproject.sf.net). > > __________________________ > > > > Linux-powered Science > > __________________________ > > > |
From: Alan W. I. <ir...@be...> - 2017-07-16 23:30:24
|
On 2017-07-16 11:43+0100 Phil Rosenberg wrote: > Hi Alan et al > > Please find attached some patches that show some bare bones start to using > setjmp and longjmp exception style error handling in plplot. Please have a > look at the log for a detailed description of what is going on and how it > works. > > What this is: A first start with enough code to show the basic principles > of how this could work. > > What this isn't: A fully working version that should go anywhere near the > actual repo. I have only tested to see if it compiles, builds and that x00c > runs. In fact I get some tcl related exception on exit, which I'm not sure > if I just caused or if it was there before. > > I am sending out this super early untested code that may still have major > bugs and is based on whatever (quite old) plplot source I happened to have > on my machine so that you can see the PRINCIPLES of how it works. And > because I have other stuff on today so I might not get chance to look at > this for a while and I know Alan was keen to see something asap. So feel > free to be critical, but please not too critical :-) > > Alan, have a look, have a play. Thoughts welcome. Hi Phil: Thanks for responding so far beyond my request. I used "git am" apply your first two commits (which should be handled as a separate topic) cleanly to a private topic branch I created here called "memory_allocation". That clean result was based on the tip of the current master branch. I similarly applied your four commits cleanly to a private topic branch which I call "exception_handling". i.e., I assumed for this topic branch the memory_allocation topic would be merged before the exception handling part of your commits, although I doubt that would be actually necessary, i.e., both these topics are completely independent of each other. I extensively tested your "memory_allocation" topic as follows. I configured cmake with the option -DVALGRIND_ALL_TESTS=ON and built the test_c_psc target (which builds all our c standard examples for the psc device and runs them all using valgrind). Those valgrind reports were perfect, i.e., the list of such reports without the phrases "no leaks are possible" and "0 errors from 0 contexts") is empty as can be seen from software@raven> ls examples/*.log examples/valgrind.x00c.psc.log examples/valgrind.x11c.psc.log examples/valgrind.x22c.psc.log examples/valgrind.x01c.psc.log examples/valgrind.x12c.psc.log examples/valgrind.x23c.psc.log examples/valgrind.x02c.psc.log examples/valgrind.x13c.psc.log examples/valgrind.x24c.psc.log examples/valgrind.x03c.psc.log examples/valgrind.x14c.psc.log examples/valgrind.x25c.psc.log examples/valgrind.x04c.psc.log examples/valgrind.x15c.psc.log examples/valgrind.x26c.psc.log examples/valgrind.x05c.psc.log examples/valgrind.x16c.psc.log examples/valgrind.x27c.psc.log examples/valgrind.x06c.psc.log examples/valgrind.x17c.psc.log examples/valgrind.x28c.psc.log examples/valgrind.x07c.psc.log examples/valgrind.x18c.psc.log examples/valgrind.x29c.psc.log examples/valgrind.x08c.psc.log examples/valgrind.x19c.psc.log examples/valgrind.x30c.psc.log examples/valgrind.x09c.psc.log examples/valgrind.x20c.psc.log examples/valgrind.x31c.psc.log examples/valgrind.x10c.psc.log examples/valgrind.x21c.psc.log examples/valgrind.x33c.psc.log and the empty results from software@raven> grep -L "no leaks are possible" examples/valgrind.x??c.psc.log software@raven> grep -L "0 errors from 0 contexts" examples/valgrind.x??c.psc.log Furthermore, I like what you have done with plmalloc, plrealloc, and plfree where if inside those routines malloc, realloc, or free respectively fail, plexit is called with appropriate error message. Assuming, then you want to replace all our current direct calls to malloc, realloc and free with these pl* variants, this solves a major problem we have which was all the wide range of different treatments of how malloc, realloc and free errors were checked, i.e., in the best cases plexit is called, but more usually no error checking is done at all! So this is a most encouraging result. However, there are some issues with this topic branch that I noticed * This approach should be extended to calloc as well. * You are not finished, e.g., I notice many direct calls to malloc still within our code base. * The following two compile warnings need to be addressed: [ 18%] Building C object src/CMakeFiles/plplot.dir/plcore.c.o cd /home/software/plplot/HEAD/build_dir/src && /usr/lib/ccache/cc -DPLPLOT_HAVE_CONFIG_H -DUSINGDLL -Dplplot_EXPORTS -I/home/ software/plplot/HEAD/plplot.git/include -I/home/software/plplot/HEAD/plplot.git/lib/qsastime -I/home/software/plplot/HEAD/buil d_dir -I/home/software/plplot/HEAD/build_dir/include -O3 -fvisibility=hidden -Wuninitialized -fPIC -I/usr/include -o CMake Files/plplot.dir/plcore.c.o -c /home/software/plplot/HEAD/plplot.git/src/plcore.c /home/software/plplot/HEAD/plplot.git/src/plcore.c: In function ‘plinitialisememorylist’: /home/software/plplot/HEAD/plplot.git/src/plcore.c:117:9: warning: ignoring return value of ‘realloc’, declared with attribute warn_unused_result [-Wunused-result] realloc( pls->memorylist, n * sizeof ( void* ) ); ^ /home/software/plplot/HEAD/plplot.git/src/plcore.c: In function ‘plmalloc’: /home/software/plplot/HEAD/plplot.git/src/plcore.c:140:9: warning: ignoring return value of ‘realloc’, declared with attribute warn_unused_result [-Wunused-result] realloc( pls->memorylist, n * sizeof ( void* ) ); ^ * Once you feel you have completed development of this private topic branch, then you should test it like I did above (assuming you have good access to a Linux box or I could do that testing if you like when the time comes), and assuming all the valgrind results are still clean, then you should merge it to the master branch completely independent of what happens with your exception handling topic branch. That finishes the memory allocation topic, and I now want to move on to the exception handling topic. For that topic branch I built the x00c and ps targets OK (except for the memory allocation branchh code warnings I mentioned above), but that result segfaulted at run time, e.g., software@raven> examples/c/x00c -dev psc -o test.psc Segmentation fault Here are the valgrind results for this important memory management issue software@raven> valgrind examples/c/x00c -dev psc -o test.psc ==24594== Memcheck, a memory error detector ==24594== Copyright (C) 2002-2013, and GNU GPL'd, by Julian Seward et al. ==24594== Using Valgrind-3.10.0 and LibVEX; rerun with -h for copyright info ==24594== Command: examples/c/x00c -dev psc -o test.psc ==24594== ==24594== Conditional jump or move depends on uninitialised value(s) ==24594== at 0x53F1EE5: longjmp (longjmp.c:32) ==24594== by 0x4E840B0: c_plenv (in /home/software/plplot/HEAD/build_dir/src/libplplot.so.14.0.0) ==24594== by 0x400858: main (in /home/software/plplot/HEAD/build_dir/examples/c/x00c) ==24594== ==24594== Warning: client switching stacks? SP change: 0xffefffdc8 --> 0x3d8d115408920055 ==24594== to suppress, use: --max-stackframe=4435220191945818765 or greater ==24594== Use of uninitialised value of size 8 ==24594== at 0x53F1F52: __longjmp (__longjmp.S:60) ==24594== ==24594== Jump to the invalid address stated on the next line ==24594== at 0x3D8D115408920055: ??? ==24594== Address 0x3d8d115408920055 is not stack'd, malloc'd or (recently) free'd ==24594== ==24594== ==24594== Process terminating with default action of signal 11 (SIGSEGV) ==24594== Bad permissions for mapped region at address 0x3D8D115408920055 ==24594== at 0x3D8D115408920055: ??? ==24594== Use of uninitialised value of size 8 ==24594== at 0x4A236C0: _vgnU_freeres (vg_preloaded.c:58) ==24594== ==24594== Invalid write of size 8 ==24594== at 0x4A236C0: _vgnU_freeres (vg_preloaded.c:58) ==24594== Address 0x3d8d11540892004d is not stack'd, malloc'd or (recently) free'd ==24594== ==24594== ==24594== Process terminating with default action of signal 11 (SIGSEGV) ==24594== General Protection Fault ==24594== at 0x4A236C0: _vgnU_freeres (vg_preloaded.c:58) ==24594== ==24594== HEAP SUMMARY: ==24594== in use at exit: 76,682 bytes in 288 blocks ==24594== total heap usage: 360 allocs, 72 frees, 124,441 bytes allocated ==24594== ==24594== LEAK SUMMARY: ==24594== definitely lost: 0 bytes in 0 blocks ==24594== indirectly lost: 0 bytes in 0 blocks ==24594== possibly lost: 0 bytes in 0 blocks ==24594== still reachable: 76,682 bytes in 288 blocks ==24594== suppressed: 0 bytes in 0 blocks ==24594== Rerun with --leak-check=full to see details of leaked memory ==24594== ==24594== For counts of detected and suppressed errors, rerun with: -v ==24594== Use --track-origins=yes to see where uninitialised values come from ==24594== ERROR SUMMARY: 5 errors from 5 contexts (suppressed: 0 from 0) Segmentation fault So you need to address this before we can do any extensive run-time testing of your exception handling ideas. Of course, you did warn us that this code was quite preliminary so issues like above are to be expected. Furthermore, in my view, discussion of real code developers can try for themselves always beats discussion of code that doesn't exist yet. So I very much appreciate you supplying us with this preliminary exception handling code to focus our further discussion. I do have some general points to make from just looking at the broad picture of what you are trying to accomplish here with the exception handling macros PLTRY, PLCATCH, PLENDTRY, and PLTHROW. * There is currently no ability for PLTHROW to propagate an error message. Is that a fundamental limitation of setjmp/longjmp based C exception handling or could that feature be easily implemented? If the latter, then that gives us the option to use PLTHROW wherever errors occur rather than always calling plexit which then prints the error message before it uses PLTHROW. So for example in plmalloc, plcalloc, plrealloc, and plfree you could use PLTHROW directly rather than calling plexit. However, if implementation of handling error messages for PLTHROW is difficult/impossible, then I would be content to call plexit everywhere so that there would only be one PLTHROW (within plexit after printing out the error message) in our code as now on this topic branch. * Can you explain why you have implemented all the PLTRY blocks in src/plvpor.c (with presumably the idea in mind to use PLTRY in a lot more places?) Wouldn't it be better to use just one PLTRY block? For example, could you get away with just that one PLTRY block in the library initialization code? If so, the corresponding PLCATCH block there could write out a message saying you reached that catch block (to confirm that the whole exception handling system is fundamentally working within our C library), and then exit (for now) in just that one place rather than doing nothing about whatever PLplot library error was detected. * I suggest you protect your exception handling code (but not the separate topic branch code concerning pl* versions of malloc, calloc, realloc, and free) with an PL_IMPLEMENT_EXCEPTION_HANDLING macro that is controlled (using #cmakedefine in plplot_config.h.in) by a cmake option in cmake/modules/plplot.cmake of the same name which defaults to OFF and which is self-documented as highly experimental. Once you do this, then it makes it possible to merge this topic quite rapidly to master (to facilitate further collaborative development of our exception handling system) since the user would have to go out of their way to set -DPL_IMPLEMENT_EXCEPTION_HANDLING=ON once they found PL_IMPLEMENT_EXCEPTION_HANDLING in CMakeCache.txt. And that file includes the self-documentation of each option so it should be obvious to such users that turning that option ON is highly experimental. Alan __________________________ Alan W. Irwin Astronomical research affiliation with Department of Physics and Astronomy, University of Victoria (astrowww.phys.uvic.ca). Programming affiliations with the FreeEOS equation-of-state implementation for stellar interiors (freeeos.sf.net); the Time Ephemerides project (timeephem.sf.net); PLplot scientific plotting software package (plplot.sf.net); the libLASi project (unifont.org/lasi); the Loads of Linux Links project (loll.sf.net); and the Linux Brochure Project (lbproject.sf.net). __________________________ Linux-powered Science __________________________ |
From: Phil R. <p.d...@gm...> - 2017-07-17 10:32:04
Attachments:
0001-Added-functions-that-will-record-memory-allocations-.patch
0002-Part-way-through-plmalloc-implementation.patch
0003-Removed-plinitializememorylist-and-plfreeall-calls.patch
0004-Fix-issue-with-growing-memorylist.patch
0005-Added-missing.patch
0006-Added-Memory-allocation-function-declarations-to-plp.patch
0007-Added-Exception-handling-macros-functions-variables.patch
0008-Add-some-test-code-for-exception-handling.patch
0009-Fixed-PLENDTRY-PLTHROW-bug.patch
0010-Remove-plexit-calls-from-advancejumpbuffer-and-plini.patch
0011-Added-some-comments-around-PLTRY-PLCATCH-and-PLENDTR.patch
0012-Free-memory-from-jumpbuffer-stack-in-plend1.patch
|
Hi Alan Please find attached a new patch series. I should have fixed the segfault issue. Note I have done some rebasing so you will probably need to create new branches. I think there is a memory leak somewhere. I will look into this. But you may have something you can play with to test a bit better now. I have added specific comments below. > I used "git am" apply your first two commits (which should be handled > as a separate topic) cleanly to a private topic branch I created here > called "memory_allocation". That clean result was based on the tip of the > current > master branch. That's good, as I said I have quite old source and didn't want to risk introducing any changes in the build system that may have cost me time as I only had a few hours to set this up and send it to you. I will update once I get chance. > > I similarly applied your four commits cleanly to a private topic branch > which > I call "exception_handling". i.e., I assumed for this topic branch > the memory_allocation topic would be merged before the exception > handling part of your commits, although I doubt that would be actually > necessary, i.e., both these topics are completely independent of each > other. Yes, the memory management is independent of exception handling, but exception handling requires memory management to avoid memory leaks. > > I extensively tested your "memory_allocation" topic as follows. > > I configured cmake with the option -DVALGRIND_ALL_TESTS=ON and built > the test_c_psc target (which builds all our c standard examples > for the psc device and runs them all using valgrind). Those > valgrind reports were perfect, i.e., the list of such > reports without the phrases "no leaks are possible" and "0 errors from 0 > contexts") is empty as can be seen from > > software@raven> ls examples/*.log > examples/valgrind.x00c.psc.log examples/valgrind.x11c.psc.log > examples/valgrind.x22c.psc.log > examples/valgrind.x01c.psc.log examples/valgrind.x12c.psc.log > examples/valgrind.x23c.psc.log > examples/valgrind.x02c.psc.log examples/valgrind.x13c.psc.log > examples/valgrind.x24c.psc.log > examples/valgrind.x03c.psc.log examples/valgrind.x14c.psc.log > examples/valgrind.x25c.psc.log > examples/valgrind.x04c.psc.log examples/valgrind.x15c.psc.log > examples/valgrind.x26c.psc.log > examples/valgrind.x05c.psc.log examples/valgrind.x16c.psc.log > examples/valgrind.x27c.psc.log > examples/valgrind.x06c.psc.log examples/valgrind.x17c.psc.log > examples/valgrind.x28c.psc.log > examples/valgrind.x07c.psc.log examples/valgrind.x18c.psc.log > examples/valgrind.x29c.psc.log > examples/valgrind.x08c.psc.log examples/valgrind.x19c.psc.log > examples/valgrind.x30c.psc.log > examples/valgrind.x09c.psc.log examples/valgrind.x20c.psc.log > examples/valgrind.x31c.psc.log > examples/valgrind.x10c.psc.log examples/valgrind.x21c.psc.log > examples/valgrind.x33c.psc.log > > and the empty results from > > software@raven> grep -L "no leaks are possible" > examples/valgrind.x??c.psc.log > software@raven> grep -L "0 errors from 0 contexts" > examples/valgrind.x??c.psc.log This is not surprising. I Don't think the plmalloc or plrealloc functions are actually used anywhere yet :-D > > Furthermore, I like what you have done with plmalloc, plrealloc, and > plfree where if inside those routines malloc, realloc, or free > respectively fail, plexit is called with appropriate error message. > Assuming, then you want to replace all our current direct calls to > malloc, realloc and free with these pl* variants, this solves a major > problem we have which was all the wide range of different treatments > of how malloc, realloc and free errors were checked, i.e., in the best > cases plexit is called, but more usually no error checking is done at > all! Not quite. Any pointers which we store in the plstream must use the original malloc, otherwise they will be freed as soon as the API call exits. There are two options here. 1) Rely on developers to choose malloc or plmalloc as needed. But when I looked through the code, it wasn't always obvious because there are some places where memory is allocated in one bit of code, then the assignment of the pointer in the plstream occurs somewhere else. I think this is quite messy. 2) Have plfreeall check that the memory allocations do not correspond to the pointers in plstream. This only requires that developers modify plfreeall if they add a new heap pointer to plstream. it may be that this is easy to miss as it would be done infrequently, but it would probably very quickly generate a segfault or valgrind would spot it easily. My preference is I think for 2, but I welcome comments. > > So this is a most encouraging result. > > However, there are some issues with this topic branch that I noticed > > * This approach should be extended to calloc as well. This is my lack of C (vs C++ knowledge). I didn't know calloc existed. I'll add it. > > * You are not finished, e.g., I notice many direct calls to malloc > still within our code base. I haven't even started the replacement of malloc with plmalloc. > > * The following two compile warnings need to be addressed: > > [ 18%] Building C object src/CMakeFiles/plplot.dir/plcore.c.o > cd /home/software/plplot/HEAD/build_dir/src && /usr/lib/ccache/cc > -DPLPLOT_HAVE_CONFIG_H -DUSINGDLL -Dplplot_EXPORTS -I/home/ > software/plplot/HEAD/plplot.git/include > -I/home/software/plplot/HEAD/plplot.git/lib/qsastime > -I/home/software/plplot/HEAD/buil > d_dir -I/home/software/plplot/HEAD/build_dir/include -O3 > -fvisibility=hidden -Wuninitialized -fPIC -I/usr/include -o CMake > Files/plplot.dir/plcore.c.o -c > /home/software/plplot/HEAD/plplot.git/src/plcore.c > /home/software/plplot/HEAD/plplot.git/src/plcore.c: In function > ‘plinitialisememorylist’: > /home/software/plplot/HEAD/plplot.git/src/plcore.c:117:9: warning: ignoring > return value of ‘realloc’, declared with attribute warn_unused_result > [-Wunused-result] > realloc( pls->memorylist, n * sizeof ( void* ) ); > ^ > /home/software/plplot/HEAD/plplot.git/src/plcore.c: In function ‘plmalloc’: > /home/software/plplot/HEAD/plplot.git/src/plcore.c:140:9: warning: ignoring > return value of ‘realloc’, declared with attribute warn_unused_result > [-Wunused-result] > realloc( pls->memorylist, n * sizeof ( void* ) ); > ^ > Fixed > * Once you feel you have completed development of this private topic > branch, then you should test it like I did above (assuming you have > good access to a Linux box or I could do that testing if you like > when the time comes), and assuming all the valgrind results are > still clean, then you should merge it to the master branch > completely independent of what happens with your exception handling > topic branch. > > That finishes the memory allocation topic, and I now want to move > on to the exception handling topic. > > For that topic branch I built the x00c and ps targets OK (except for > the memory allocation branchh code warnings I mentioned above), but > that result segfaulted at run time, e.g., > > software@raven> examples/c/x00c -dev psc -o test.psc > Segmentation fault Fixed > > Here are the valgrind results for this important memory management issue > > software@raven> valgrind examples/c/x00c -dev psc -o test.psc > ==24594== Memcheck, a memory error detector > ==24594== Copyright (C) 2002-2013, and GNU GPL'd, by Julian Seward et al. > ==24594== Using Valgrind-3.10.0 and LibVEX; rerun with -h for copyright info > ==24594== Command: examples/c/x00c -dev psc -o test.psc > ==24594== ==24594== Conditional jump or move depends on uninitialised > value(s) > ==24594== at 0x53F1EE5: longjmp (longjmp.c:32) > ==24594== by 0x4E840B0: c_plenv (in > /home/software/plplot/HEAD/build_dir/src/libplplot.so.14.0.0) > ==24594== by 0x400858: main (in > /home/software/plplot/HEAD/build_dir/examples/c/x00c) > ==24594== ==24594== Warning: client switching stacks? SP change: > 0xffefffdc8 --> 0x3d8d115408920055 > ==24594== to suppress, use: --max-stackframe=4435220191945818765 or > greater > ==24594== Use of uninitialised value of size 8 > ==24594== at 0x53F1F52: __longjmp (__longjmp.S:60) > ==24594== ==24594== Jump to the invalid address stated on the next line > ==24594== at 0x3D8D115408920055: ??? > ==24594== Address 0x3d8d115408920055 is not stack'd, malloc'd or (recently) > free'd > ==24594== ==24594== ==24594== Process terminating with default action of > signal 11 (SIGSEGV) > ==24594== Bad permissions for mapped region at address 0x3D8D115408920055 > ==24594== at 0x3D8D115408920055: ??? > ==24594== Use of uninitialised value of size 8 > ==24594== at 0x4A236C0: _vgnU_freeres (vg_preloaded.c:58) > ==24594== ==24594== Invalid write of size 8 > ==24594== at 0x4A236C0: _vgnU_freeres (vg_preloaded.c:58) > ==24594== Address 0x3d8d11540892004d is not stack'd, malloc'd or (recently) > free'd > ==24594== ==24594== ==24594== Process terminating with default action of > signal 11 (SIGSEGV) > ==24594== General Protection Fault > ==24594== at 0x4A236C0: _vgnU_freeres (vg_preloaded.c:58) > ==24594== ==24594== HEAP SUMMARY: > ==24594== in use at exit: 76,682 bytes in 288 blocks > ==24594== total heap usage: 360 allocs, 72 frees, 124,441 bytes allocated > ==24594== ==24594== LEAK SUMMARY: > ==24594== definitely lost: 0 bytes in 0 blocks > ==24594== indirectly lost: 0 bytes in 0 blocks > ==24594== possibly lost: 0 bytes in 0 blocks > ==24594== still reachable: 76,682 bytes in 288 blocks > ==24594== suppressed: 0 bytes in 0 blocks > ==24594== Rerun with --leak-check=full to see details of leaked memory > ==24594== ==24594== For counts of detected and suppressed errors, rerun > with: -v > ==24594== Use --track-origins=yes to see where uninitialised values come > from > ==24594== ERROR SUMMARY: 5 errors from 5 contexts (suppressed: 0 from 0) > Segmentation fault I think at least some of this is due to the realloc issue above. Hopefully some or all of this is fixed now. I would welcome another valgrind summary. > > So you need to address this before we can do any extensive run-time testing > of your exception handling ideas. Of course, you did warn us that > this code was quite preliminary so issues like above are to be > expected. Furthermore, in my view, discussion of real code developers > can try for themselves always beats discussion of code that doesn't > exist yet. So I very much appreciate you supplying us with this > preliminary exception handling code to focus our further discussion. > > I do have some general points to make from just looking at the broad > picture of what you are trying to accomplish here with the exception > handling macros PLTRY, PLCATCH, PLENDTRY, and PLTHROW. > > * There is currently no ability for PLTHROW to propagate an error message. > Is > that a fundamental limitation of setjmp/longjmp based C exception > handling or could that feature be easily implemented? If the latter, > then that gives us the option to use PLTHROW wherever errors occur > rather than always calling plexit which then prints the error > message before it uses PLTHROW. So for example in plmalloc, > plcalloc, plrealloc, and plfree you could use PLTHROW directly > rather than calling plexit. However, if implementation of handling > error messages for PLTHROW is difficult/impossible, then I would be > content to call plexit everywhere so that there would only be one > PLTHROW (within plexit after printing out the error message) in our > code as now on this topic branch. We could throw an error code, but I don't think an error message. However, I think it would be better to put an error message/code in the plstream. This would need to be on the heap rather than the stack otherwise it would be lost in the stack unwinding. I think we should maintain a function like plerror that would put the error code/message onto the stream, but maybe renamed to be called something like plthrow > > * Can you explain why you have implemented all the PLTRY blocks > in src/plvpor.c (with presumably the idea in mind to use PLTRY > in a lot more places?) Wouldn't it be better to use just one > PLTRY block? For example, could you get away with just that one PLTRY > block in the library initialization code? If so, the corresponding > PLCATCH block there could write out a message saying you reached > that catch block (to confirm that the whole exception handling system is > fundamentally working within our C library), and then exit (for now) > in just that one place rather than doing nothing about whatever > PLplot library error was detected. I didn't check the API, but I thought that the naming convention was that every function beginning c_ is part of the public API. Is this not correct? EVERY public API function would need a PLTRY block. I only chose this file because it contains plenv, which was one of the first API functions called by x00c > > * I suggest you protect your exception handling code (but not the > separate topic branch code concerning pl* versions of malloc, > calloc, realloc, and free) with an PL_IMPLEMENT_EXCEPTION_HANDLING > macro that is controlled (using #cmakedefine in plplot_config.h.in) > by a cmake option in cmake/modules/plplot.cmake of the same name > which defaults to OFF and which is self-documented as highly > experimental. Once you do this, then it makes it possible to merge > this topic quite rapidly to master (to facilitate further > collaborative development of our exception handling system) since > the user would have to go out of their way to set > -DPL_IMPLEMENT_EXCEPTION_HANDLING=ON once they found > PL_IMPLEMENT_EXCEPTION_HANDLING in CMakeCache.txt. And that file > includes the self-documentation of each option so it should be > obvious to such users that turning that option ON is highly > experimental. Would you be happy to add this variable to the build system for me? Your cmake knowledge is much better than mine. Or do I just use that macro and adding of -DPL_IMPLEMENT_EXCEPTION_HANDLING=ON to the command line automagically generate the equivalent #define. > > Alan > __________________________ > Alan W. Irwin > > Astronomical research affiliation with Department of Physics and Astronomy, > University of Victoria (astrowww.phys.uvic.ca). > > Programming affiliations with the FreeEOS equation-of-state > implementation for stellar interiors (freeeos.sf.net); the Time > Ephemerides project (timeephem.sf.net); PLplot scientific plotting > software package (plplot.sf.net); the libLASi project > (unifont.org/lasi); the Loads of Linux Links project (loll.sf.net); > and the Linux Brochure Project (lbproject.sf.net). > __________________________ > > Linux-powered Science > __________________________ |
From: Phil R. <p.d...@gm...> - 2017-07-17 12:09:50
|
Just a note to say that as far as I can tell from the Visual Studio memory tools, it appears that the memory leak is wxWidgets related and not to do with the exception testing. The actual data in the leaked memory appears to be class information text for wxWidgets dynamic casting of pointers and occurs when using the null device, so I have a feeling it is a wxWidgets bug, not a plplot bug. I'd be interested to know what you see on Linux Alan. Phil On 17 July 2017 at 11:31, Phil Rosenberg <p.d...@gm...> wrote: > Hi Alan > Please find attached a new patch series. > > I should have fixed the segfault issue. Note I have done some rebasing > so you will probably need to create new branches. I think there is a > memory leak somewhere. I will look into this. > > But you may have something you can play with to test a bit better now. > I have added specific comments below. > > >> I used "git am" apply your first two commits (which should be handled >> as a separate topic) cleanly to a private topic branch I created here >> called "memory_allocation". That clean result was based on the tip of the >> current >> master branch. > > That's good, as I said I have quite old source and didn't want to risk > introducing any changes in the build system that may have cost me time > as I only had a few hours to set this up and send it to you. I will > update once I get chance. > >> >> I similarly applied your four commits cleanly to a private topic branch >> which >> I call "exception_handling". i.e., I assumed for this topic branch >> the memory_allocation topic would be merged before the exception >> handling part of your commits, although I doubt that would be actually >> necessary, i.e., both these topics are completely independent of each >> other. > > Yes, the memory management is independent of exception handling, but > exception handling requires memory management to avoid memory leaks. > >> >> I extensively tested your "memory_allocation" topic as follows. >> >> I configured cmake with the option -DVALGRIND_ALL_TESTS=ON and built >> the test_c_psc target (which builds all our c standard examples >> for the psc device and runs them all using valgrind). Those >> valgrind reports were perfect, i.e., the list of such >> reports without the phrases "no leaks are possible" and "0 errors from 0 >> contexts") is empty as can be seen from >> >> software@raven> ls examples/*.log >> examples/valgrind.x00c.psc.log examples/valgrind.x11c.psc.log >> examples/valgrind.x22c.psc.log >> examples/valgrind.x01c.psc.log examples/valgrind.x12c.psc.log >> examples/valgrind.x23c.psc.log >> examples/valgrind.x02c.psc.log examples/valgrind.x13c.psc.log >> examples/valgrind.x24c.psc.log >> examples/valgrind.x03c.psc.log examples/valgrind.x14c.psc.log >> examples/valgrind.x25c.psc.log >> examples/valgrind.x04c.psc.log examples/valgrind.x15c.psc.log >> examples/valgrind.x26c.psc.log >> examples/valgrind.x05c.psc.log examples/valgrind.x16c.psc.log >> examples/valgrind.x27c.psc.log >> examples/valgrind.x06c.psc.log examples/valgrind.x17c.psc.log >> examples/valgrind.x28c.psc.log >> examples/valgrind.x07c.psc.log examples/valgrind.x18c.psc.log >> examples/valgrind.x29c.psc.log >> examples/valgrind.x08c.psc.log examples/valgrind.x19c.psc.log >> examples/valgrind.x30c.psc.log >> examples/valgrind.x09c.psc.log examples/valgrind.x20c.psc.log >> examples/valgrind.x31c.psc.log >> examples/valgrind.x10c.psc.log examples/valgrind.x21c.psc.log >> examples/valgrind.x33c.psc.log >> >> and the empty results from >> >> software@raven> grep -L "no leaks are possible" >> examples/valgrind.x??c.psc.log >> software@raven> grep -L "0 errors from 0 contexts" >> examples/valgrind.x??c.psc.log > > This is not surprising. I Don't think the plmalloc or plrealloc > functions are actually used anywhere yet :-D > >> >> Furthermore, I like what you have done with plmalloc, plrealloc, and >> plfree where if inside those routines malloc, realloc, or free >> respectively fail, plexit is called with appropriate error message. >> Assuming, then you want to replace all our current direct calls to >> malloc, realloc and free with these pl* variants, this solves a major >> problem we have which was all the wide range of different treatments >> of how malloc, realloc and free errors were checked, i.e., in the best >> cases plexit is called, but more usually no error checking is done at >> all! > > Not quite. Any pointers which we store in the plstream must use the > original malloc, otherwise they will be freed as soon as the API call > exits. > There are two options here. > 1) Rely on developers to choose malloc or plmalloc as needed. But when > I looked through the code, it wasn't always obvious because there are > some places where memory is allocated in one bit of code, then the > assignment of the pointer in the plstream occurs somewhere else. I > think this is quite messy. > 2) Have plfreeall check that the memory allocations do not correspond > to the pointers in plstream. This only requires that developers modify > plfreeall if they add a new heap pointer to plstream. it may be that > this is easy to miss as it would be done infrequently, but it would > probably very quickly generate a segfault or valgrind would spot it > easily. > > My preference is I think for 2, but I welcome comments. > >> >> So this is a most encouraging result. >> >> However, there are some issues with this topic branch that I noticed >> >> * This approach should be extended to calloc as well. > > This is my lack of C (vs C++ knowledge). I didn't know calloc existed. > I'll add it. > >> >> * You are not finished, e.g., I notice many direct calls to malloc >> still within our code base. > > I haven't even started the replacement of malloc with plmalloc. > >> >> * The following two compile warnings need to be addressed: >> >> [ 18%] Building C object src/CMakeFiles/plplot.dir/plcore.c.o >> cd /home/software/plplot/HEAD/build_dir/src && /usr/lib/ccache/cc >> -DPLPLOT_HAVE_CONFIG_H -DUSINGDLL -Dplplot_EXPORTS -I/home/ >> software/plplot/HEAD/plplot.git/include >> -I/home/software/plplot/HEAD/plplot.git/lib/qsastime >> -I/home/software/plplot/HEAD/buil >> d_dir -I/home/software/plplot/HEAD/build_dir/include -O3 >> -fvisibility=hidden -Wuninitialized -fPIC -I/usr/include -o CMake >> Files/plplot.dir/plcore.c.o -c >> /home/software/plplot/HEAD/plplot.git/src/plcore.c >> /home/software/plplot/HEAD/plplot.git/src/plcore.c: In function >> ‘plinitialisememorylist’: >> /home/software/plplot/HEAD/plplot.git/src/plcore.c:117:9: warning: ignoring >> return value of ‘realloc’, declared with attribute warn_unused_result >> [-Wunused-result] >> realloc( pls->memorylist, n * sizeof ( void* ) ); >> ^ >> /home/software/plplot/HEAD/plplot.git/src/plcore.c: In function ‘plmalloc’: >> /home/software/plplot/HEAD/plplot.git/src/plcore.c:140:9: warning: ignoring >> return value of ‘realloc’, declared with attribute warn_unused_result >> [-Wunused-result] >> realloc( pls->memorylist, n * sizeof ( void* ) ); >> ^ >> > > Fixed > >> * Once you feel you have completed development of this private topic >> branch, then you should test it like I did above (assuming you have >> good access to a Linux box or I could do that testing if you like >> when the time comes), and assuming all the valgrind results are >> still clean, then you should merge it to the master branch >> completely independent of what happens with your exception handling >> topic branch. >> >> That finishes the memory allocation topic, and I now want to move >> on to the exception handling topic. >> >> For that topic branch I built the x00c and ps targets OK (except for >> the memory allocation branchh code warnings I mentioned above), but >> that result segfaulted at run time, e.g., >> >> software@raven> examples/c/x00c -dev psc -o test.psc >> Segmentation fault > Fixed > >> >> Here are the valgrind results for this important memory management issue >> >> software@raven> valgrind examples/c/x00c -dev psc -o test.psc >> ==24594== Memcheck, a memory error detector >> ==24594== Copyright (C) 2002-2013, and GNU GPL'd, by Julian Seward et al. >> ==24594== Using Valgrind-3.10.0 and LibVEX; rerun with -h for copyright info >> ==24594== Command: examples/c/x00c -dev psc -o test.psc >> ==24594== ==24594== Conditional jump or move depends on uninitialised >> value(s) >> ==24594== at 0x53F1EE5: longjmp (longjmp.c:32) >> ==24594== by 0x4E840B0: c_plenv (in >> /home/software/plplot/HEAD/build_dir/src/libplplot.so.14.0.0) >> ==24594== by 0x400858: main (in >> /home/software/plplot/HEAD/build_dir/examples/c/x00c) >> ==24594== ==24594== Warning: client switching stacks? SP change: >> 0xffefffdc8 --> 0x3d8d115408920055 >> ==24594== to suppress, use: --max-stackframe=4435220191945818765 or >> greater >> ==24594== Use of uninitialised value of size 8 >> ==24594== at 0x53F1F52: __longjmp (__longjmp.S:60) >> ==24594== ==24594== Jump to the invalid address stated on the next line >> ==24594== at 0x3D8D115408920055: ??? >> ==24594== Address 0x3d8d115408920055 is not stack'd, malloc'd or (recently) >> free'd >> ==24594== ==24594== ==24594== Process terminating with default action of >> signal 11 (SIGSEGV) >> ==24594== Bad permissions for mapped region at address 0x3D8D115408920055 >> ==24594== at 0x3D8D115408920055: ??? >> ==24594== Use of uninitialised value of size 8 >> ==24594== at 0x4A236C0: _vgnU_freeres (vg_preloaded.c:58) >> ==24594== ==24594== Invalid write of size 8 >> ==24594== at 0x4A236C0: _vgnU_freeres (vg_preloaded.c:58) >> ==24594== Address 0x3d8d11540892004d is not stack'd, malloc'd or (recently) >> free'd >> ==24594== ==24594== ==24594== Process terminating with default action of >> signal 11 (SIGSEGV) >> ==24594== General Protection Fault >> ==24594== at 0x4A236C0: _vgnU_freeres (vg_preloaded.c:58) >> ==24594== ==24594== HEAP SUMMARY: >> ==24594== in use at exit: 76,682 bytes in 288 blocks >> ==24594== total heap usage: 360 allocs, 72 frees, 124,441 bytes allocated >> ==24594== ==24594== LEAK SUMMARY: >> ==24594== definitely lost: 0 bytes in 0 blocks >> ==24594== indirectly lost: 0 bytes in 0 blocks >> ==24594== possibly lost: 0 bytes in 0 blocks >> ==24594== still reachable: 76,682 bytes in 288 blocks >> ==24594== suppressed: 0 bytes in 0 blocks >> ==24594== Rerun with --leak-check=full to see details of leaked memory >> ==24594== ==24594== For counts of detected and suppressed errors, rerun >> with: -v >> ==24594== Use --track-origins=yes to see where uninitialised values come >> from >> ==24594== ERROR SUMMARY: 5 errors from 5 contexts (suppressed: 0 from 0) >> Segmentation fault > > I think at least some of this is due to the realloc issue above. > Hopefully some or all of this is fixed now. I would welcome another > valgrind summary. > >> >> So you need to address this before we can do any extensive run-time testing >> of your exception handling ideas. Of course, you did warn us that >> this code was quite preliminary so issues like above are to be >> expected. Furthermore, in my view, discussion of real code developers >> can try for themselves always beats discussion of code that doesn't >> exist yet. So I very much appreciate you supplying us with this >> preliminary exception handling code to focus our further discussion. >> >> I do have some general points to make from just looking at the broad >> picture of what you are trying to accomplish here with the exception >> handling macros PLTRY, PLCATCH, PLENDTRY, and PLTHROW. >> >> * There is currently no ability for PLTHROW to propagate an error message. >> Is >> that a fundamental limitation of setjmp/longjmp based C exception >> handling or could that feature be easily implemented? If the latter, >> then that gives us the option to use PLTHROW wherever errors occur >> rather than always calling plexit which then prints the error >> message before it uses PLTHROW. So for example in plmalloc, >> plcalloc, plrealloc, and plfree you could use PLTHROW directly >> rather than calling plexit. However, if implementation of handling >> error messages for PLTHROW is difficult/impossible, then I would be >> content to call plexit everywhere so that there would only be one >> PLTHROW (within plexit after printing out the error message) in our >> code as now on this topic branch. > > We could throw an error code, but I don't think an error message. > However, I think it would be better to put an error message/code in > the plstream. This would need to be on the heap rather than the stack > otherwise it would be lost in the stack unwinding. I think we should > maintain a function like plerror that would put the error code/message > onto the stream, but maybe renamed to be called something like plthrow > >> >> * Can you explain why you have implemented all the PLTRY blocks >> in src/plvpor.c (with presumably the idea in mind to use PLTRY >> in a lot more places?) Wouldn't it be better to use just one >> PLTRY block? For example, could you get away with just that one PLTRY >> block in the library initialization code? If so, the corresponding >> PLCATCH block there could write out a message saying you reached >> that catch block (to confirm that the whole exception handling system is >> fundamentally working within our C library), and then exit (for now) >> in just that one place rather than doing nothing about whatever >> PLplot library error was detected. > > I didn't check the API, but I thought that the naming convention was > that every function beginning c_ is part of the public API. Is this > not correct? EVERY public API function would need a PLTRY block. I > only chose this file because it contains plenv, which was one of the > first API functions called by x00c >> >> * I suggest you protect your exception handling code (but not the >> separate topic branch code concerning pl* versions of malloc, >> calloc, realloc, and free) with an PL_IMPLEMENT_EXCEPTION_HANDLING >> macro that is controlled (using #cmakedefine in plplot_config.h.in) >> by a cmake option in cmake/modules/plplot.cmake of the same name >> which defaults to OFF and which is self-documented as highly >> experimental. Once you do this, then it makes it possible to merge >> this topic quite rapidly to master (to facilitate further >> collaborative development of our exception handling system) since >> the user would have to go out of their way to set >> -DPL_IMPLEMENT_EXCEPTION_HANDLING=ON once they found >> PL_IMPLEMENT_EXCEPTION_HANDLING in CMakeCache.txt. And that file >> includes the self-documentation of each option so it should be >> obvious to such users that turning that option ON is highly >> experimental. > > Would you be happy to add this variable to the build system for me? > Your cmake knowledge is much better than mine. Or do I just use that > macro and adding of -DPL_IMPLEMENT_EXCEPTION_HANDLING=ON to the > command line automagically generate the equivalent #define. >> >> Alan >> __________________________ >> Alan W. Irwin >> >> Astronomical research affiliation with Department of Physics and Astronomy, >> University of Victoria (astrowww.phys.uvic.ca). >> >> Programming affiliations with the FreeEOS equation-of-state >> implementation for stellar interiors (freeeos.sf.net); the Time >> Ephemerides project (timeephem.sf.net); PLplot scientific plotting >> software package (plplot.sf.net); the libLASi project >> (unifont.org/lasi); the Loads of Linux Links project (loll.sf.net); >> and the Linux Brochure Project (lbproject.sf.net). >> __________________________ >> >> Linux-powered Science >> __________________________ |
From: Alan W. I. <ir...@be...> - 2017-07-17 21:59:12
|
On 2017-07-17 11:31+0100 Phil Rosenberg wrote: > Hi Alan > Please find attached a new patch series. > > I should have fixed the segfault issue. Note I have done some rebasing > so you will probably need to create new branches. I think there is a > memory leak somewhere. I will look into this. > > But you may have something you can play with to test a bit better now. > I have added specific comments below. > Hi Phil: Thanks for your reply with substantial updates to both branches. To make this process more convenient for me _next time_ would you be willing to store your memory allocation patches in one tarball (e.g., your first 6 patches this time) and your exception handling patches (e.g., your remaining 6 patches with the first 6 excluded) in another tarball? This time I have already split the 12 patches into two separate locations, see below, but that is a lot of error-prone cutting and pasting of filenames so two tarballs next time would be much better. [...] > Yes, the memory management is independent of exception handling, but > exception handling requires memory management to avoid memory leaks. OK. See how I handled these patches below. @Others: This is how I applied Phil's patches. git checkout master # New branch git checkout -b memory_allocation2 # Specific location where I stored the first 6 of Phil's series that had # to do with memory allocation. This cat trick is something I just # discovered, and it makes it much easier to apply a series of patches # like these. software@raven> cat ~irwin/Phil.Rosenberg/20170717/memory_allocation/* |git am Applying: Added functions that will record memory allocations in the PLStream Applying: Part way through plmalloc implementation Applying: Removed plinitializememorylist and plfreeall calls Applying: Fix issue with growing memorylist Applying: Added missing ; Applying: Added Memory allocation function declarations to plplotP.h git checkout master # New branch git checkout -b exception_handler2 # Apply patches in memory_allocation2 since this branch depends on # that one being applied first. git rebase memory_allocation2 # Specific location, etc. software@raven> cat ~irwin/Phil.Rosenberg/20170717/exception_handler/* |git am Applying: Added Exception handling macros/functions/variables Applying: Add some test code for exception handling Applying: Fixed PLENDTRY/PLTHROW bug Applying: Remove plexit calls from advancejumpbuffer and plinitialisememorylist Applying: Added some comments around PLTRY, PLCATCH and PLENDTRY Applying: Free memory from jumpbuffer stack in plend1 @Phil: Again all seems clean with applying your commits for both these topic branches. > This [good valgrind results for memory_allocation branch] is not surprising. I Don't think the plmalloc or plrealloc > functions are actually used anywhere yet :-D I thought that might be the case, but these good valgrind results for all C examples are a good benchmark, and, in fact, that good benchmark has immediately paid off as a comparison for a bad result for thise second version of the memory allocation topic, see below. > >> >> Furthermore, I like what you have done with plmalloc, plrealloc, and >> plfree where if inside those routines malloc, realloc, or free >> respectively fail, plexit is called with appropriate error message. >> Assuming, then you want to replace all our current direct calls to >> malloc, realloc and free with these pl* variants, this solves a major >> problem we have which was all the wide range of different treatments >> of how malloc, realloc and free errors were checked, i.e., in the best >> cases plexit is called, but more usually no error checking is done at >> all! > > Not quite. Any pointers which we store in the plstream must use the > original malloc, otherwise they will be freed as soon as the API call > exits. > There are two options here. > 1) Rely on developers to choose malloc or plmalloc as needed. But when > I looked through the code, it wasn't always obvious because there are > some places where memory is allocated in one bit of code, then the > assignment of the pointer in the plstream occurs somewhere else. I > think this is quite messy. > 2) Have plfreeall check that the memory allocations do not correspond > to the pointers in plstream. This only requires that developers modify > plfreeall if they add a new heap pointer to plstream. it may be that > this is easy to miss as it would be done infrequently, but it would > probably very quickly generate a segfault or valgrind would spot it > easily. > > My preference is I think for 2, but I welcome comments. I suggest instead that you create a list (which will, of course, need to be maintained) of plstream heap pointers where a pointer to that list is stored in plstream. The advantage of this location for the list is it can then be used not only in plfreeall, but also in plmalloc, plcalloc, etc. to do the right thing in those functions whenever you are dealing with a plstream heap pointer. Which, in turn, would allow us to always use plmalloc, plcalloc, etc., everywhere in our C library. Which is much less error prone (other than the maintenance required for keeping that list updated, as you point out above) since developers then know they must always use the pl* versions of memory management calls while letting all those different pl* memory allocation and freeing functions deal with whether the heap pointers are in plstream or not. With regard to testing the above memory_allocation2, I repeated the same test as before (used the -DVALGRIND_ALL_TESTS=ON cmake option, built the test_c_psc target, and checked for any imperfect valgrind reports). Examples 00 through 07 were fine, but example 08 segfaulted. I have attached the associated valgrind report to help you to debug this issue. That finishes the memory allocation topic, and I now want to move on to the exception handling topic. Since this topic branch depends on the memory allocation branch, and there is currently a showstopper segfault in that branch, I did no tests of this exception handling topic branch other than the one above that showed these commits applied cleanly. However, I do have some further responses to your remarks about this branch below. [...] > Hopefully some or all of this [segfault in the prior version of this branch] is fixed now. I would welcome another > valgrind summary. Current problem is in memory_allocation branch, see above. >> * There is currently no ability for PLTHROW to propagate an error message. >> Is that a fundamental limitation of setjmp/longjmp based C exception >> handling or could that feature be easily implemented? If the latter, >> then that gives us the option to use PLTHROW wherever errors occur >> rather than always calling plexit which then prints the error >> message before it uses PLTHROW. So for example in plmalloc, >> plcalloc, plrealloc, and plfree you could use PLTHROW directly >> rather than calling plexit. However, if implementation of handling >> error messages for PLTHROW is difficult/impossible, then I would be >> content to call plexit everywhere so that there would only be one >> PLTHROW (within plexit after printing out the error message) in our >> code as now on this topic branch. > > We could throw an error code, but I don't think an error message. > However, I think it would be better to put an error message/code in > the plstream. Agreed. I personally don't like error codes that much. It takes me back to the 70's when I first had to deal with IBM "reason codes" <https://www.ibm.com/support/knowledgecenter/en/SS2L7A_5.1.0/doc/ccv-reason-codes.html>. such as the dreaded OOC5 and OOC6. > This would need to be on the heap rather than the stack > otherwise it would be lost in the stack unwinding. I think we should > maintain a function like plerror that would put the error code/message > onto the stream, but maybe renamed to be called something like plthrow Go for it. [out of order] >> * I suggest you protect your exception handling code (but not the >> separate topic branch code concerning pl* versions of malloc, >> calloc, realloc, and free) with an PL_IMPLEMENT_EXCEPTION_HANDLING >> macro that is controlled (using #cmakedefine in plplot_config.h.in) >> by a cmake option in cmake/modules/plplot.cmake of the same name >> which defaults to OFF and which is self-documented as highly >> experimental. Once you do this, then it makes it possible to merge >> this topic quite rapidly to master (to facilitate further >> collaborative development of our exception handling system) since >> the user would have to go out of their way to set >> -DPL_IMPLEMENT_EXCEPTION_HANDLING=ON once they found >> PL_IMPLEMENT_EXCEPTION_HANDLING in CMakeCache.txt. And that file >> includes the self-documentation of each option so it should be >> obvious to such users that turning that option ON is highly >> experimental. > > Would you be happy to add this variable to the build system for me? > Your cmake knowledge is much better than mine. Or do I just use that > macro and adding of -DPL_IMPLEMENT_EXCEPTION_HANDLING=ON to the > command line automagically generate the equivalent #define. This is the way that we implement most of our optional C macros so you should be able to deal with this yourself either following the above description or following the pattern of any of the other macros set in plplot_config.h.in. But if you prefer not to do that, than you could leave the CMake part of this to me, and as a temporary measure place #define PL_IMPLEMENT_EXCEPTION_HANDLING in plplot.h and then do the more painstaking part of this task which is to sprinkle #ifdef PL_IMPLEMENT_EXCEPTION_HANDLING and appropriate #else and #endif lines throughout wherever your changes have added to the existing code. The goal here, of course, is the net result of your changes is exactly equivalent to the memory allocation branch if PL_IMPLEMENT_EXCEPTION_HANDLING is not #defined, and I will follow up by creating a CMake option to insure this macro is not #defined by default unless you decide you want to do that yourself following the above instructions. > >> >> * Can you explain why you have implemented all the PLTRY blocks >> in src/plvpor.c (with presumably the idea in mind to use PLTRY >> in a lot more places?) Wouldn't it be better to use just one >> PLTRY block? For example, could you get away with just that one PLTRY >> block in the library initialization code? If so, the corresponding >> PLCATCH block there could write out a message saying you reached >> that catch block (to confirm that the whole exception handling system is >> fundamentally working within our C library), and then exit (for now) >> in just that one place rather than doing nothing about whatever >> PLplot library error was detected. > > I didn't check the API, but I thought that the naming convention was > that every function beginning c_ is part of the public API. Is this > not correct? EVERY public API function would need a PLTRY block. I > only chose this file because it contains plenv, which was one of the > first API functions called by x00c You are correct, everything in c_* is in the public API. However, I still don't see the need for PLTRY and PLCATCH blocks for each of our public API's. I assume this gets into the question of how you are going to propagate errors outside the library. I cannot remember the details of your previous discussion of that topic, but I was left with the impression you were spoiled for choice. Furthermore, if you were planning to propagate the error outside the PLplot library (to, e.g., x00c.c) for each of the currently empty PLCATCH blocks in src/plvpor.c, I claim could instead be done at one central location in the plplot library (i.e., in a PLCATCH block in the library initialization code that also shuts down most/all of the library from that one central location). I think it is a no-brainer to handle library errors centrally like that if possible since it is less intrusive and therefore much less editing work and much less error prone, but if I am missing something (e.g., that is not possible with any of the propagation methods you had in mind), then the ideal way to demonstrate that is to include an example in your proof-of-concept of propagating a PLplot library error to x00c.c. Alan __________________________ Alan W. Irwin Astronomical research affiliation with Department of Physics and Astronomy, University of Victoria (astrowww.phys.uvic.ca). Programming affiliations with the FreeEOS equation-of-state implementation for stellar interiors (freeeos.sf.net); the Time Ephemerides project (timeephem.sf.net); PLplot scientific plotting software package (plplot.sf.net); the libLASi project (unifont.org/lasi); the Loads of Linux Links project (loll.sf.net); and the Linux Brochure Project (lbproject.sf.net). __________________________ Linux-powered Science __________________________ |
From: Alan W. I. <ir...@be...> - 2017-07-17 22:08:02
Attachments:
valgrind.x08c.psc.log.gz
|
On 2017-07-17 14:59-0700 Alan W. Irwin wrote: > With regard to testing the above memory_allocation2, I repeated the > same test as before (used the -DVALGRIND_ALL_TESTS=ON cmake option, > built the test_c_psc target, and checked for any imperfect valgrind > reports). Examples 00 through 07 were fine, but example 08 segfaulted. > I have attached the associated valgrind report to help you to debug > this issue. Oops! This time I really do attach that valgrind log. Alan __________________________ Alan W. Irwin Astronomical research affiliation with Department of Physics and Astronomy, University of Victoria (astrowww.phys.uvic.ca). Programming affiliations with the FreeEOS equation-of-state implementation for stellar interiors (freeeos.sf.net); the Time Ephemerides project (timeephem.sf.net); PLplot scientific plotting software package (plplot.sf.net); the libLASi project (unifont.org/lasi); the Loads of Linux Links project (loll.sf.net); and the Linux Brochure Project (lbproject.sf.net). __________________________ Linux-powered Science __________________________ |
From: Phil R. <p.d...@gm...> - 2017-07-22 21:47:24
Attachments:
plmalloc-PLTRY.tar
|
Hi Alan I have sorted the issue with example 8. It turned out I had put some plmalloc calls in there and forgotten about them. Attached is a new patch series in a tarball. I don't see a huge need to keep two separate branches any more. In fact I just tried to disentangle my latest few commits into two branches using git rebase -i and endured a whole world of pain. So I won't be returning to the separate memory allocation functions branch any more. In the attached patch series, patches 1-11 just have memory allocation functions. After applying just these patches you should have source that builds and runs, with the new functions defined but not used. patches 12 and onward introduce the PLTRY, PLCATCH and PLENDTRY blocks and add them into the API calls in plcore.c, plcont.c and plvpor.c. I have pretty well tested plcont.c calls, but not plcore or plvpor calls. You asked why every API call required a PLTRY block. You are correct that actually they don't. Every API call that may result in a call to plexit needs a PLTRY block. However, if we just have a policy that every API call must have a PLTRY block then it makes it easy to ensure nothing gets missed. But perhaps this is overkill. A number of functions in plcore.c, particularly the "get" type ones are very simple and would never call plexit. I'm happy to take advice on where the balance should lie? One other issue at the moment. I currently output a warning when plfree is called on a pointer that is not being managed by the PLStream. At present this will include all plfree calls made in plend/plend1 and can include memory allocated by the contour routines. The reason for this is due to the following chain of events 1) These bits of memory get allocated by plalloc in an API call somewhere (e.g cmap0). 2) On exit from the API call plfreeall gets called 3) plfreeall spots the allocated memory that wasn't freed and checks all the pointers that belong to the PLStream 4) It finds that the pointers belong to PLStream and need to not be freed. It therefore stops tracking them. 5) In plend1 plfree is called on the pointer. plfree cannot see the pointer in the list of tracked pointers so issues a warning. There are a few options for dealing with this, but I haven't decided which to use. For now my apologies for the list of warnings. Thoughts welcome as ever. Phil On 17 July 2017 at 23:07, Alan W. Irwin <ir...@be...> wrote: > On 2017-07-17 14:59-0700 Alan W. Irwin wrote: > >> With regard to testing the above memory_allocation2, I repeated the >> same test as before (used the -DVALGRIND_ALL_TESTS=ON cmake option, >> built the test_c_psc target, and checked for any imperfect valgrind >> reports). Examples 00 through 07 were fine, but example 08 segfaulted. >> I have attached the associated valgrind report to help you to debug >> this issue. > > > Oops! This time I really do attach that valgrind log. > > > Alan > > __________________________ > Alan W. Irwin > > Astronomical research affiliation with Department of Physics and Astronomy, > University of Victoria (astrowww.phys.uvic.ca). > > Programming affiliations with the FreeEOS equation-of-state > implementation for stellar interiors (freeeos.sf.net); the Time > Ephemerides project (timeephem.sf.net); PLplot scientific plotting > software package (plplot.sf.net); the libLASi project > (unifont.org/lasi); the Loads of Linux Links project (loll.sf.net); > and the Linux Brochure Project (lbproject.sf.net). > __________________________ > > Linux-powered Science > __________________________ |
From: Alan W. I. <ir...@be...> - 2017-07-23 10:51:02
Attachments:
exception_handler_report.tar.gz
|
On 2017-07-22 22:47+0100 Phil Rosenberg wrote: > Attached is a new patch series in a tarball. I don't see a huge need > to keep two separate branches any more. In fact I just tried to > disentangle my latest few commits into two branches using git rebase > -i and endured a whole world of pain. Hi Phil: Thanks very much for this third version of your exception handler private topic branch. It does sound like you prefer to keep your development environment simpler (at least for now) with just one branch so I will take that approach as well. After unpacking the tarball here in ~irwin/Phil.Rosenberg/20170722/exception_handler/ I did the following: # Create up-to-date master branch git checkout master git fetch git merge --ff-only origin/master # create new private topic branch from that branch git checkout -b exception_handler3 # Apply _all_ your commits to that branch cat ~irwin/Phil.Rosenberg/20170722/exception_handler/*.patch |git am The result here of that last command was the following: Applying: Added functions that will record memory allocations in the PLStream Applying: Part way through plmalloc implementation Applying: Removed plinitializememorylist and plfreeall calls Applying: Fix issue with growing memorylist Applying: Added missing ; Applying: Added Memory allocation function declarations to plplotP.h Applying: Added some further memory management functions and check for PLStream memory /home/software/plplot/HEAD/plplot.git/.git/rebase-apply/patch:104: trailing whitespace. //Cause a pointer to no longer be managed by plplot, warning: 1 line adds whitespace errors. Applying: Fixed some problems with memory management and performed style update. /home/software/plplot/HEAD/plplot.git/.git/rebase-apply/patch:98: trailing whitespace. warning: 1 line adds whitespace errors. Applying: moved memory management functions from plstrm.h to plplotP.h removing duplicate definitions. Applying: Removed c_usplparseopts function, which was pollution from another branch Applying: Removed use of plmalloc and plfree from plcont.c. This was pollution from another branch. Applying: Added Exception handling macros/functions/variables Applying: Add some test code for exception handling Applying: Fixed PLENDTRY/PLTHROW bug Applying: Remove plexit calls from advancejumpbuffer and plinitialisememorylist Applying: Added some comments around PLTRY, PLCATCH and PLENDTRY Applying: Free memory from jumpbuffer stack in plend1 Applying: Added PLTRY blocks to all the api functions in core.c Applying: Added use of plmalloc to the contour code. Applying: Restyle only So all your commits applied cleanly on this fresh and up-to-date topic branch aside from some extremely minor trailing whitespace issues which should be easy for me to fix (but I won't do that yet) after the fact by running the (Unix-tools-only) scripts/remove_trailing_whitespace.sh. I tested that branch by setting the environment variable CFLAGS='-g' (to generate line-number information in valgrind reports), making a fresh configuration of PLplot with the -DVALGRIND_ALL_TESTS=ON cmake option, and running make VERBOSE=1 test_c_psc >& test_c_psc.out Unlike my original good benchmark with this test, the result for this branch was not clean, i.e. test_c_psc.out (collected in the attached tarball) ends with an actual valgrind error for example 14. For historical reasons examples 14, 17, and 29 are done last so because of this error for example 14, examples 17 and 29 were not executed, but all the rest of our standard examples from 00 to 33 (except for 32 which we historically do not test because it has not been propagated to languages other than C so there is no interesting basis of comparison) were run and returned success (a 0 exit code). However, the commands software@raven> grep -L "0 errors from 0 contexts" examples/valgrind.x??c.psc.log examples/valgrind.x11c.psc.log examples/valgrind.x14c.psc.log examples/valgrind.x21c.psc.log software@raven> grep -L "no leaks are possible" examples/valgrind.x??c.psc.log examples/valgrind.x14c.psc.log demostrate there are imperfect valgrind results for examples 11, 14, and 21 and actually looking at those three files (also collected in the attached tarball) show those errors are quite serious in nature, e.g., the initial trouble for all three is an invalid write which is normally quite bad. Furthermore, for example 14 that initial error compounded so badly that valgrind itself exited with an error message (and non-zero exit code). I hope these collected valgrind reports with exact source code line numbers will help you debug all these issues so we can get back to perfect valgrind reports for the next iteration. This is already long enough with an attachment for you to respond to as well so I will place my further response to your additional remarks in a separate e-mail. Alan __________________________ Alan W. Irwin Astronomical research affiliation with Department of Physics and Astronomy, University of Victoria (astrowww.phys.uvic.ca). Programming affiliations with the FreeEOS equation-of-state implementation for stellar interiors (freeeos.sf.net); the Time Ephemerides project (timeephem.sf.net); PLplot scientific plotting software package (plplot.sf.net); the libLASi project (unifont.org/lasi); the Loads of Linux Links project (loll.sf.net); and the Linux Brochure Project (lbproject.sf.net). __________________________ Linux-powered Science __________________________ |
From: Alan W. I. <ir...@be...> - 2017-07-23 11:27:34
|
Hi Phil: Here is further discussion of what you have said as opposed to the error report I sent you in my previous post concerning your 3rd iteration. Do keep in mind the possibility for the future that the memory allocation part of your commits are worthwhile in their own right and could be finished and landed on master much sooner than the rest of your commits that are concerned with implementation of C exception handling and use of that for handling our errors. I suggested in my big e-mail to you early this week (Mon, 17 Jul 2017 14:59:00 -0700 (PDT)) concerning your second iteration a method of doing this using a list of plstream heap pointers. Please respond to that suggestion. Also please respond to my suggestion (with appropriate code changes while I handle the CMake part) in that e-mail concerning #define PL_IMPLEMENT_EXCEPTION_HANDLING and #ifdef PL_IMPLEMENT_EXCEPTION_HANDLING On 2017-07-22 22:47+0100 Phil Rosenberg wrote: > [in that same e-mail] You asked why every API call required a PLTRY block. You are correct > that actually they don't. Every API call that may result in a call to > plexit needs a PLTRY block. I initially didn't understand this explanation, but after _much_ too long a time considering various scenarios to explain my point of view the light finally dawned. Exception handling does not allow you to transfer control to arbitrary places in the code (a stupid assumption I was making). Instead, it merely allows you to unwind the call graph from the plexit call (in this case that is the only current use of PLTHROW in our source code) to return control to a routine potentially far up the call stack from there such as _any_ of the public API routines (as you said) that have a possible call to plexit somewhere lower down on their call graph. So sorry for my previous noise on this sub-topic, but now we are finally on the same page in this regard, I have some further questions and comments about your implementation of exception handling and how you are using that exception handling to deal with bad errors. * If you miss putting a PLTRY and PLCATCH block in one of those public API cases where a potential call to plexit is somewhere lower on the call graph, then what does the current code do? Does it simply crash (i.e., is that the potential cause of the serious issues I have reported to you in my previous post) or do you handle that internal error case by, e.g., emitting an "internal error: unhandled exception" message followed by an exit? * What happens when an external application calls a "first" public API which has another "second" public API lower down in its call graph and even further down on the call graph of that second API occurs a potential call to plexit? Then if plexit is called that will PLTHROW control to the catch block of that second API rather than the first. So for this particular call graph case for the first API this second API needs to PLTHROW control to that first PLCATCH block rather than simply returning. And, of course, since that second API is sometimes called directly from external applications and libraries without anything higher in the call graph with a PLCATCH block, it cannot PLTHROW all the time without sometimes running into the internal error discussed above. So how are you going to figure out when to PLTHROW in a public API PLCATCH block and when not to PLTHROW? Or is that case already automatically taken care of by your present code (e.g., by each PLTRY macro incrementing a library status flag so that the PLCATCH macro can check that flag to see if there is a PLCATCH block higher in the call graph that it should PLTHROW to)? * Shouldn't your current plexit call plend (like the master branch version does) to shut down the library? See further discussion below concerning this suggestion. * I also suggest you should implement a PLplot library state variable (let's call it plplot_library_status) that is initialized to 0 (good status) when the library is initialized but set to 1 (bad status) by plexit. Then follow up by implementing a c_plstatus() public API that returns the value of library_status and which emits a message such as ("library had bad error so it had to be shut down. To use it again you need to initialize the library by calling one of the plinit family of routines".) Then an external application could call c_plstatus to check on the status of the library after any call to our public API and act accordingly. Or it could be sloppy about that (i.e., our standard C examples might only call plstatus in one example to make sure it works). In which case after the library is shutdown (see above) by an internal call to plexit there would be a blizzard of further plabort (due to plsc->level being 0) and plexit messages after the first one because of that library shutdown. But there would be no call to the dreaded exit routine so there is at least the chance that the external application will survive that sloppiness. And if plexit does not call plend, I think the chances of an external application encountering a showstopper error are much worse. > However, if we just have a policy that > every API call must have a PLTRY block then it makes it easy to ensure > nothing gets missed. But perhaps this is overkill. A number of > functions in plcore.c, particularly the "get" type ones are very > simple and would never call plexit. I'm happy to take advice on where > the balance should lie? > > One other issue at the moment. I currently output a warning when > plfree is called on a pointer that is not being managed by the > PLStream. At present this will include all plfree calls made in > plend/plend1 and can include memory allocated by the contour routines. > The reason for this is due to the following chain of events > 1) These bits of memory get allocated by plalloc in an API call > somewhere (e.g cmap0). > 2) On exit from the API call plfreeall gets called > 3) plfreeall spots the allocated memory that wasn't freed and checks > all the pointers that belong to the PLStream > 4) It finds that the pointers belong to PLStream and need to not be > freed. It therefore stops tracking them. > 5) In plend1 plfree is called on the pointer. plfree cannot see the > pointer in the list of tracked pointers so issues a warning. > > There are a few options for dealing with this, but I haven't decided > which to use. For now my apologies for the list of warnings. > I am frankly out of gas, and this is already a pretty long e-mail so I will leave it to you to decide on the above topics unless someone else here wants to comment. Anyhow, after I get some sleep, I am looking forward to your further comments on (a) my error report for interation 3, and (b) what I have had the energy to discuss above. Alan __________________________ Alan W. Irwin Astronomical research affiliation with Department of Physics and Astronomy, University of Victoria (astrowww.phys.uvic.ca). Programming affiliations with the FreeEOS equation-of-state implementation for stellar interiors (freeeos.sf.net); the Time Ephemerides project (timeephem.sf.net); PLplot scientific plotting software package (plplot.sf.net); the libLASi project (unifont.org/lasi); the Loads of Linux Links project (loll.sf.net); and the Linux Brochure Project (lbproject.sf.net). __________________________ Linux-powered Science __________________________ |
From: Alan W. I. <ir...@be...> - 2017-07-23 13:00:18
|
P.S. I just reviewed the wikipedia article on call graphs, and it appears I was misusing that term and should have used call chain instead where a call graph for a given library consists of all the potential call chains in the library with each one starting at one of the public API's. So my understanding now is the PLTHROW macro in plexit unwinds the particular call chain that called plexit until it reaches the first routine higher in that call chain where the the call was made in a PLTRY block and transfer control to the code in the corresponding PLCATCH block. @Phil: Do you agree with this explanation of PLTHROW, PLTRY, and PLCATCH or does it need further correction? If that explanation is substantially correct, that one scenario I was discussing could be described as two possible call chains in the call graph with one starting at the first public API traversing through the second public API and eventually traversing to plexit, and the second starting at the second public API and eventually traversing to plexit as well. So the question was does the code need enhancement to deal with those two different possible call chains where in one case the PLCATCH block of the second API needs to PLTHROW to unwind the call chain to the first API PLTRY and transfer control to the corresponding PLCATCH block, and in the other case the second API catch block needs simply to return to the calling routine in the external application. And similarly this change in terminology needs to be applied to my just-previous discussion of other topics related to PLTHROW, PLTRY, and PLCATCH. Alan __________________________ Alan W. Irwin Astronomical research affiliation with Department of Physics and Astronomy, University of Victoria (astrowww.phys.uvic.ca). Programming affiliations with the FreeEOS equation-of-state implementation for stellar interiors (freeeos.sf.net); the Time Ephemerides project (timeephem.sf.net); PLplot scientific plotting software package (plplot.sf.net); the libLASi project (unifont.org/lasi); the Loads of Linux Links project (loll.sf.net); and the Linux Brochure Project (lbproject.sf.net). __________________________ Linux-powered Science __________________________ |
From: Alan W. I. <ir...@be...> - 2017-07-23 19:36:25
|
On 2017-07-23 06:00-0700 Alan W. Irwin wrote: P.P.S. As I am just discovering, one of the huge problems in this field is terminology. For example, the wikipedia article <https://en.wikipedia.org/wiki/Call_stack> refers immediately to 6 synomyms (ouch!) for call stack with one of those being just plain "stack" which is the terminology used by the Linux man pages for setjmp and longjmp. "Call chain" is not one of those wikipedia-approved synonyms, but I believe I must have picked up that terminology from another article I read (where it meant the same thing). Therefore, since the principal terminology in wikipedia is "call stack", that is much preferable to the "call chain" terminology I used below. But nevertheless, I thought I did pretty well below considering I wrote that at 6 o'clock in the morning after a long night of analysis without sleep. :-) Alan > P.S. I just reviewed the wikipedia article on call graphs, and it > appears I was misusing that term and should have used call chain > instead where a call graph for a given library consists of all the > potential call chains in the library with each one starting at one of > the public API's. > > So my understanding now is the PLTHROW macro in plexit unwinds the > particular call chain that called plexit until it reaches the first > routine higher in that call chain where the > the call was made in a PLTRY block and transfer control to the code in the > corresponding PLCATCH block. > > @Phil: > > Do you agree with this explanation of PLTHROW, PLTRY, and PLCATCH or > does it need further correction? > > If that explanation is substantially correct, that one scenario I was > discussing could be described as two possible call chains in the call > graph with one starting at the first public API traversing through the > second public API and eventually traversing to plexit, and the second > starting at the second public API and eventually traversing to plexit > as well. So the question was does the code need enhancement to deal > with those two different possible call chains where in one case the > PLCATCH block of the second API needs to PLTHROW to unwind the call > chain to the first API PLTRY and transfer control to the corresponding > PLCATCH block, and in the other case the second API catch block needs > simply to return to the calling routine in the external application. > > And similarly this change in terminology needs to be applied to my > just-previous discussion of other topics related to PLTHROW, PLTRY, > and PLCATCH. > > Alan > __________________________ > Alan W. Irwin > > Astronomical research affiliation with Department of Physics and Astronomy, > University of Victoria (astrowww.phys.uvic.ca). > > Programming affiliations with the FreeEOS equation-of-state > implementation for stellar interiors (freeeos.sf.net); the Time > Ephemerides project (timeephem.sf.net); PLplot scientific plotting > software package (plplot.sf.net); the libLASi project > (unifont.org/lasi); the Loads of Linux Links project (loll.sf.net); > and the Linux Brochure Project (lbproject.sf.net). > __________________________ > > Linux-powered Science > __________________________ > > ------------------------------------------------------------------------------ > Check out the vibrant tech community on one of the world's most > engaging tech sites, Slashdot.org! http://sdm.link/slashdot > _______________________________________________ > Plplot-devel mailing list > Plp...@li... > https://lists.sourceforge.net/lists/listinfo/plplot-devel > __________________________ Alan W. Irwin Astronomical research affiliation with Department of Physics and Astronomy, University of Victoria (astrowww.phys.uvic.ca). Programming affiliations with the FreeEOS equation-of-state implementation for stellar interiors (freeeos.sf.net); the Time Ephemerides project (timeephem.sf.net); PLplot scientific plotting software package (plplot.sf.net); the libLASi project (unifont.org/lasi); the Loads of Linux Links project (loll.sf.net); and the Linux Brochure Project (lbproject.sf.net). __________________________ Linux-powered Science __________________________ |
From: Phil R. <p.d...@gm...> - 2017-07-23 21:00:46
|
On 23 July 2017 at 12:27, Alan W. Irwin <ir...@be...> wrote: > Hi Phil: > > Here is further discussion of what you have said as opposed to the > error report I sent you in my previous post concerning your 3rd > iteration. > > Do keep in mind the possibility for the future that the memory > allocation part of your commits are worthwhile in their own right and > could be finished and landed on master much sooner than the rest of > your commits that are concerned with implementation of C exception > handling and use of that for handling our errors. I'm not sure how useful they really are without exception handling. Without such handling there is no location in the code where the array of pointers to the allocated memory is reviewed and in addition the best response to a failed allocation may not be a call to plexit (but I guess mostly it would be). > > I suggested in my big e-mail to you early this week (Mon, 17 Jul 2017 > 14:59:00 -0700 (PDT)) concerning your second iteration a method of doing > this using a list of plstream heap pointers. Please respond to that > suggestion. When you said a list I'm not sure I understood. Do you mean a linked list rather than an array? > > Also please respond to my suggestion (with appropriate code changes > while I handle the CMake part) in that e-mail concerning > > #define PL_IMPLEMENT_EXCEPTION_HANDLING and > #ifdef PL_IMPLEMENT_EXCEPTION_HANDLING > > On 2017-07-22 22:47+0100 Phil Rosenberg wrote: > >> [in that same e-mail] You asked why every API call required a PLTRY block. >> You are correct >> that actually they don't. Every API call that may result in a call to >> plexit needs a PLTRY block. > > > I initially didn't understand this explanation, but after _much_ too > long a time considering various scenarios to explain my point of view > the light finally dawned. Exception handling does not allow you to > transfer control to arbitrary places in the code (a stupid assumption > I was making). Instead, it merely allows you to unwind the call graph > from the plexit call (in this case that is the only current use of > PLTHROW in our source code) to return control to a routine potentially > far up the call stack from there such as _any_ of the public API > routines (as you said) that have a possible call to plexit somewhere > lower down on their call graph. Yes correct, we just unwind the stack to a certain point in the call sequence (or whatever terminology you choose as you mentioned in your ps and pps). This is good, because the user application is still sat waiting for our API function to return. We don't want to jump to some specific execution point, we need that function to return and for the user to find out somehow that there has been an error. > > So sorry for my previous noise on this sub-topic, but > now we are finally on the same page in this regard, I have some > further questions and comments about your implementation of > exception handling and how you are using that exception handling > to deal with bad errors. > > * If you miss putting a PLTRY and PLCATCH block in one of those public > API cases where a potential call to plexit is somewhere lower on the > call graph, then what does the current code do? Does it simply > crash (i.e., is that the potential cause of the serious issues I > have reported to you in my previous post) or do you handle that > internal error case by, e.g., emitting an "internal error: unhandled > exception" message followed by an exit? I think that currently something very odd. The value of pls->currentjumpbuffer used in PLTHROW would be either out of date or NULL, so probably some sort of crash. This is why it is important that every API entry function includes a PLTRY, PLCATCH, PLENDTRY sequence. Of course we could set things up to check for this scenario, but then what would we do? Call exit()? I had been trying to think if there was a C method to check at compile time, but I cannot think of one (there would be ways in C++ I think). But the API changes so infrequently that I think just ensuring all API functions include the PLTRY, PLCATCH, PLENDTRY code is probably okay - open to suggestions though. > > * What happens when an external application calls a "first" public API > which has another "second" public API lower down in its call graph > and even further down on the call graph of that second API occurs a > potential call to plexit? Then if plexit is called that will PLTHROW > control to the catch block of that second API rather than the first. > So for this particular call graph case for the first API this second > API needs to PLTHROW control to that first PLCATCH block rather than > simply returning. And, of course, since that second API is > sometimes called directly from external applications and libraries > without anything higher in the call graph with a PLCATCH block, it > cannot PLTHROW all the time without sometimes running > into the internal error discussed above. So how are you going to figure > out > when to PLTHROW in a public API PLCATCH block and when not to PLTHROW? > Or is that case already automatically taken care of by your present code > (e.g., by each PLTRY macro incrementing a library status flag > so that the PLCATCH macro can check that flag to see if there > is a PLCATCH block higher in the call graph that it should PLTHROW > to)? I had considered this ;-) We store a stack of jump buffers which increases in size by one every time we enter a new PLTRY block. Also the memory management marks this point with a NULL in the list of pointers to allocated memory. A PLTHROW unwinds to the first PLTRY/PLCATCH/PLENDTRY block it finds. At this point memory back to the previous NULL is released and the top jump buffer is removed from the stack. If there are any more jump buffers in the stack then part of the PLENDTRY code does another PLTHROW with the new top jump buffer of the stack. This repeats until there are no jump buffers left and we will be at the API function initially called by the user. At least that is the plan - I am yet to do the testing :-) > > * Shouldn't your current plexit call plend (like the master branch > version does) to shut down the library? See further discussion > below concerning this suggestion. My first thought is no - I would like the state of Plplot to be as it was before the API call was made - although I have just realised that this will not be the case, because the PLStreams are on the heap, not the stack, they won't be restored by the stack unwind. I will need to do this manually and probably do something like delay all memory frees until PLENDTRY so I can restore the pointers. > > * I also suggest you should implement a PLplot library state variable > (let's call it plplot_library_status) that is initialized to 0 (good > status) when the library is initialized but set to 1 (bad status) by > plexit. Then follow up by implementing a c_plstatus() public API > that returns the value of library_status and which emits a message > such as ("library had bad error so it had to be shut down. To use > it again you need to initialize the library by calling one of the > plinit family of routines".) Then an external application could call > c_plstatus to check on the status of the library after any call to > our public API and act accordingly. > > Or it could be sloppy about that (i.e., our standard C examples > might only call plstatus in one example to make sure it works). In > which case after the library is shutdown (see above) by an internal > call to plexit there would be a blizzard of further plabort (due to > plsc->level being 0) and plexit messages after the first one because > of that library shutdown. But there would be no call to the dreaded > exit routine so there is at least the chance that the external > application will survive that sloppiness. And if plexit does not > call plend, I think the chances of an external application > encountering a showstopper error are much worse. Hmm, this is quite interesting actually. As I said above I would like to not totally kill the PLStream - a good example could be that maybe plexit gets called due to a bad path for a shapefile for a map. This maybe could be fixed if the calling program checks for an error state. Of course people are lazy :-) so it might not get checked. In my initial thinking, the map plot would fail but all other plotting would carry on. maybe it would be better to set the PLStream to an error state and have all further API calls fail. Maybe the user could clear the error state to permit plotting to continue. I welcome thoughts on this. In terms of an error flag that you suggested. I am thinking perhaps all calls to plerr result in an error message being logged in the PLStream, plus when we reach the top level PLTRY/PLCATCH/PLENDTRY we call a user provided error callback. Then we have an API function to get the list of errors, another to clear the list of errors and one to check if there are any errors (which would just just check if the number of error messages in the log is greater than 0). So it is like your flag, but with some extra detail. > >> However, if we just have a policy that >> every API call must have a PLTRY block then it makes it easy to ensure >> nothing gets missed. But perhaps this is overkill. A number of >> functions in plcore.c, particularly the "get" type ones are very >> simple and would never call plexit. I'm happy to take advice on where >> the balance should lie? >> >> One other issue at the moment. I currently output a warning when >> plfree is called on a pointer that is not being managed by the >> PLStream. At present this will include all plfree calls made in >> plend/plend1 and can include memory allocated by the contour routines. >> The reason for this is due to the following chain of events >> 1) These bits of memory get allocated by plalloc in an API call >> somewhere (e.g cmap0). >> 2) On exit from the API call plfreeall gets called >> 3) plfreeall spots the allocated memory that wasn't freed and checks >> all the pointers that belong to the PLStream >> 4) It finds that the pointers belong to PLStream and need to not be >> freed. It therefore stops tracking them. >> 5) In plend1 plfree is called on the pointer. plfree cannot see the >> pointer in the list of tracked pointers so issues a warning. >> >> There are a few options for dealing with this, but I haven't decided >> which to use. For now my apologies for the list of warnings. >> > > I am frankly out of gas, and this is already a pretty long e-mail > so I will leave it to you to decide on the above topics unless > someone else here wants to comment. > > Anyhow, after I get some sleep, I am looking forward to your further > comments on (a) my error report for interation 3, and (b) what I have > had the energy to discuss above. Thanks for the thoughts and comments so far. They are very welcome. > > Alan > > __________________________ > Alan W. Irwin > > Astronomical research affiliation with Department of Physics and Astronomy, > University of Victoria (astrowww.phys.uvic.ca). > > Programming affiliations with the FreeEOS equation-of-state > implementation for stellar interiors (freeeos.sf.net); the Time > Ephemerides project (timeephem.sf.net); PLplot scientific plotting > software package (plplot.sf.net); the libLASi project > (unifont.org/lasi); the Loads of Linux Links project (loll.sf.net); > and the Linux Brochure Project (lbproject.sf.net). > __________________________ > > Linux-powered Science > __________________________ |
From: Alan W. I. <ir...@be...> - 2017-07-24 02:59:23
|
Hi Phil: On 2017-07-23 22:00+0100 Phil Rosenberg wrote: > On 23 July 2017 at 12:27, Alan W. Irwin <ir...@be...> wrote: >> I suggested in my big e-mail to you early this week (Mon, 17 Jul 2017 >> 14:59:00 -0700 (PDT)) concerning your second iteration a method of doing >> this using a list of plstream heap pointers. Please respond to that >> suggestion. > > When you said a list I'm not sure I understood. Do you mean a linked > list rather than an array? No, I was using list in a generic sense so it you store that information (and also access it) in an array form that would be fine with me. So basically the idea is whatever you are doing now in plfreeall for handling plstream heap pointers correctly should also be done in plmalloc, plcalloc, plrealloc, and plfree so we can replace all calls of malloc, calloc, realloc, and free in our code with calls to the pl* variants of those. > >> >> Also please respond to my suggestion (with appropriate code changes >> while I handle the CMake part) in that e-mail concerning >> >> #define PL_IMPLEMENT_EXCEPTION_HANDLING and >> #ifdef PL_IMPLEMENT_EXCEPTION_HANDLING >> Please go back to look at that original e-mail for this sub-topic, but to summarize what I meant is please implement this preprocessor instruction idea for iteration 4, and I will subsequently implement a CMake option to control whether PL_IMPLEMENT_EXCEPTION_HANDLING is #defined or not. >> On 2017-07-22 22:47+0100 Phil Rosenberg wrote: >> >>> [in that same e-mail] You asked why every API call required a PLTRY block. >>> You are correct >>> that actually they don't. Every API call that may result in a call to >>> plexit needs a PLTRY block. >> >> >> I initially didn't understand this explanation, but after _much_ too >> long a time considering various scenarios to explain my point of view >> the light finally dawned. Exception handling does not allow you to >> transfer control to arbitrary places in the code (a stupid assumption >> I was making). Instead, it merely allows you to unwind the call graph >> from the plexit call (in this case that is the only current use of >> PLTHROW in our source code) to return control to a routine potentially >> far up the call stack from there such as _any_ of the public API >> routines (as you said) that have a possible call to plexit somewhere >> lower down on their call graph. > > Yes correct, we just unwind the stack to a certain point in the call > sequence (or whatever terminology you choose as you mentioned in your > ps and pps). This is good, because the user application is still sat > waiting for our API function to return. We don't want to jump to some > specific execution point, we need that function to return and for the > user to find out somehow that there has been an error. Thanks for that confirmation. It is a big relief to me to finally move beyond my previous misconception. > >> >> So sorry for my previous noise on this sub-topic, but >> now we are finally on the same page in this regard, I have some >> further questions and comments about your implementation of >> exception handling and how you are using that exception handling >> to deal with bad errors. >> >> * If you miss putting a PLTRY and PLCATCH block in one of those public >> API cases where a potential call to plexit is somewhere lower on the >> call graph, then what does the current code do? Does it simply >> crash (i.e., is that the potential cause of the serious issues I >> have reported to you in my previous post) or do you handle that >> internal error case by, e.g., emitting an "internal error: unhandled >> exception" message followed by an exit? > > I think that currently something very odd. The value of > pls->currentjumpbuffer used in PLTHROW would be either out of date or > NULL, so probably some sort of crash. This is why it is important that > every API entry function includes a PLTRY, PLCATCH, PLENDTRY sequence. > Of course we could set things up to check for this scenario, but then > what would we do? Call exit()? I had been trying to think if there was > a C method to check at compile time, but I cannot think of one (there > would be ways in C++ I think). But the API changes so infrequently > that I think just ensuring all API functions include the PLTRY, > PLCATCH, PLENDTRY code is probably okay - open to suggestions though. Assuming we develop this proof-of-concept into the real deal and merge this topic to master (and so far I haven't seen any issues in this proof-of-concept that would preclude reaching that eventual result), there will be lots of functions where we need to add PLTRY and PLCATCH blocks. So given that situation where unhandled exceptions may be a fact of life for us for some time (and possibly any time in the future when new topics with e.g., additional externally accessible API are merged to master) I think we should handle this situation with a specific error message and calling exit rather than crashing with, e.g., a segfault. That conclusion is also based on my impression that is exactly how C++ deals with the unhandled exception issue. Furthermore, I think this is fundamentally a more satisfactory result than a crash whenever an unhandled exception occurs (especially if we copy this approach for our other non-plplot C libraries and our C device drivers, see further discussion below of this possibility). I mentioned "lots of functions" above. Note those will not just be limited to the official "c_*" API functions. Instead, our long-standing visibility work has determined all functions that are externally accessed in our tests, and we have effectively labelled all of those with the PLDLLIMPEXP visibility macro that is #defined in include/pldll.h.in. So that means identification of all the functions that need PLTRY/PLCATCH blocks has already been done for us. However, there are a maximum of 326 of them spread over 6 files as revealed by software@raven> grep -l 'PLDLLIMPEXP *' include/*.h* |grep -v pldll include/drivers.h include/ltdl_win32.h include/pdf.h include/plplot.h include/plplotP.h include/plstrm.h software@raven> grep 'PLDLLIMPEXP *' include/*.h* |grep -v pldll |wc -l 326 (Note, I have included drivers.h and pdf.h in the above list because in some configurations all driver code is absorbed into the core PLplot library.) I emphasize you don't have to be too concerned about inserting many more PLTRY/PLCATCH blocks for your proof-of-concept. But assuming the time comes in the relative near future when your C exception handling proof-of-concept has matured sufficiently so you at least want to merge it to the master branch as an option (see discussion above of the PL_IMPLEMENT_EXCEPTION_HANDLING macro) for PLplot users, then potentially there is a lot of editing work here with the prospect of dealing with typographical bugs in that work. Thus, would it be possible to deal with this situation automatically with appropriate macro to generate the code (consisting of a PLTRY block containing an appropriate call to the original renamed function, and empty PLCATCH block and all visibility issues taken care of)? However, if you don't think this automated approach is possible or it is too much work or too error prone, then you can count on me to help you with the required manual editing work for those 300+ functions when the time comes. >> * I also suggest you should implement a PLplot library state variable >> (let's call it plplot_library_status) that is initialized to 0 (good >> status) when the library is initialized but set to 1 (bad status) by >> plexit. Then follow up by implementing a c_plstatus() public API >> that returns the value of library_status and which emits a message >> such as ("library had bad error so it had to be shut down. To use >> it again you need to initialize the library by calling one of the >> plinit family of routines".) Then an external application could call >> c_plstatus to check on the status of the library after any call to >> our public API and act accordingly. >> >> Or it could be sloppy about that (i.e., our standard C examples >> might only call plstatus in one example to make sure it works). In >> which case after the library is shutdown (see above) by an internal >> call to plexit there would be a blizzard of further plabort (due to >> plsc->level being 0) and plexit messages after the first one because >> of that library shutdown. But there would be no call to the dreaded >> exit routine so there is at least the chance that the external >> application will survive that sloppiness. And if plexit does not >> call plend, I think the chances of an external application >> encountering a showstopper error are much worse. > > Hmm, this is quite interesting actually. As I said above I would like > to not totally kill the PLStream - a good example could be that maybe > plexit gets called due to a bad path for a shapefile for a map. This > maybe could be fixed if the calling program checks for an error state. > Of course people are lazy :-) so it might not get checked. In my > initial thinking, the map plot would fail but all other plotting would > carry on. maybe it would be better to set the PLStream to an error > state and have all further API calls fail. Maybe the user could clear > the error state to permit plotting to continue. I welcome thoughts on > this. > > In terms of an error flag that you suggested. I am thinking perhaps > all calls to plerr result in an error message being logged in the > PLStream, plus when we reach the top level PLTRY/PLCATCH/PLENDTRY we > call a user provided error callback. Then we have an API function to > get the list of errors, another to clear the list of errors and one to > check if there are any errors (which would just just check if the > number of error messages in the log is greater than 0). So it is like > your flag, but with some extra detail. I have now thought a bit more about this issue with propagating library error conditions to other C libraries and executables. Because of the #includes that are normally used by all such libraries that call the core PLplot library, these "other libraries and executables" have access to the same PLTRY, PLCATCH, and PLTHROW macros that the core PLplot library does. So I am thinking along the lines of all instances of PLplot calls occurring in our standard examples, e.g., pladv(0); being replaced by pladv(0); if(plstatus()) PLTHROW(NULL); The point is, this is a bit cleaner than pladv(0); if(plstatus()) { fprintf(stderr "There was an error in one of the PLplot library calls\n"); exit(1); } So I hope that first form using PLTHROW is possible in the standard examples. In any case, no matter how you decide to propagate errors across C library boundaries, I think our ultimate goal should be to use exception handling (with at least a PLTRY and PLCATCH block in each of all our C libraries and standard C executables and interface our C exception handling with native language exception handling or C++ device drivers, bindings, and standard examples, and for bindings and examples for all non-C++ languages (e.g., Ada, D, Python, OCaml) which support native exception handling. I do realize I am speculating quite a bit here, but I hope you take it in the intended spirit which is to generate discussion about the absolute best way to propagate library errors across library boundaries considering these future needs. To summarize, I think to help finish this proof-of-concept you need to * fix the bugs I found for iteration 3, * implement the "exit" idea for the case of unhandled exceptions to avoid crashing in that case, * implement the idea concerning the PL_IMPLEMENT_EXCEPTION_HANDLING macro, * finalize how you are going to propagate errors across library boundaries, and * demonstrate a completely unlazy version of one of the C examples which itself has a PLTRY block and PLCATCH block in its main routine and PLTHROWs associated with each PLplot call executed if plstatus() (or whatever you decide to replace it with) indicates a library error condition. And I need to * Test all your remaining iterations until this proof of concept is finished including devising some additional tests with a local patch (obviously not to be committed) to generate lots of calls to plexit through many different potential call stacks to test propagation of errors for most PLplot commands of most examples including the non-lazy one and all the lazy ones. * Implement a commit that removes trailing blanks if you don't do that yourself. * Implement a commit that adds a CMake option to control whether the PL_IMPLEMENT_EXCEPTION_HANDLING C macro is #defined or not. Once all tests are passed with the finished proof-of-concept then we can safely merge this to master (with PL_IMPLEMENT_EXCEPTION_HANDLING defaulting to undefined), and then afterward expand to full PLTRY coverage (automatically or manually implemented) for all externally visible PLplot functions. And at some point introduce completely consistent error handling for all our memory allocation calls whether they correspond to a plstream pointer or not. Once we are completely happy with the exception handling in libplplot and our C examples, we should turn the default option ON so PL_IMPLEMENT_EXCEPTION_HANDLING is #defined by default for most of our users, and as time permits extend exception handling to our other C libraries, our C and C++ device drivers and our bindings and examples for all supported languages that implement their own native exception handling. So it is going to be quite a lot of work to get from here to there. But this is a lofty goal (to eliminate exit completely from our entire code base). And I think the results as we gradually eliminate the exit call from our entire code base (except for internal errors due to exceptions which are not handled by accident). will be well worth it. Also, as we phase in each component of this work, that phase is not and all or nothing proposition since each component can be protected by its own non-default CMake option (just like I am proposing with the PL_IMPLEMENT_EXCEPTION_HANDLING macro) for the proof-of-concept once it is finished. So the work can be considerably spread out over a long time and remain on our master branch during all of that time. > Thanks for the thoughts and comments so far. They are very welcome. And the same goes for your thoughts and comments as well! Furthermore, since the lack of error handling has long been one of the most important fundamental PLplot library issues, it is great to see all your C exception-handler ideas fleshed out and implemented in your current proof-of-concept rather than us having a rather speculative discussion with absolutely no proof-of-concept code to help keep the discussion focussed. Best wishes, Alan __________________________ Alan W. Irwin Astronomical research affiliation with Department of Physics and Astronomy, University of Victoria (astrowww.phys.uvic.ca). Programming affiliations with the FreeEOS equation-of-state implementation for stellar interiors (freeeos.sf.net); the Time Ephemerides project (timeephem.sf.net); PLplot scientific plotting software package (plplot.sf.net); the libLASi project (unifont.org/lasi); the Loads of Linux Links project (loll.sf.net); and the Linux Brochure Project (lbproject.sf.net). __________________________ Linux-powered Science __________________________ |