From: <p.d...@gm...> - 2017-02-25 12:12:43
> But that failure (one or both semaphores are not blocked with zero
> values) should be impossible because the fact is both semaphores are
> initialized in that expected (blocked) state (that check succeeded for
> the initial call to moveBytesReaderReversed on the wxPLViewer side),
> and at the end of moveBytesReaderReversed when that header transfer
> succeeded (as measured on the -dev wxwidgets side) the check is made
> again that the semaphores are left in the proper blocked state.

Welcome to the world of multithread bugs and race conditions 😊. I haven't looked at the code, but my first guess would be some sort of race condition. You cannot rely on any operation completing in one process before the other unless you have an explicit check for it. I seem to remember having an initialisation flag in the shared memory that gets set by wxPLViewer to indicate that all initialisation is complete and the viewer is ready for communication to begin. If you don't have a similar check then you can’t rely on things being ready, including the semaphores being initialised. Note that “overtaking” can occur anywhere, including midway through a single line of code.

Phil

So I am pretty sure something must be clobbering the semaphores after the call to moveBytesReaderReversed is finished on the wxPLViewer side and before moveBytesWriter is called on the -dev wxwidgets side. But what?

What severely complicates debugging this issue is that examples 9 and 16 have run flawlessly today after the first attempt that generated the above message. And the exact same generic sequence (transfer header from wxPLViewer to -dev wxwidgets and start transferring data the other way with that call to moveBytesWriter) happens for every example with no occurrences of this error (at least so far). However, if something is getting clobbered (the only hypothesis that seems to make sense to me concerning the above results), then valgrind on -dev wxwidgets and/or wxPLViewer should be able to find what the trouble is.

2. 
I still have not implemented interactivity so I didn't bother to try example 1 with the -locate option. And for the same reason you should not expect example 20 to work (the only one of our examples that is interactive by default).

3. Examples 2 and 14 fail with wxPLViewer issuing the following message:

Caught unhandled unknown exception; terminating

This was the issue I incorrectly thought was a multipage issue yesterday, but instead it is confined to just these two examples and must be due to some different way these examples set up plots that exposes an actual reproducible bug in the present -DPL_WXWIDGETS_IPC2=ON case, which I am attempting to track down now. I cannot find this throw message anywhere in our own code, and indeed it appears (see <http://wxwidgets.10942.n7.nabble.com/Better-exception-handling-td87900.html>) that the message is a generic wxwidgets response to uncaught exceptions. So my first step for this issue is to attempt to catch all exceptions in our code rather than passing them on, uncaught, to wxwidgets.

In sum, the -DPL_WXWIDGETS_IPC2=ON case has largely matured, with the only issues left being

1. an extremely elusive bug that ends up as an "impossible" "PLMemoryMap::moveBytesWriter: attempt to start transfer with semaphores not in correct blocked state." error message on rare and irreproducible occasions.

2. Interactivity not implemented.

3. Some easily reproduced bug exposed by examples 2 and 14.

I think 2 and 3 should be straightforward to deal with, and if my "something getting clobbered" hypothesis to explain 1 is correct, then with the help of valgrind that should be straightforward to solve as well. So I am extremely pleased that today's results showed so many of the examples are actually working fine now, and I am hoping to get all of them (and also the -locate option for example 1) working soon with the -DPL_WXWIDGETS_IPC2=ON code. So stay tuned....

Alan
__________________________
Alan W. Irwin

Astronomical research affiliation with Department of Physics and Astronomy, University of Victoria (astrowww.phys.uvic.ca).

Programming affiliations with the FreeEOS equation-of-state implementation for stellar interiors (freeeos.sf.net); the Time Ephemerides project (timeephem.sf.net); PLplot scientific plotting software package (plplot.sf.net); the libLASi project (unifont.org/lasi); the Loads of Linux Links project (loll.sf.net); and the Linux Brochure Project (lbproject.sf.net).
__________________________

Linux-powered Science
__________________________
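
For reference, the "initialisation flag" check Phil describes in his reply could look roughly like the sketch below. This is only an illustration under stated assumptions, not the PLplot code: SharedArea, viewerReady, viewerInit, deviceWaitAndRead and runHandshakeDemo are hypothetical names, and two threads stand in for the wxPLViewer and -dev wxwidgets processes. The point is that the viewer side publishes the flag last, and the other side touches nothing in the shared area until it has seen the flag.

```cpp
#include <atomic>
#include <cstring>
#include <string>
#include <thread>

// Hypothetical stand-in for the shared-memory area; not PLplot's actual layout.
struct SharedArea {
    std::atomic<bool> viewerReady{false}; // set once, by the viewer side only
    char header[32] = {};                 // state that must be complete before use
};

// Viewer side: finish all initialisation first, then publish the flag last.
void viewerInit(SharedArea &area) {
    std::strncpy(area.header, "header-ok", sizeof(area.header) - 1);
    // Release ordering makes the header write visible before the flag.
    area.viewerReady.store(true, std::memory_order_release);
}

// Device side: do not read shared state until the flag has been observed.
std::string deviceWaitAndRead(SharedArea &area) {
    while (!area.viewerReady.load(std::memory_order_acquire))
        std::this_thread::yield();
    return std::string(area.header);
}

// Runs both sides on separate threads, standing in for the two processes.
std::string runHandshakeDemo() {
    SharedArea area;
    std::string seen;
    std::thread device([&] { seen = deviceWaitAndRead(area); });
    std::thread viewer([&] { viewerInit(area); });
    viewer.join();
    device.join();
    return seen;
}
```

Calling runHandshakeDemo() returns the header text regardless of which thread happens to start first, because the reader spins on the flag rather than assuming an order. In real shared memory between two processes the same idea applies, but the flag type would need to be lock-free and placed in the mapped region, and a blocking wait would normally be preferable to spinning.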