From: Alan W. I. <ir...@be...> - 2017-10-03 00:45:35
|
On 2017-10-02 21:22+0100 Phil Rosenberg wrote: > Hi Alan > I literally just logged onto my email to say stop whatever you are > doing on this the locate mode is broken!!! :-D Clearly, great minds think alike. :-) So indeed I plan to retain the old IPC code for quite a bit longer (likely to the start of the next release cycle in early 2018). > > I think there may be a couple of different issues here. One is that > the viewer is checking for more data being sent and the core code is > waiting for the location information so we have a form of deadlock. It is good that you are doing some critical thinking about my bytes transmission logic this way. However, it appears you are referring to the case where there are two separate but simultaneous uses of IPC (one to send an array of data from core to viewer and one to send an array of data from viewer to core). But that simultanously use case should be handled properly now. Here is why. The TransmitSemaphore (i.e., the m_tsem semaphore) is used to insure that the shared memory can only be used to transmit one (possibly large) array of bytes at any given time. Any attempt to try sending a second array while a first one is being transmitted just quietly pauses (see the early call to m_threeSemaphores.waitTransmitSemaphore() in PLMemoryMap::transmitBytes) until just before the return from the first complete array transmission (see the late call to m_threeSemaphores.postTransmitSemaphore() in PLMemoryMap::transmitBytes). Furthermore, once PLMemoryMap::transmitBytes on the transmitting side in tandem with PLMemoryMap::receiveBytes on the receiving side is allowed to actually send data by the TransmitSemaphore, they use the ReadSemaphore and the WriteSemaphore to make sure the entire transmission of all bytes in the array via the (possibly smaller) shared memory area is completed without deadlocks possible. However, if you can come up with a specific example where this logic could deadlock, more power to you, but instead of introducing timed waits, I would prefer to fix it if the deadlock is a bug in the implementation of the above design or else redesign the above protocol for the transmission of an array of bytes if that design is not correct. > The receiveBytes functions needs a timeout otherwise it will hang the > viewer if there is a problem at the other end. Possibly. With the new IPC3 approach I was admittedly on a mission to eliminate as many timed waits as possible just to see if I could reduce that number to zero. As a result, if the users kills the core with ctrl-C, the wxPLViewer continues and has to be independently killed. And the core hangs indefinitely in debug mode without a timeout until the user does the correct suggested invocation of wxPLViewer. But in my opinion these issues are simply convenience ones that we should deal with later after everything (e.g., locate mode) works properly with the new IPC approach. > Also with the 3sem > method it probably makes no sense to keep checking for data once we > have received a locate request. This is set by the m_locateMode > variable. OK. > I also noticed that you have #ifdefed out the keypress code. Please > don't remove this as it will need reinstating with the new method. I agree that should be implemented once the mouse click part of it (which is not #ifdefed out) works. So as stated above I plan to retain the old IPC code to act as a guide for the new IPC code for a while longer. In sum, I am hoping you indeed take this opportunity to look critically at transmitBytes and receiveBytes with the aid of my explanation to make sure you are satisfied they should "just work". Of course, if you can imagine a scenario where they won't work (i.e., there could be a deadlock issue) I will fix them. However, my current feeling is the real issue is not the details of how those routines are implemented, but instead some other issue in the way that locate mode is set up on the core side and/or viewer side for the new IPC method. For example, a detailed comparison of what happens in locate mode for the old IPC method versus the new one might reveal some critical logical difference between the two code paths? But I have looked at all of that and could not find anything wrong so I hope your fresh set of eyes, your much larger knowledge of exactly what triggers rendering on the viewer side (which I have never understood and which does not currently work properly for either IPC method), and/or your different debugging methods will help spot and fix whatever the issue might be. Alan __________________________ Alan W. Irwin Astronomical research affiliation with Department of Physics and Astronomy, University of Victoria (astrowww.phys.uvic.ca). Programming affiliations with the FreeEOS equation-of-state implementation for stellar interiors (freeeos.sf.net); the Time Ephemerides project (timeephem.sf.net); PLplot scientific plotting software package (plplot.sf.net); the libLASi project (unifont.org/lasi); the Loads of Linux Links project (loll.sf.net); and the Linux Brochure Project (lbproject.sf.net). __________________________ Linux-powered Science __________________________ |