From: Alan W. I. <ir...@be...> - 2016-12-10 13:00:47
|
Hi Phil: If you review my results from much earlier this year that had a screen shot attached in the next e-mail of my cpumeter results, the test being run was for a case with absolutely no text data being written to wxPLViewer. So your concern (expressed off list to me) about the efficiency of IPC of text may be well taken, but they are not relevant to this particular result of huge idle times. Your other thought that inefficient IPC method may not be the direct cause of this issue is likely correct. For why, read on. One working hypothesis I could think of to generate these excessive idle times on Linux is that shared memory is not being shm_unlink'd properly so we have a shared memory leak that in a heavy-duty testing environment (such as when building the test_c_wxwidgets target) means the Linux kernel is soon right up against it (since it probably only allows a rather small proportion of actual memory to be used for the special purpose of shared memory). So the extraordinary measures that would be needed to supply shared memory ==> huge idle times. To investigate that possibility further I introduced (as of commit dcae24e) rudimentary but proper debug reporting of when the creation of shared memory is a success. I also did that for the shm_unlink call (but only for the case when there is success for PLMemoryMap::create followed by a call to PLMemoryMap::close). The results from building the test_c_wxwidgets target (before I ^C'd out of it) are as follows: Testing subset of C examples for device wxwidgets x01c PLMemoryMap::create: shm_open was a success for plplotMemoryMapIFWDQVRLBF PLMemoryMap::create: shm_open was a success for plplotMemoryMapIFWDQVRLBF PLMemoryMap::close: shm_unlink was a success for plplotMemoryMapIFWDQVRLBF x04c PLMemoryMap::create: shm_open was a success for plplotMemoryMapBTDCASTECJ PLMemoryMap::create: shm_open was a success for plplotMemoryMapBTDCASTECJ PLMemoryMap::close: shm_unlink was a success for plplotMemoryMapBTDCASTECJ PLMemoryMap::close: shm_unlink error PLMemoryMap::close: No such file or directory x08c PLMemoryMap::create: shm_open was a success for plplotMemoryMapYKNFFXNKGJ PLMemoryMap::create: shm_open was a success for plplotMemoryMapYKNFFXNKGJ PLMemoryMap::close: shm_unlink was a success for plplotMemoryMapYKNFFXNKGJ PLMemoryMap::close: shm_unlink error PLMemoryMap::close: No such file or directory x14c PLMemoryMap::create: shm_open was a success for plplotMemoryMapNYBYKUHKXR PLMemoryMap::create: shm_open was a success for plplotMemoryMapNYBYKUHKXR PLMemoryMap::create: shm_open was a success for plplotMemoryMapIJIAMCSBUP PLMemoryMap::create: shm_open was a success for plplotMemoryMapIJIAMCSBUP PLMemoryMap::close: shm_unlink was a success for plplotMemoryMapIJIAMCSBUP PLMemoryMap::close: shm_unlink was a success for plplotMemoryMapNYBYKUHKXR x16c PLMemoryMap::create: shm_open was a success for plplotMemoryMapQPXDPGRVTF PLMemoryMap::create: shm_open was a success for plplotMemoryMapQPXDPGRVTF PLMemoryMap::close: shm_unlink was a success for plplotMemoryMapQPXDPGRVTF PLMemoryMap::close: shm_unlink error PLMemoryMap::close: No such file or directory x17c ^C So these results completely shoot down that hypothesis because for every successful shm_open of a particular name there is also a successful shm_unlink for that same name ==> no shared memory leak! Oh well, on to the next idea. :-) For a bit I was excited about the above shm_unlink error messages that are also emitted for all but the first and 14th examples, but I now believe that pattern is the expected one, i.e., -dev wxwidgets finishes so it calls the PLMemoryMap destructor or PLMemoryMap::close directly which destroys the shared memory. Then wxPLViewer finishes so when it calls the PLMemoryMap destructor or PLMemoryMap::close directly those error messages must be emitted. But I frankly do not understand why no such error messages were emitted for the first and 14th examples. Yet for the first example experiments below, the error messages are always emitted. I have rerun both types of tests any number of times and these patterns always repeat. But I cannot figure out why different error messages are emitted for x01c results in the two contexts. Do you have some intelligent guess or even definitive response about that? Note the above form of results always take a very long time to create so the excessive idle time is still with us. In fact to see that directly, I ran x01c twice in a row as follows: software@raven> time examples/c/x01c -dev wxwidgets -np; echo "done x01c"; time examples/c/x01c -dev wxwidgets -np;echo "done x01c" PLplot library version: 5.11.1 PLMemoryMap::create: shm_open was a success for plplotMemoryMapRZZQSYHRAO PLMemoryMap::create: shm_open was a success for plplotMemoryMapRZZQSYHRAO PLMemoryMap::close: shm_unlink was a success for plplotMemoryMapRZZQSYHRAO real 0m0.258s user 0m0.040s sys 0m0.016s done x01c PLplot library version: 5.11.1 PLMemoryMap::close: shm_unlink error PLMemoryMap::close: No such file or directory [~10 second pause] PLMemoryMap::create: shm_open was a success for plplotMemoryMapOLMHNIFOPT PLMemoryMap::create: shm_open was a success for plplotMemoryMapOLMHNIFOPT PLMemoryMap::close: shm_unlink was a success for plplotMemoryMapOLMHNIFOPT real 0m10.263s user 0m0.052s sys 0m0.004s done x01c software@raven> PLMemoryMap::close: shm_unlink error PLMemoryMap::close: No such file or directory I then did the same experiment only using -dev psc for the second invocation of the example. software@raven> time examples/c/x01c -dev wxwidgets -np; echo "done x01c"; time examples/c/x01c -dev psc -o test.ps ;echo "done x01c" PLplot library version: 5.11.1 PLMemoryMap::create: shm_open was a success for plplotMemoryMapMAMYNDBJMT PLMemoryMap::create: shm_open was a success for plplotMemoryMapMAMYNDBJMT PLMemoryMap::close: shm_unlink was a success for plplotMemoryMapMAMYNDBJMT real 0m0.255s user 0m0.052s sys 0m0.000s done x01c PLplot library version: 5.11.1 real 0m0.006s user 0m0.004s sys 0m0.000s done x01c software@raven> PLMemoryMap::close: shm_unlink error PLMemoryMap::close: No such file or directory So absolutely no delay of that second example run in this case (as expected), but a ~10 second delay (sigh) when -dev wxwidgets is used twice in a row. This ~10 second pause of the second example run for -dev wxwidgets occurred when no wxPLViewer GUI was showing. And from the placement of that pause it is now a certainty that the delay is due to a long wait (likely for some system resources) by the -dev wxwidgets code on startup, and the most likely candidate for that is that delay is occurring somewhere in PLMemoryMap::create. So further debug output is going to have to be inserted to confirm that, but if that turns out to be the case, this obviously has nothing to do with IPC methods but everything to do with setting those up. Note also this issue is exactly the one I posted about a long time ago, i.e., it also occurs when no text is being written to the device. Unfortunately, I have now run out of any further time to pursue this right now so the answer to the above question about whether or not the issue is somewhere in PLMemoryMap::create and if so finding the exact system call that is intermittently inserting huge delays will have to be a mystery until post-release (unless someone else steps in). I have now made a subsequent commit (1627eec) to drop this debug information for the release. But output of that useful debug information can trivially be restored by #defining PLMemoryMap_DEBUG in drivers/wxwidgets_comms.h. Alan __________________________ Alan W. Irwin Astronomical research affiliation with Department of Physics and Astronomy, University of Victoria (astrowww.phys.uvic.ca). Programming affiliations with the FreeEOS equation-of-state implementation for stellar interiors (freeeos.sf.net); the Time Ephemerides project (timeephem.sf.net); PLplot scientific plotting software package (plplot.sf.net); the libLASi project (unifont.org/lasi); the Loads of Linux Links project (loll.sf.net); and the Linux Brochure Project (lbproject.sf.net). __________________________ Linux-powered Science __________________________ |
From: <p.d...@gm...> - 2016-12-10 13:41:11
|
Many thanks Alan for pushing this and getting the debug statements into the code. You have confirmed that i was looking in the wrong place for the problem. I was mulling things over this morning and wondered about the same thing as you, some sort of memory leak with the shared memory. You have obviously shown that not to be the case. The error messages are a bit of a surprise. I would have thought that the memory map would count the number of system wide open calls that had been made and only close when all those open calls was matched with a close. I will have to check the documentation to be sure. Maybe this is related to the long delay somehow. I think the next step is probably to add some timing code to the viewer. My plan is to try to sort the other bugs i mentioned this evening. Then if there is still time I may try that. If you do anything else in the meantime then let me know so we can avoid duplicated effort. Thanks again Phil Sent from my Windows 10 phone From: Alan W. Irwin Sent: 10 December 2016 13:00 To: Phil Rosenberg Cc: PLplot development list Subject: Re: More on IPC Hi Phil: If you review my results from much earlier this year that had a screen shot attached in the next e-mail of my cpumeter results, the test being run was for a case with absolutely no text data being written to wxPLViewer. So your concern (expressed off list to me) about the efficiency of IPC of text may be well taken, but they are not relevant to this particular result of huge idle times. Your other thought that inefficient IPC method may not be the direct cause of this issue is likely correct. For why, read on. One working hypothesis I could think of to generate these excessive idle times on Linux is that shared memory is not being shm_unlink'd properly so we have a shared memory leak that in a heavy-duty testing environment (such as when building the test_c_wxwidgets target) means the Linux kernel is soon right up against it (since it probably only allows a rather small proportion of actual memory to be used for the special purpose of shared memory). So the extraordinary measures that would be needed to supply shared memory ==> huge idle times. To investigate that possibility further I introduced (as of commit dcae24e) rudimentary but proper debug reporting of when the creation of shared memory is a success. I also did that for the shm_unlink call (but only for the case when there is success for PLMemoryMap::create followed by a call to PLMemoryMap::close). The results from building the test_c_wxwidgets target (before I ^C'd out of it) are as follows: Testing subset of C examples for device wxwidgets x01c PLMemoryMap::create: shm_open was a success for plplotMemoryMapIFWDQVRLBF PLMemoryMap::create: shm_open was a success for plplotMemoryMapIFWDQVRLBF PLMemoryMap::close: shm_unlink was a success for plplotMemoryMapIFWDQVRLBF x04c PLMemoryMap::create: shm_open was a success for plplotMemoryMapBTDCASTECJ PLMemoryMap::create: shm_open was a success for plplotMemoryMapBTDCASTECJ PLMemoryMap::close: shm_unlink was a success for plplotMemoryMapBTDCASTECJ PLMemoryMap::close: shm_unlink error PLMemoryMap::close: No such file or directory x08c PLMemoryMap::create: shm_open was a success for plplotMemoryMapYKNFFXNKGJ PLMemoryMap::create: shm_open was a success for plplotMemoryMapYKNFFXNKGJ PLMemoryMap::close: shm_unlink was a success for plplotMemoryMapYKNFFXNKGJ PLMemoryMap::close: shm_unlink error PLMemoryMap::close: No such file or directory x14c PLMemoryMap::create: shm_open was a success for plplotMemoryMapNYBYKUHKXR PLMemoryMap::create: shm_open was a success for plplotMemoryMapNYBYKUHKXR PLMemoryMap::create: shm_open was a success for plplotMemoryMapIJIAMCSBUP PLMemoryMap::create: shm_open was a success for plplotMemoryMapIJIAMCSBUP PLMemoryMap::close: shm_unlink was a success for plplotMemoryMapIJIAMCSBUP PLMemoryMap::close: shm_unlink was a success for plplotMemoryMapNYBYKUHKXR x16c PLMemoryMap::create: shm_open was a success for plplotMemoryMapQPXDPGRVTF PLMemoryMap::create: shm_open was a success for plplotMemoryMapQPXDPGRVTF PLMemoryMap::close: shm_unlink was a success for plplotMemoryMapQPXDPGRVTF PLMemoryMap::close: shm_unlink error PLMemoryMap::close: No such file or directory x17c ^C So these results completely shoot down that hypothesis because for every successful shm_open of a particular name there is also a successful shm_unlink for that same name ==> no shared memory leak! Oh well, on to the next idea. :-) For a bit I was excited about the above shm_unlink error messages that are also emitted for all but the first and 14th examples, but I now believe that pattern is the expected one, i.e., -dev wxwidgets finishes so it calls the PLMemoryMap destructor or PLMemoryMap::close directly which destroys the shared memory. Then wxPLViewer finishes so when it calls the PLMemoryMap destructor or PLMemoryMap::close directly those error messages must be emitted. But I frankly do not understand why no such error messages were emitted for the first and 14th examples. Yet for the first example experiments below, the error messages are always emitted. I have rerun both types of tests any number of times and these patterns always repeat. But I cannot figure out why different error messages are emitted for x01c results in the two contexts. Do you have some intelligent guess or even definitive response about that? Note the above form of results always take a very long time to create so the excessive idle time is still with us. In fact to see that directly, I ran x01c twice in a row as follows: software@raven> time examples/c/x01c -dev wxwidgets -np; echo "done x01c"; time examples/c/x01c -dev wxwidgets -np;echo "done x01c" PLplot library version: 5.11.1 PLMemoryMap::create: shm_open was a success for plplotMemoryMapRZZQSYHRAO PLMemoryMap::create: shm_open was a success for plplotMemoryMapRZZQSYHRAO PLMemoryMap::close: shm_unlink was a success for plplotMemoryMapRZZQSYHRAO real 0m0.258s user 0m0.040s sys 0m0.016s done x01c PLplot library version: 5.11.1 PLMemoryMap::close: shm_unlink error PLMemoryMap::close: No such file or directory [~10 second pause] PLMemoryMap::create: shm_open was a success for plplotMemoryMapOLMHNIFOPT PLMemoryMap::create: shm_open was a success for plplotMemoryMapOLMHNIFOPT PLMemoryMap::close: shm_unlink was a success for plplotMemoryMapOLMHNIFOPT real 0m10.263s user 0m0.052s sys 0m0.004s done x01c software@raven> PLMemoryMap::close: shm_unlink error PLMemoryMap::close: No such file or directory I then did the same experiment only using -dev psc for the second invocation of the example. software@raven> time examples/c/x01c -dev wxwidgets -np; echo "done x01c"; time examples/c/x01c -dev psc -o test.ps ;echo "done x01c" PLplot library version: 5.11.1 PLMemoryMap::create: shm_open was a success for plplotMemoryMapMAMYNDBJMT PLMemoryMap::create: shm_open was a success for plplotMemoryMapMAMYNDBJMT PLMemoryMap::close: shm_unlink was a success for plplotMemoryMapMAMYNDBJMT real 0m0.255s user 0m0.052s sys 0m0.000s done x01c PLplot library version: 5.11.1 real 0m0.006s user 0m0.004s sys 0m0.000s done x01c software@raven> PLMemoryMap::close: shm_unlink error PLMemoryMap::close: No such file or directory So absolutely no delay of that second example run in this case (as expected), but a ~10 second delay (sigh) when -dev wxwidgets is used twice in a row. This ~10 second pause of the second example run for -dev wxwidgets occurred when no wxPLViewer GUI was showing. And from the placement of that pause it is now a certainty that the delay is due to a long wait (likely for some system resources) by the -dev wxwidgets code on startup, and the most likely candidate for that is that delay is occurring somewhere in PLMemoryMap::create. So further debug output is going to have to be inserted to confirm that, but if that turns out to be the case, this obviously has nothing to do with IPC methods but everything to do with setting those up. Note also this issue is exactly the one I posted about a long time ago, i.e., it also occurs when no text is being written to the device. Unfortunately, I have now run out of any further time to pursue this right now so the answer to the above question about whether or not the issue is somewhere in PLMemoryMap::create and if so finding the exact system call that is intermittently inserting huge delays will have to be a mystery until post-release (unless someone else steps in). I have now made a subsequent commit (1627eec) to drop this debug information for the release. But output of that useful debug information can trivially be restored by #defining PLMemoryMap_DEBUG in drivers/wxwidgets_comms.h. Alan __________________________ Alan W. Irwin Astronomical research affiliation with Department of Physics and Astronomy, University of Victoria (astrowww.phys.uvic.ca). Programming affiliations with the FreeEOS equation-of-state implementation for stellar interiors (freeeos.sf.net); the Time Ephemerides project (timeephem.sf.net); PLplot scientific plotting software package (plplot.sf.net); the libLASi project (unifont.org/lasi); the Loads of Linux Links project (loll.sf.net); and the Linux Brochure Project (lbproject.sf.net). __________________________ Linux-powered Science __________________________ |
From: Alan W. I. <ir...@be...> - 2016-12-10 21:21:36
|
On 2016-12-10 13:41-0000 p.d...@gm... wrote: > Many thanks Alan for pushing this and getting the debug statements into the code. You have confirmed that i was looking in the wrong place for the problem. > > I was mulling things over this morning and wondered about the same thing as you, some sort of memory leak with the shared memory. You have obviously shown that not to be the case. > > The error messages are a bit of a surprise. I would have thought that the memory map would count the number of system wide open calls that had been made and only close when all those open calls was matched with a close. I will have to check the documentation to be sure. Maybe this is related to the long delay somehow. > > I think the next step is probably to add some timing code to the viewer. My plan is to try to sort the other bugs i mentioned this evening. Then if there is still time I may try that. > > If you do anything else in the meantime then let me know so we can avoid duplicated effort. Hi Phil: I was just going to leave this issue to you until post-release because there is lot of other stuff on my pre-release agenda, but if we could solve this issue for the release it would be great so I am perfectly willing to be flexible on the release timing if that will make a difference. Also after I got some much-needed sleep I woke up just now with huge curiosity about what is going on with the really obvious and reliably generated long pause for the second (!) time an example is executed if too close to the first execution. So instead of pursuing this with inserting timing statements like you are thinking about, instead I plan to simply insert more debug prints, e.g., "PLMemoryMap::create: entering", PLMemoryMap::create: calling close" in the PLMemoryMap::create and PLMemoryMap::close code. With the idea I do a binary search to find exactly which system call is causing that long pause. Of course, my initial result might be the long pause might not be in PLMemoryMap at all. So we will see how this goes, and I hope to have some results for you in an hour or so from now. Also, it strikes me that since Pedro is finding those segfaults on three different Linux systems for his own work, clearly there is something going on that is different for Linux compared to Windows, and it might be related to this long pause issue I am investigating. Meanwhile, I would appreciate it if you and everybody else here with access to wxwidgets and Linux tried my experiment on Linux of time examples/c/x01c -dev wxwidgets -np; echo "done x01c"; time examples/c/x01c -dev wxwidgets -np;echo "done x01c" just to make absolutely sure the reliable long pause (> 5 seconds and often > 10 seconds) I am getting for this experiment is not just some bug with a particular system call for the Debian Jessie version of that and instead is there for all Linux systems we have access to here. Laurent confirmed the long pause with a somewhat similar experiment a few weeks ago for a quite different Linux kernel (I have forgotten the distribution), but it would be good to get confirmation on a variety of Linux systems. Alan __________________________ Alan W. Irwin Astronomical research affiliation with Department of Physics and Astronomy, University of Victoria (astrowww.phys.uvic.ca). Programming affiliations with the FreeEOS equation-of-state implementation for stellar interiors (freeeos.sf.net); the Time Ephemerides project (timeephem.sf.net); PLplot scientific plotting software package (plplot.sf.net); the libLASi project (unifont.org/lasi); the Loads of Linux Links project (loll.sf.net); and the Linux Brochure Project (lbproject.sf.net). __________________________ Linux-powered Science __________________________ |
From: Alan W. I. <ir...@be...> - 2016-12-10 23:07:57
|
On 2016-12-10 13:21-0800 Alan W. Irwin wrote: [...] > So instead of pursuing this with inserting timing statements like you > are thinking about, instead I plan to simply insert more debug prints, > e.g., "PLMemoryMap::create: entering", PLMemoryMap::create: calling > close" in the PLMemoryMap::create and PLMemoryMap::close code. With > the idea I do a binary search to find exactly which system call is > causing that long pause. Of course, my initial result might be the > long pause might not be in PLMemoryMap at all. So we will see how > this goes, and I hope to have some results for you in an hour or so > from now. Hi Phil: See commit 7756f60 for the tonne of new debug print statements I have inserted. With those I got what appears naively to me is a "deadlock" situation when attempting to delete m_name. (However, I emphasize naive because I don't really understand C++ that well so I may be missing something obvious). Anyhow, I hope you will have a fast look at these results and give me your comments on this "deadlock" and also my other conclusions assuming it is not a deadlock at all. First here are the results of the latest "two x01c example execution" test. software@raven> time examples/c/x01c -dev wxwidgets -np; echo "done x01c"; time examples/c/x01c -dev wxwidgets -np;echo "done x01c" PLplot library version: 5.11.1 [from -dev wxwidgets?] PLMemoryMap::close: just entering close PLMemoryMap::create: (mustNotExist) calling shm_open PLMemoryMap::create: shm_open was a success for plplotMemoryMapSNULFSEKQE PLMemoryMap::create: (mustNotExist) calling ftruncate PLMemoryMap::create: calling mmap PLMemoryMap::create: finished with call of mmap [from wxPLViewer?] PLMemoryMap::close: just entering close PLMemoryMap::create: (mustExist) calling shm_open PLMemoryMap::create: shm_open was a success for plplotMemoryMapSNULFSEKQE PLMemoryMap::create: calling mmap PLMemoryMap::create: finished with call of mmap [from -dev wxwidgets?] PLMemoryMap::close: just entering close PLMemoryMap::close: calling munmap PLMemoryMap::close: calling shm_unlink PLMemoryMap::close: shm_unlink was a success for plplotMemoryMapSNULFSEKQE PLMemoryMap::close: start deleting m_name PLMemoryMap::close: end deleting m_name real 0m0.280s user 0m0.036s sys 0m0.020s done x01c PLplot library version: 5.11.1 [from wxPLViewer] PLMemoryMap::close: just entering close PLMemoryMap::close: calling munmap PLMemoryMap::close: calling shm_unlink PLMemoryMap::close: shm_unlink error PLMemoryMap::close: No such file or directory PLMemoryMap::close: start deleting m_name PLMemoryMap::close: end deleting m_name PLMemoryMap::close: just entering close [long pause] [from -dev wxwidgets?] PLMemoryMap::close: just entering close PLMemoryMap::create: (mustNotExist) calling shm_open PLMemoryMap::create: shm_open was a success for plplotMemoryMapRZUTHPBSHB PLMemoryMap::create: (mustNotExist) calling ftruncate PLMemoryMap::create: calling mmap PLMemoryMap::create: finished with call of mmap [from wxPLViewer?] PLMemoryMap::close: just entering close PLMemoryMap::create: (mustExist) calling shm_open PLMemoryMap::create: shm_open was a success for plplotMemoryMapRZUTHPBSHB PLMemoryMap::create: calling mmap PLMemoryMap::create: finished with call of mmap [from -dev wxwidgets?] PLMemoryMap::close: just entering close PLMemoryMap::close: calling munmap PLMemoryMap::close: calling shm_unlink PLMemoryMap::close: shm_unlink was a success for plplotMemoryMapRZUTHPBSHB PLMemoryMap::close: start deleting m_name PLMemoryMap::close: end deleting m_name real 0m11.311s user 0m0.048s sys 0m0.008s done x01c [from wxPLViewer] software@raven> PLMemoryMap::close: just entering close PLMemoryMap::close: calling munmap PLMemoryMap::close: calling shm_unlink PLMemoryMap::close: shm_unlink error PLMemoryMap::close: No such file or directory PLMemoryMap::close: start deleting m_name PLMemoryMap::close: end deleting m_name PLMemoryMap::close: just entering close As you can see from the above results and the commit, I have inserted debug printout almost everywhere in PLMemoryMap::create and PLMemoryMap::close. Note I have tentatively (note question marks) identified which sections of output belong to -dev wxwidgets and which to wxPLViewer (except after the "done x01c" output it must be wxPLViewer so I have dropped the question marks in those cases). Do you agree with those identifications? Also note that several times we have PLMemoryMap::close: just entering close with nothing happening in close afterwards meaning your if statements are set up for that particular call to close in such a way that nothing is executed in close. Anyhow, the "deadlock" that concerns me is in the messages that must be purely from wxPLViewer where just before the long pause we have [from wxPLViewer] PLMemoryMap::close: just entering close PLMemoryMap::close: calling munmap PLMemoryMap::close: calling shm_unlink PLMemoryMap::close: shm_unlink error PLMemoryMap::close: No such file or directory PLMemoryMap::close: start deleting m_name PLMemoryMap::close: end deleting m_name PLMemoryMap::close: just entering close So it appears that wxPLViewer is executing its destructor, there is an error there because shm_unlink has already been called by -dev wxwidgets for that shared memory (but that is likely irrelevant, and near the end of that process it finishes deleting m_name, but then that destructor is called again afterwards. Doesn't that cause a problem or is that just naturally expected for C++ destructors? If that is expected, then that last message above corresponds to absolutely nothing happening in the (last) destructor. Which means the long pause is due to something else entirely than anything in PLMemoryMap::create or PLMemoryMap::close! Which would be some important progress in finding the cause of the long pause, but there is more work to do to nail this down. Anyhow, this time I really have pretty much come to the end of how much I can try additional print statements, but I could certainly act as a sounding board in the days ahead as you figure out the exact cause of the long pause. But you should note the long pause occurs in the -dev wxwidgets code (i.e. not PLViewer). The first debug output after that long pause is [from -dev wxwidgets?] PLMemoryMap::close: just entering close PLMemoryMap::create: (mustNotExist) calling shm_open where that PLMemoryMap::close: just entering close with nothing further happening in close is the result of the close() call at the start of PLMemoryMap::create, and the next printout occurs just before the next system call (to shm_open in this case). So these results are a clear indication the long pause is caused by something happening in the -dev wxwidgets code _before_ PLMemoryMap::create or PLMemoryMap::close are even called. So it is in that other code that you should insert debug print statements or timing commands. Alan __________________________ Alan W. Irwin Astronomical research affiliation with Department of Physics and Astronomy, University of Victoria (astrowww.phys.uvic.ca). Programming affiliations with the FreeEOS equation-of-state implementation for stellar interiors (freeeos.sf.net); the Time Ephemerides project (timeephem.sf.net); PLplot scientific plotting software package (plplot.sf.net); the libLASi project (unifont.org/lasi); the Loads of Linux Links project (loll.sf.net); and the Linux Brochure Project (lbproject.sf.net). __________________________ Linux-powered Science __________________________ |
From: Alan W. I. <ir...@be...> - 2016-12-11 01:25:06
|
Hi (again) Phil: Two further thoughts. * All those preprocessing directives and fprintf calls I inserted make it extremely hard to understand the code. Thus, feel free to revert my two commits completely that inserted the debug output in PLMemoryMap:create and PLMemoryMap:close when you finally isolate exactly what system call is causing the long pause. But make that reverting commit a clean one (nothing else then exactly reverting my changes alone) to make it easy to reinstate my debug output again if we ever need it again. And I suggest you use a similar process for the debugging printout you insert (i.e., one clean commit to put in your own debug printout statements, one clean commit to remove that code again once you finally isolate the issue). * You will have to look at what your further debug print statements say to get the definitive answer, but my current guess is a call from -dev wxwidgets to some method of PLNamedMutex (or PLNamedMutexLocker) will turn out to be the cause of the long-pause issue. Alan __________________________ Alan W. Irwin Astronomical research affiliation with Department of Physics and Astronomy, University of Victoria (astrowww.phys.uvic.ca). Programming affiliations with the FreeEOS equation-of-state implementation for stellar interiors (freeeos.sf.net); the Time Ephemerides project (timeephem.sf.net); PLplot scientific plotting software package (plplot.sf.net); the libLASi project (unifont.org/lasi); the Loads of Linux Links project (loll.sf.net); and the Linux Brochure Project (lbproject.sf.net). __________________________ Linux-powered Science __________________________ |