From: Doug G. <dou...@gm...> - 2023-08-27 13:37:15
Attachments:
exampl1.txt

Christian,
(Resend - apologies for the previous direct send to you.)

Compiled LS with "-O3 -pg -g" flags. Discovered gprof does not work with C++ (C, Fortran and asm only). Wasted an afternoon; should have read the first line of the man page more carefully (doh).
Oprofile needs to be compiled from source for my OS, but I did find Sysprof available for quick install. I have attached a capture output summary that includes the failure mode I described previously. The summary zeros in on the heaviest loads.

I'm hoping for some advice. I will look at the code myself sometime this week.
Thanks again for your help,
Doug

On Sat, 26 Aug 2023 at 16:55, Christian Schoenebeck <sch...@li...> wrote:
> On Friday, August 25, 2023 3:14:13 PM CEST Doug Gray wrote:
> > Christian,
> > Reading and testing the example given here I can see the process.
> > https://www.thegeekstuff.com/2012/08/gprof-tutorial/
> >
> > From your directions to Ebab in 2020 I surmise the command line would be:
> >
> > CXXFLAGS="-O3 -pg" ./configure && make -j4
>
> You should also add -g to add debug information.
>
> > and execute:
> >
> > ./src/linuxsampler
> >
> > I presume -O3 are optimisations,
>
> Correct
>
> > -pg is from the current man page for gprof,
>
> Yeah, but that's only for gprof. It injects extra profiling code directly
> into the binary. Other profilers like oprofile AFAICR don't need or use that.
>
> > and -j4 to compile using 4 cores. Is this correct?
>
> Correct, and not important in this case. It just speeds up the compilation.
>
> > (I understand not to strip the executable.)
>
> Correct.
>
> It's been a long time since I used gprof or oprofile, so I'm not sure whether
> the situation has improved on gprof's end, but back then (years ago) gprof had
> the huge disadvantage that it was only capable of profiling at the application
> binary level, whereas oprofile profiled from the application binary level over
> individual system libraries, down to the lowest kernel level. So oprofile
> delivered a much more complete and accurate picture than gprof.
>
> For instance, in the case of this SFZ engine issue, it could also be possible
> that the bottleneck is somewhere in libsndfile or any library that libsndfile
> calls in turn. gprof would not have revealed performance issues in libsndfile
> or any other system lib, as it simply did not count those in.
>
> The gig engine does not use any third-party lib during playback of samples.
> The SFZ engine however supports a large number of all kinds of audio files.
> That's why the SFZ engine calls libsndfile (also during real-time playback of
> samples) to delegate support for all those file formats. And I'm not sure if
> libsndfile and all the libs that libsndfile calls are designed to be
> real-time safe.
>
> /Christian
>
> _______________________________________________
> Linuxsampler-devel mailing list
> Lin...@li...
> https://lists.sourceforge.net/lists/listinfo/linuxsampler-devel
From: Christian S. <sch...@li...> - 2023-08-29 09:35:13

On Sunday, August 27, 2023 3:36:50 PM CEST Doug Gray wrote:
> Christian,
> (Resend - apologies for the previous direct send to you.)
>
> Compiled LS with "-O3 -pg -g" flags. Discovered gprof does not work with
> C++ (C, Fortran and asm only). Wasted an afternoon; should have read the
> first line of the man page more carefully (doh).

I'm pretty sure gprof works with the C++ frontend as well, but like I said, I think gprof was a dead end anyway (for the reasons described before), so never mind.

> Oprofile needs to be compiled from source for my OS

Yeah, oprofile needs some work to run *and* provide useful output.

> but I did find Sysprof available for quick install. I have attached a capture
> output summary that includes the failure mode I described previously. The
> summary zeros in on the heaviest loads.

Looking at your data, did you capture the entire app's lifetime, that is, from app start, through instrument loading, up to the point where rendering caused the performance issue you reported?

Because usually what you do is limit either the capture or the subsequent analysis to the time period where the actual performance issue happens. In this case we are only interested in the period where playback happens and the audio dropouts occur.

We don't want to have the time periods in the analysis data where the instrument was loaded, because that taints the picture.

/Christian
From: Doug G. <dou...@gm...> - 2023-08-30 10:49:45

Christian,
Yes, the output I posted began just before pressing sufficient keys to trigger the failure event, i.e. laying my arm across the keys. It captures only the key presses, the failure itself and the subsequent recovery, approximately 11 seconds in all. Fortunately the sysprof tool allowed capturing an interval like this.

Re gprof: as far as I can tell, the code to generate the profile log is embedded in the host application and generates the profile output file only on a clean exit. It may not have worked for me since my Linuxsampler does not exit cleanly with ctrl-c (SIGINT). When using kill to stop LS, the capture file probably would not get written, if indeed it does work with C++.

FYI my LS does exit with ctrl-c after initialisation. After sending the lscp configuration file, ctrl-c results in a 'Stopping disk thread .. OK' message and then hangs. Not a big problem, just not clean.

Doug
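The point about a clean exit matches how gprof is documented to behave: the profile data file (gmon.out) is written when the program exits normally, that is by returning from main() or calling exit(), and not when it is killed by a signal. A minimal sketch of one common way to turn ctrl-c into a normal exit is shown below. All names here (g_stopRequested, onStopSignal, installStopHandlers, runUntilStopped) are hypothetical and are not taken from the LinuxSampler sources; how LinuxSampler actually handles signals, and why it hangs after 'Stopping disk thread', is not something this sketch addresses.

```cpp
#include <cassert>
#include <chrono>
#include <csignal>
#include <thread>

// Set from the signal handler. sig_atomic_t is the integer type that may be
// written safely from inside a handler.
static volatile std::sig_atomic_t g_stopRequested = 0;

// Hypothetical handler: only records the request, does no real work here.
extern "C" void onStopSignal(int) {
    g_stopRequested = 1;
}

// Hypothetical helper: route SIGINT (ctrl-c) and SIGTERM (plain kill) to the
// flag instead of letting them terminate the process abruptly.
void installStopHandlers() {
    std::signal(SIGINT, onStopSignal);
    std::signal(SIGTERM, onStopSignal);
}

// Hypothetical main loop: polls the flag and returns normally once it is set,
// so that main() can return and exit-time code (such as the gmon.out writer
// added by -pg) gets a chance to run.
int runUntilStopped(std::chrono::milliseconds pollInterval) {
    assert(pollInterval.count() > 0);
    while (!g_stopRequested)
        std::this_thread::sleep_for(pollInterval);
    return 0;
}
```

A program using this would call installStopHandlers() early in main() and end with something like `return runUntilStopped(std::chrono::milliseconds(100));` after shutting down its worker threads.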
From: Christian S. <sch...@li...> - 2023-09-09 10:11:37

On Wednesday, August 30, 2023 12:49:24 PM CEST Doug Gray wrote:
> Christian,
> Yes, the output I posted began just before pressing sufficient keys to
> trigger the failure event, i.e. laying my arm across the keys. It captures
> only the key presses, the failure itself and the subsequent recovery,
> approximately 11 seconds in all. Fortunately the sysprof tool allowed
> capturing an interval like this.

OK, I was just confused because of the

    sfz::InstrumentResourceManager::SfzResourceManager::Create()

method call in your output, which should only be called while loading an instrument, not during normal real-time playback.

But I just realized that I misinterpreted the output: it is just there because Create() originally registered a lambda function by calling AddPeriodicJob():

http://svn.linuxsampler.org/cgi-bin/viewvc.cgi/linuxsampler/trunk/src/engines/sfz/InstrumentResourceManager.cpp?view=markup&pathrev=4019#l173

So it is that lambda function registered by the AddPeriodicJob() call that is consuming those 10% CPU time.

I am still not seeing any obvious cause for what you reported. But it is apparent that this registered lambda function consumes more CPU time than it ought to. So I would try testing by simply commenting out that AddPeriodicJob() call and also by commenting out the following code block:

    // perform periodic, custom jobs on behalf of external components
    {
        LockGuard lock(periodicJobsMutex);
        for (ext_job_t job : periodicJobs) {
            job.fn();
        }
    }

http://svn.linuxsampler.org/cgi-bin/viewvc.cgi/linuxsampler/trunk/src/engines/InstrumentManagerThread.cpp?view=markup&pathrev=4019#l160

All it does is periodically check whether the SFZ file was externally modified, and if so, automatically reload the SFZ file to adapt playback to those external SFZ file changes.

That feature is for people creating new SFZ files or modifying existing ones.

/Christian
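If a periodic job like the one above turns out to cost more CPU time than it should, one generic option is to rate-limit it, so the wrapped function runs at most once per chosen interval no matter how often the manager thread loops. The following is only a sketch under that assumption: ThrottledJob is a hypothetical class, not part of LinuxSampler, and the thread does not establish how often the real job currently runs or what interval would be appropriate.

```cpp
#include <cassert>
#include <chrono>
#include <functional>
#include <utility>

// Hypothetical wrapper (not LinuxSampler code): runs the wrapped job at most
// once per `minInterval`, however often it is invoked.
class ThrottledJob {
public:
    using Clock = std::chrono::steady_clock;

    ThrottledJob(std::function<void()> fn, Clock::duration minInterval)
        : fn_(std::move(fn)), minInterval_(minInterval) {
        assert(minInterval_ > Clock::duration::zero());
    }

    // Intended to be called on every pass of a periodic-jobs loop.
    // Returns true if the wrapped job actually ran on this call.
    bool operator()(Clock::time_point now = Clock::now()) {
        if (hasRun_ && now - lastRun_ < minInterval_)
            return false;  // too soon since the last run: skip this pass
        lastRun_ = now;
        hasRun_ = true;
        fn_();
        return true;
    }

private:
    std::function<void()> fn_;
    Clock::duration minInterval_;
    Clock::time_point lastRun_{};
    bool hasRun_ = false;
};
```

The `now` parameter defaults to the current time and exists mainly so the behaviour can be checked with fixed time points. A file-modification check could be wrapped as `ThrottledJob check(fn, std::chrono::seconds(1));` and then called on each loop pass.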
From: Doug G. <dou...@gm...> - 2023-09-11 05:31:56
Attachments:
exampl3.txt
exampl4.txt

Hi Christian,
Thank you for looking at this.

I commented out the code as suggested, but this did not help other than removing the lambda function as a candidate cause. BTW the sfz reload on change behaviour is very useful indeed.

Attached are two profiles taken after making this change: exampl3 is about 15 seconds long, spanning the onset and recovery of the fault event; the second, exampl4, was captured during the failure event itself (~8 seconds).

The most curious aspect of this issue for me is the way only one CPU core hits 100% load during the event, as if threads are not launching properly beyond some threshold and become locked, preventing them from being killed on, say, a keyoff event. Unfortunately this is just my hunch and so far I haven't seen anything to support it within the code.

Doug
From: Christian S. <sch...@li...> - 2023-09-12 10:51:32

On Monday, September 11, 2023 7:31:34 AM CEST Doug Gray wrote:
> Hi Christian,
> Thank you for looking at this.
>
> I commented out the code as suggested, but this did not help other than
> removing the lambda function as a candidate cause.

Then it's a minor, unrelated issue. It should still be fixed, as it's apparently polling for SFZ file changes far too often.

> BTW the sfz reload on change behaviour is very useful indeed.
>
> Attached are two profiles taken after making this change: exampl3 is about
> 15 seconds long, spanning the onset and recovery of the fault event; the
> second, exampl4, was captured during the failure event itself (~8 seconds).

Looks like the heaviest part, with 60% CPU, is directly inside AbstractVoice::Synthesize(). It would make sense to profile which parts inside that specific method take how much of the CPU time. This method is shared with the gig engine, but some parts there are only used by the gig engine, and some parts only by the SFZ engine.

> The most curious aspect of this issue for me is the way only one CPU core
> hits 100% load during the event, as if threads are not launching properly
> beyond some threshold and become locked, preventing them from being killed
> on, say, a keyoff event. Unfortunately this is just my hunch and so far I
> haven't seen anything to support it within the code.

That's the expected behaviour. We don't have real SMP support. If you have exactly one audio output device instance, then there is exactly one audio thread doing all the heavy lifting of calculating the audio result for all voices. This behaviour applies to all formats, including the SFZ and gig formats.

A workaround is creating additional audio output device instances with the sampler; they could then (depending on the audio driver) run in separate audio threads. But that's inconvenient for the user, who would need to manually spread the setup over individual audio devices (threads). So I guess almost nobody uses that.

We had discussions many years ago with plans to implement real SMP support, that is, automatically distributing voices to individual threads which in turn would deliver their results back to the main audio thread. However, that never gained momentum, simply because hardware development made it unnecessary. With the gig engine you can run several hundred voices with a single thread without getting any glitches, and that has been the case for around 12 years already, if not longer.

/Christian
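The SMP idea described above (distribute voices to individual threads, then deliver the results back to the main audio thread) can be illustrated with a small sketch. This is only an illustration of the general pattern and was never implemented in this form in LinuxSampler: Voice and renderParallel are hypothetical names, the "voice" here just adds a constant level, and threads are created per call for brevity. A real-time audio engine would more likely keep persistent worker threads and avoid allocation and thread creation inside the audio callback.

```cpp
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical stand-in for a voice: adds a constant level to every frame.
struct Voice {
    float level;
    void render(std::vector<float>& out) const {
        for (float& s : out) s += level;
    }
};

// Hypothetical sketch: split `voices` across `numWorkers` threads, let each
// worker render into its own private buffer (so no locking is needed), then
// sum the per-worker buffers on the calling thread.
std::vector<float> renderParallel(const std::vector<Voice>& voices,
                                  std::size_t frames,
                                  std::size_t numWorkers) {
    assert(numWorkers > 0);
    std::vector<std::vector<float>> partial(
        numWorkers, std::vector<float>(frames, 0.0f));

    std::vector<std::thread> workers;
    workers.reserve(numWorkers);
    for (std::size_t w = 0; w < numWorkers; ++w) {
        workers.emplace_back([&voices, &partial, w, numWorkers] {
            // Worker w takes voices w, w + numWorkers, w + 2 * numWorkers, ...
            for (std::size_t v = w; v < voices.size(); v += numWorkers)
                voices[v].render(partial[w]);
        });
    }
    for (std::thread& t : workers) t.join();

    // Mix-down step: corresponds to delivering results back to one thread.
    std::vector<float> mix(frames, 0.0f);
    for (const std::vector<float>& buf : partial)
        for (std::size_t i = 0; i < frames; ++i)
            mix[i] += buf[i];
    return mix;
}
```

Giving each worker its own buffer keeps the rendering step free of shared writes; the only synchronisation point is the join before the mix-down.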