From: FFADO <ffa...@ff...> - 2012-06-29 18:57:33
|
#242: ntp interferes with ffado ------------------------+--------------------------------------------------- Reporter: ppalmers | Owner: ppalmers Type: bug | Status: new Priority: major | Milestone: FFADO 2.x Component: | Version: FFADO 2.0-rc2 (1.999.42) Resolution: | Keywords: Device_name: all | ------------------------+--------------------------------------------------- Comment (by stefanr): I now tested my 2nd PC with the good RTC, unmodified r2175, Terratec PhaseX24FW (BeBoB), Agere FW323, kernel 3.4. Exactly the same results as with the 1st PC per comment:46: - "jackd -dfirewire -p256 -n3 -v3" = throws lots of "reconstructed CTR counter discrepancy" and "Execute: Timeout while waiting for activity", then dies soon. - "jackd -dfirewire -p256 -n4 -v3" = starts and typically plays, but ffado throws a neverending loop of "Execute: Timeout while waiting for activity" - "jackd -dfirewire -p256 -n5 -v3" = works fine, - "jackd -dfirewire -p512 -n3 -v3" = works fine. I do get exactly this result also after reboot and also with "nosmp" kernel parameter, i.e. CPU degraded to single core. IOW the issue is not SMP related. -- Ticket URL: <http://subversion.ffado.org/ticket/242#comment:50> FFADO <http://subversion.ffado.org/index.fcgi> Free Firewire Audio Drivers |
From: FFADO <ffa...@ff...> - 2012-06-30 20:05:07
|
#242: ntp interferes with ffado ------------------------+--------------------------------------------------- Reporter: ppalmers | Owner: ppalmers Type: bug | Status: new Priority: major | Milestone: FFADO 2.x Component: | Version: FFADO 2.0-rc2 (1.999.42) Resolution: | Keywords: Device_name: all | ------------------------+--------------------------------------------------- Comment (by stefanr): Repeated testing on my first PC confirmed that [attachment:clock_monotonic.patch?format=raw clock_monotonic.patch] is absolutely necessary. Without it, I cannot start jackd due to negative timesteps, no matter what buffer sizes I set. I.e. the second sentence in the patch description has proven to be wrong: The patch is not just an improvement, rather it is indispensable. -- Ticket URL: <http://subversion.ffado.org/ticket/242#comment:51> FFADO <http://subversion.ffado.org/index.fcgi> Free Firewire Audio Drivers |
From: FFADO <ffa...@ff...> - 2012-07-01 04:37:18
|
#242: ntp interferes with ffado ------------------------+--------------------------------------------------- Reporter: ppalmers | Owner: ppalmers Type: bug | Status: new Priority: major | Milestone: FFADO 2.x Component: | Version: FFADO 2.0-rc2 (1.999.42) Resolution: | Keywords: Device_name: all | ------------------------+--------------------------------------------------- Comment (by jwoithe): Replying to [comment:51 stefanr]: > Repeated testing on my first PC confirmed that [attachment:clock_monotonic.patch?format=raw clock_monotonic.patch] is absolutely necessary. Without it, I cannot start jackd due to negative timesteps, no matter what buffer sizes I set. Thanks for this latest round of testing. I am yet to get time to run the tests I eluded to in [comment:42 comment 42] but hope to soon. There definitely seems to be a driver dependency in this behaviour though, since I can run the FF800 with "-p1024 -n4" quite happily while for you this isn't possible. In any case, as per your earlier tests, using CLOCK_MONOTONIC isn't sufficient because it causes thing to go wrong with small buffer sizes. Likewise, on my system with the FF800 I get problems at lower buffer sizes with CLOCK_MONOTONIC_RAW or CLOCK_MONOTONIC that don't occur at all if CLOCK_REALTIME is used throughout. It seems to me that while CLOCK_MONOTONIC does something to improve things, there's still something wrong - something which happens to be covered over when larger buffers are in use. To be sure of all this I'll retest everything here and report back, at which case we can try to work out a way forward. -- Ticket URL: <http://subversion.ffado.org/index.fcgi/ticket/242#comment:52> FFADO <http://subversion.ffado.org/index.fcgi> Free Firewire Audio Drivers |
From: FFADO <ffa...@ff...> - 2012-07-01 09:13:48
|
#242: ntp interferes with ffado ------------------------+--------------------------------------------------- Reporter: ppalmers | Owner: ppalmers Type: bug | Status: new Priority: major | Milestone: FFADO 2.x Component: | Version: FFADO 2.0-rc2 (1.999.42) Resolution: | Keywords: Device_name: all | ------------------------+--------------------------------------------------- Comment (by stefanr): Replying to [comment:52 jwoithe]: > Replying to [comment:51 stefanr]: > > Repeated testing on my first PC confirmed that [attachment:clock_monotonic.patch?format=raw clock_monotonic.patch] is absolutely necessary. Without it, I cannot start jackd due to negative timesteps, no matter what buffer sizes I set. > > Thanks for this latest round of testing. I am yet to get time to run the tests I eluded to in [comment:42 comment 42] but hope to soon. There definitely seems to be a driver dependency in this behaviour though, since I can run the FF800 with "-p1024 -n4" quite happily while for you this isn't possible. Just to be clear, this is not dependent on drivers. I do get the same result like you with either CLOCK_MONOTONIC (r2174 + patch) or CLOCK_MONOTONIC_RAW/CLOCK_MONOTONIC mixture (plain r2174) on my 2nd PC with good RTC, FW323 controller, and DICE as well as BeBoB driver. For a time (but not anymore), I also got this result with plain r2174 like with r2174 + patch on my 1st PC with bad RTC, XIO2213/ JMB381/ FW643e controllers, and DICE as well as with BeBoB driver. (The result being that small buffers regressed relative to CLOCK_REALTIME whereas larger buffers work, here: beginning with -p256 -n5, or -p256 -n4 with warnings.) Only on my 1st PC with bad RTC and large uptime, the following difference between plain r2174 and r2174 + patch showed up: At first, the former survived only about 10 minutes while the latter could run for hours. Right now I can't start the former anymore but the latter is still reliable. Both with sufficiently large buffers. Again, my 2nd PC does not have this problem, or at least not yet. I don't know whether this special problem of my 1st PC is a result of the bad RTC or of the larger uptime or of both in combination. It is on the other hand quite safe to say that it is not the result of different drivers or controllers or whatever. I now moved my 2nd PC to another room and will have it gather several days uptime there such that the offset between CLOCK_MONOTONIC_RAW and CLOCK_MONOTONIC increases, then see whether the 2nd PC becomes affected the same way as the 1st PC. There are probably quicker and more definite ways to determine whether this is about the offset between the two clocks or about the speed difference between the two clocks, but I guess there is no rush. So far I did all my tests after ntpd already ran at least at boot. (It was subsequently switched off during some of the tests, but that made no difference except of course with CLOCK_REALTIME.) One test which I also should do eventually is to boot the 1st PC with bad RTC without ntpd ever started, so that CLOCK_MONOTONIC_RAW and CLOCK_MONOTONIC remain at the same offset and speed. > In any case, as per your earlier tests, using CLOCK_MONOTONIC isn't sufficient because it causes thing to go wrong with small buffer sizes. Yes. Maybe we need to instrument the clock_nanosleep calls somehow, to check that they indeed delay for about the period which is requested. -- Ticket URL: <http://subversion.ffado.org/index.fcgi/ticket/242#comment:53> FFADO <http://subversion.ffado.org/index.fcgi> Free Firewire Audio Drivers |
From: FFADO <ffa...@ff...> - 2012-07-06 13:26:08
|
#242: ntp interferes with ffado ------------------------+--------------------------------------------------- Reporter: ppalmers | Owner: ppalmers Type: bug | Status: new Priority: major | Milestone: FFADO 2.x Component: | Version: FFADO 2.0-rc2 (1.999.42) Resolution: | Keywords: Device_name: all | ------------------------+--------------------------------------------------- Comment (by jwoithe): I have made further progress I think. It turns out that we call sem_timedwait() in two places, and the absolute timeout value it takes is only checked against CLOCK_REALTIME - there is no way to change this. Revision r2177 checks in a fix for this issue: the time used to calculate the timeout value is obtained from CLOCK_REALTIME rather than Util::SystemTimeSource. Since the time obtained in the affected functions is only ever used with sem_timedwait() it is safe to do this. With the changes in r2177 I can successfully run the RME FF800 with "-p256 -n3" on my development machine whereas beforehand these settings were unstable. Stefan: I would be interested to see what r2177 (or later) does on your two boxes. -- Ticket URL: <http://subversion.ffado.org/ticket/242#comment:54> FFADO <http://subversion.ffado.org/index.fcgi> Free Firewire Audio Drivers |
From: FFADO <ffa...@ff...> - 2012-07-08 14:53:19
|
#242: ntp interferes with ffado ------------------------+--------------------------------------------------- Reporter: ppalmers | Owner: ppalmers Type: bug | Status: new Priority: major | Milestone: FFADO 2.x Component: | Version: FFADO 2.0-rc2 (1.999.42) Resolution: | Keywords: Device_name: all | ------------------------+--------------------------------------------------- Comment (by stefanr): Replying to [comment:53 stefanr]: > I don't know whether this special problem of my 1st PC is a result of the bad RTC or of the larger uptime or of both in combination. It is on the other hand quite safe to say that it is not the result of different drivers or controllers or whatever. > > I now moved my 2nd PC to another room and will have it gather several days uptime there such that the offset between CLOCK_MONOTONIC_RAW and CLOCK_MONOTONIC increases, then see whether the 2nd PC becomes affected the same way as the 1st PC. I had my 2nd PC with good RTC collect 7 days uptime. This shifted CLOCK_MONOTONIC by +8_385_207 µs (+8.4 seconds) relative to CLOCK_MONOTONIC_RAW. This means that CLOCK_MONOTONIC is 13 ppm faster than CLOCK_MONOTONIC_RAW. During the entire 7 days I had unmodified FFADO r2175 running with "jackd -dfirewire -p256 -n5" without xrun. I stopped this session in order to try current svn. For comparison: My 1st PC with bad RTC currently has 42 days uptime and gathered 1_812_593_820 µs (30 minutes) difference of CLOCK_MONOTONIC - CLOCK_MONOTONIC_RAW. This amounts to CLOCK_MONOTONIC running 500 ppm faster than CLOCK_MONOTONIC_RAW. (Yet CLOCK_REALTIME still needs to be reset because that 500 ppm speed adjustment is not enough by far to keep up with the rest of the world.) Conclusion of the 7 days long experiment on my first PC: Several seconds absolute difference between CLOCk_MONOTONIC and CLOCK_MONOTONIC_RAW are not a problem for FFADO r2175. -- Ticket URL: <http://subversion.ffado.org/ticket/242#comment:55> FFADO <http://subversion.ffado.org/index.fcgi> Free Firewire Audio Drivers |
From: FFADO <ffa...@ff...> - 2012-07-08 21:03:24
|
#242: ntp interferes with ffado ------------------------+--------------------------------------------------- Reporter: ppalmers | Owner: ppalmers Type: bug | Status: new Priority: major | Milestone: FFADO 2.x Component: | Version: FFADO 2.0-rc2 (1.999.42) Resolution: | Keywords: Device_name: all | ------------------------+--------------------------------------------------- Comment (by stefanr): Replying to [comment:54 jwoithe]: > I have made further progress I think. It turns out that we call sem_timedwait() in two places, and the absolute timeout value it takes is only checked against CLOCK_REALTIME - there is no way to change this. Revision r2177 checks in a fix for this issue: the time used to calculate the timeout value is obtained from CLOCK_REALTIME rather than Util::SystemTimeSource. Since the time obtained in the affected functions is only ever used with sem_timedwait() it is safe to do this. > But if a clock reset happened during sem_timedwait(), it will wait for too short or too long, right? > > With the changes in r2177 I can successfully run the RME FF800 with "-p256 -n3" on my development machine whereas beforehand these settings were unstable. > > Stefan: I would be interested to see what r2177 (or later) does on your two boxes. > I tested r2178 on both PCs now, together with what is to become libraw1394 v2.1.0. Results on my 2nd PC with good RTC: - -p256 -n3 and -p128 -n3 work fine. I did not test any other buffer settings yet. - DICE driver with Saffire PRO 24 as well as BeBoB driver with PhaseX24FW work alike. - CLOCK_REALTIME can be reset into the past and into the future without audible glitch and without xrun. - Unmodified r2178 as well as r2178 patched with clock_monotonic.patch work alike on this PC. Results on my 1st PC with bad RTC: - I was unable to start unmodified r2178 with BeBoB driver and PhaseX24FW. - I could start r2178 + clock_monotonic.patch with BeBoB driver and PhaseX24FW. I let it run for more than an hour and it survived many clock resets from me and several from ntpd during that time (with glitches, see below). - I could start unmodified r2178 with DICE driver and Saffire PRO 24 twice. It ran for a while and survived some resets of CLOCK_REALTIME (more on that below) but died when a long series of "reconstructed CTR counter discrepancy" happened after a clock reset. After this, I was no longer able to start unmodified r2178 with DICE driver and Saffire PRO 24. The driver startup could not recover from series of "CTR counter discrepancy". - I could start r2178 + clock_monotonic.patch with DICE driver and Saffire PRO 24. I let it run for more than an hour and it survived many clock resets from me and several from ntpd during that time (with glitches like the BeBoB driver, see below). - -p256 -n3, -p128 -n3, and -p64 -n3 work alike. - CLOCK_REALTIME can be reset into the past without glitch/ without xrun. - If CLOCK_REALTIME is reset into the future by seconds or by hours, it almost always causes a clearly audible gap in playback and a reported xrun if buffers are set to -256 -n3 or less. Further observations: - The xrun begins with one "(IsoHandlerManager.cpp)[ 292] Execute: Timeout while waiting for activity" warning, then a series of "(IsoHandlerManager.cpp)[1656] getPacket: reconstructed CTR counter discrepancy" and associated warnings follow, concluded by "(devicemanager.cpp)[ 998] waitForPeriod: XRUN detected" and another incrementFrameCounter and getPacket warning. - There was one occasion at which only one "Execute: Timeout while waiting for activity" was logged but unlike the other times, no CTR counter discrepancy/ xrun/ audible glitch happened. - The xrun at CLOCK_REALTIME reset into the future happens with unpatched r2178 (when it could started at all) as well as with r2178 + clock_monotonic.patch. - CLOCK_REALTIME can be reset into the past as well as into the future without glitch and without xrun with larger buffer sizes: -p256 -n5, or -p512 -n3, or -p2048 -n3 all prevent the glitch. To summarize: 1. All is fine on my 2nd PC with good RTC. libraw1394-git plus r2177 or later improve FFADO on that PC in one regard: FFADO is now immune against clock resets. While ntpd never resets the clock on this PC, user intervention could cause clock resets, or a switch between standard time and daylight saving time would cause it. 2. Results of libraw1394-git plus r2177 or later are mixed on my 1st PC with bad RTC: a. There is a regression relative to 2166 (or relative to r2177 with libraw1394 v2.0.9): FFADO is no longer able to start, or occasionally starts but dies soon after. This is fixed or at least successfully worked around by clock_monotonic.patch. b. Without or with clock_monotonic.patch, but only on my 1st PC with bad RTC, there is a handled xrun with audible gap when the system clock is reset into the future and buffers are -p256 -n3 or smaller. c. That xrun can escalate into an unhandled xrun and hence terminate jackd if clock_monotonic.patch was not applied. (But without clock_monotonic.patch, ffado is unlikely to work at all anyway.) d. This glitch does not happen if buffers are -p256 -n5 or bigger. e. This glitch does not happen if the clock is reset into the past. -- Ticket URL: <http://subversion.ffado.org/ticket/242#comment:56> FFADO <http://subversion.ffado.org/index.fcgi> Free Firewire Audio Drivers |
From: FFADO <ffa...@ff...> - 2012-07-09 00:17:18
|
#242: ntp interferes with ffado ------------------------+--------------------------------------------------- Reporter: ppalmers | Owner: ppalmers Type: bug | Status: new Priority: major | Milestone: FFADO 2.x Component: | Version: FFADO 2.0-rc2 (1.999.42) Resolution: | Keywords: Device_name: all | ------------------------+--------------------------------------------------- Comment (by jwoithe): Replying to [comment:56 stefanr]: Thanks for this latest round of extensive tests and the detailed report. > Replying to [comment:54 jwoithe]: > > It turns out that we call sem_timedwait() in two places, and the absolute timeout value it takes is only checked against CLOCK_REALTIME - there is no way to change this. Revision r2177 checks in a fix for this issue: the time used to calculate the timeout value is obtained from CLOCK_REALTIME rather than Util::SystemTimeSource. Since the time obtained in the affected functions is only ever used with sem_timedwait() it is safe to do this. > > > > But if a clock reset happened during sem_timedwait(), it will wait for too short or too long, right? To be honest I'm not sure. I can't find any references which point to the behaviour of sem_timedwait() across a clock reset. I guess it depends on how it's implemented internally. If it sticks with the absolute time then I would expect spurious returns as you suggested. If instead it uses relative times it might work (clock_nanosleep() apparently delays correctly across time resets when working with a relative sleep interval). If sem_timedwait() happens to return early I don't think that would be a huge issue. The early wakeup will be noted but I think the rest of the streaming system would just determine that there was no work to do and send things back to sleep again. I note that in the relevant places there is no check/retry in place in the event the call was interrupted, which further leads me to believe that early wakeups are relatively harmless. An early wakeup would be caused by a forward time set. A late wakeup on the other hand (as caused by a time set into the past) could cause issues. I'm not sure there's much we can do about it though. The code is structured to use sem_timedwait(), and sem_timedwait() only ever uses CLOCK_REALTIME. Unlike pthread_cond_timedwait() there's no way to set the clock to anything else. Without a functional sem_timedwait() equivalent which can use a clock other than CLOCK_REALTIME I can't see any way to deal with this case. > Results on my 2nd PC with good RTC: > : Ok, that pretty much backs up the test results I had on my development PC. Progress of sorts has therefore been made. > Results on my 1st PC with bad RTC: > - I was unable to start unmodified r2178 with BeBoB driver and PhaseX24FW. That is strange. About the only thing left to be suspicious about is the speed difference between CLOCK_MONOTONIC and CLOCK_MONOTONIC_RAW, but I can't immediately understand why things would be so sensitive to the actual clock speed. About the only thing I can think of is that the initial values of the DLL may just be too far out when there's a gross error in the clock speed. The initial values are calculated assuming sample rates observed relative to an ideal clock. No clock is ideal of course and that's what the DLL's operation is meant to correct for. But if the actual clock used is miles out (as yours is on this PC) perhaps the DLL can't stabilise quick enough. I'll have a look into this in the next day or so to see if this is even remotely feasible. > - I could start r2178 + clock_monotonic.patch with BeBoB driver and PhaseX24FW. I let it run for more than an hour and it survived many clock resets from me and several from ntpd during that time (with glitches, see below). This is puzzling. Aside from the speed difference (as discussed above) there should be no difference between CLOCK_MONOTONIC and CLOCK_MONOTONIC_RAW, and yet we get this differing behaviour. Even if a switch to CLOCK_MONOTONIC is ultimately made I would like to try to get to the bottom of this because it may well crop up for others. For example, if one is running their audio PC isolated from the network there's no NTP and CLOCK_MONOTONIC will not be speed corrected - it would be indistinguishable from CLOCK_MONOTONIC_RAW in this case I think. Such a user, when facing a PC like yours, would still see the problems you're seeing with CLOCK_MONOTONIC_RAW even if FFADO switches to CLOCK_MONOTONIC. > - I could start unmodified r2178 with DICE driver and Saffire PRO 24 twice. ... After this, I was no longer able to start unmodified r2178 with DICE driver and Saffire PRO 24. The driver startup could not recover from series of "CTR counter discrepancy". That's odd too. Why could it start fine twice and then never again? And why was the DICE driver seemingly ok in this scenario while the BeBoB driver was apparently not. Perhaps you were just lucky with the DICE test; maybe the timer behaviour at that precise moment was just sufficiently "good" to permit FFADO startup. > - I could start r2178 + clock_monotonic.patch with DICE driver and Saffire PRO 24. ... > - If CLOCK_REALTIME is reset into the future by seconds or by hours, it almost always causes a clearly audible gap in playback and a reported xrun if buffers are set to -256 -n3 or less. Further observations: > - The xrun begins with one "(IsoHandlerManager.cpp)[ 292] Execute: Timeout while waiting for activity" warning, then a series of "(IsoHandlerManager.cpp)[1656] getPacket: reconstructed CTR counter discrepancy" and associated warnings follow, concluded by "(devicemanager.cpp)[ 998] waitForPeriod: XRUN detected" and another incrementFrameCounter and getPacket warning. The "Timeout while waiting for activity" warning comes about as a result of an ETIMEDOUT errno from sem_timedwait(). If the time reset occurred between the clock_gettime() and the sem_timedwait() call I would expect this: the constructed timeout time would be well and truly passed by the time sem_timedwait() was called, so it would immediately time out. FFADO prints the warning and we move on. It's those "reconstructed CTR counter discrepancy" warnings that I don't immediately have an answer for. They usually indicate a loss of sync between the computer and the device, but it's not immediately clear to me why they would be prompted by an early return from sem_timedwait() in this context. Hmm, unless the streaming code is being iterated over unconditionally after the early sem_timedwait() return; then it's possible that assumptions within the streaming code about the expected changes in clock values may play a part. In any case, an xrun is usually the ultimate result of a series of "reconstructed CTR counter discrepancy" messages, so the xrun is probably just a result of collateral damage. > - There was one occasion at which only one "Execute: Timeout while waiting for activity" was logged but unlike the other times, no CTR counter discrepancy/ xrun/ audible glitch happened. Interesting. > - The xrun at CLOCK_REALTIME reset into the future happens with unpatched r2178 (when it could started at all) as well as with r2178 + clock_monotonic.patch. Ok, that's a good observation since it means that the series of events which lead to the xrun are not really affected by CLOCK_MONOTONIC or CLOCK_MONOTONIC_RAW. It more or less confirms that the premature return from sem_timedwait() is to blame. Another thought I will try to pursue: when the time is reset (either by ntp or manually), how long does it take for this to become "active", and during the transition what's the behaviour of clock_gettime(CLOCK_REALTIME, ...)? The reason I bring this up is that if these problems are traceable to sem_timedwait() then they would only occur if the clock reset occurred between the clock_gettime() and sem_timedwait() call. While clearly of finite probability the chances are rather small, and yet the problems we're chasing here are observed far more often than not. I'm wondering whether there's some other mechanism at play here. One thought I had was that maybe clock_gettime() was blocking for a while while the time reset was being done, but the problem with this hypothosis is that the same blocking behaviour would be present on your second "good RTC" PC. > To summarize: > > 1. All is fine on my 2nd PC with good RTC. libraw1394-git plus r2177 or later improve FFADO on that PC in one regard: FFADO is now immune against clock resets. While ntpd never resets the clock on this PC, user intervention could cause clock resets, or a switch between standard time and daylight saving time would cause it. This on the whole is good because clearly it will cover a vast majority of the systems out there. However, I would still very much like to get to the bottom of the continued mis-behaviour of your "bad RTC" PC. If nothing else, solving this will, I expect, make FFADO more robust. Thoughts and feedback about any of the above is more than welcome. -- Ticket URL: <http://subversion.ffado.org/ticket/242#comment:57> FFADO <http://subversion.ffado.org/index.fcgi> Free Firewire Audio Drivers |
From: FFADO <ffa...@ff...> - 2012-07-09 05:47:17
|
#242: ntp interferes with ffado ------------------------+--------------------------------------------------- Reporter: ppalmers | Owner: ppalmers Type: bug | Status: new Priority: major | Milestone: FFADO 2.x Component: | Version: FFADO 2.0-rc2 (1.999.42) Resolution: | Keywords: Device_name: all | ------------------------+--------------------------------------------------- Comment (by stefanr): Replying to [comment:57 jwoithe]: > Replying to [comment:56 stefanr]: > > - I could start r2178 + clock_monotonic.patch with BeBoB driver and PhaseX24FW. I let it run for more than an hour and it survived many clock resets from me and several from ntpd during that time (with glitches, see below). > > This is puzzling. Aside from the speed difference (as discussed above) there should be no difference between CLOCK_MONOTONIC and CLOCK_MONOTONIC_RAW, and yet we get this differing behaviour. Even if a switch to CLOCK_MONOTONIC is ultimately made I would like to try to get to the bottom of this because it may well crop up for others. For example, if one is running their audio PC isolated from the network there's no NTP and CLOCK_MONOTONIC will not be speed corrected - it would be indistinguishable from CLOCK_MONOTONIC_RAW in this case I think. Such a user, when facing a PC like yours, would still see the problems you're seeing with CLOCK_MONOTONIC_RAW even if FFADO switches to CLOCK_MONOTONIC. I should remove ntpd from the system start config, then reboot --- thus get a session with identical speed of both monotonic clocks --- and try without and with patch. > > - I could start unmodified r2178 with DICE driver and Saffire PRO 24 twice. ... After this, I was no longer able to start unmodified r2178 with DICE driver and Saffire PRO 24. The driver startup could not recover from series of "CTR counter discrepancy". > > That's odd too. Why could it start fine twice and then never again? And why was the DICE driver seemingly ok in this scenario while the BeBoB driver was apparently not. Perhaps you were just lucky with the DICE test; maybe the timer behaviour at that precise moment was just sufficiently "good" to permit FFADO startup. This aspect is unchanged since r2174. I didn't take notes about it, but I suspect the few successful startups without patch (all which terminated with unrecoverable xrun after less than half an hour) were all with the DICE driver. But even so, the number of these occasions was too small to conclude whether DICE and BeBeB driver or streams differ in this regard. My gut feeling is they don't differ. When unmodified r2178 fails immediately, it looks just like in attachment :jackd_r2174_new-libraw1394.txt. > > - I could start r2178 + clock_monotonic.patch with DICE driver and Saffire PRO 24. ... > > - If CLOCK_REALTIME is reset into the future by seconds or by hours, it almost always causes a clearly audible gap in playback and a reported xrun if buffers are set to -256 -n3 or less. Further observations: > > - The xrun begins with one "(IsoHandlerManager.cpp)[ 292] Execute: Timeout while waiting for activity" warning, then a series of "(IsoHandlerManager.cpp)[1656] getPacket: reconstructed CTR counter discrepancy" and associated warnings follow, concluded by "(devicemanager.cpp)[ 998] waitForPeriod: XRUN detected" and another incrementFrameCounter and getPacket warning. Since yesterday evening, I let jackd continue running. 2nd PC, BeBoB, -p128 -n3: running without xrun for 11 hours now 1st PC (bad RTC), DICE, monotonic_clock.patch applied, -p64 -n3: running for 9 hours now, clock resets have presumably happened in periods of a little bit more than a quarter hour (36 times), qjackctl shows only 30 xruns. I.e. it seems to be a race condition with about 80...85% probability of occurrence (on this PC). {{{ 22:22:50.002 XRUN callback (1). /* one reset without xrun */ 22:54:24.831 XRUN callback (2). 23:10:02.788 XRUN callback (3). 23:25:49.750 XRUN callback (4). 23:41:34.650 XRUN callback (5). 23:57:14.556 XRUN callback (6). 00:12:51.511 XRUN callback (7). 00:28:29.398 XRUN callback (8). 00:44:16.350 XRUN callback (9). /* one reset without xrun */ 01:15:43.198 XRUN callback (10). 01:31:27.102 XRUN callback (11). 01:47:14.086 XRUN callback (12). 02:02:56.968 XRUN callback (13). 02:18:39.910 XRUN callback (14). /* one reset without xrun */ 02:50:09.779 XRUN callback (15). 03:05:51.708 XRUN callback (16). /* two resets without xrun */ 03:52:57.463 XRUN callback (17). 04:08:38.374 XRUN callback (18). 04:24:21.339 XRUN callback (19). 04:40:01.265 XRUN callback (20). 04:55:41.153 XRUN callback (21). 05:11:24.088 XRUN callback (22). 05:27:07.051 XRUN callback (23). 05:42:53.960 XRUN callback (24). 05:58:30.882 XRUN callback (25). 06:14:11.829 XRUN callback (26). 06:29:57.762 XRUN callback (27). /* one reset without xrun */ 07:01:26.608 XRUN callback (28). 07:17:07.530 XRUN callback (29). 07:32:52.489 XRUN callback (30). }}} Likewise, manual clock resets into the future which I tried yesterday did almost always (but sometimes not) cause an xrun on the PC with bad RTC. I did not count, hence don't have a probability. On the other PC with good RTC, I tried about a dozen manual clock resets into the future (interleaved with about the same number of clock resets into the past), of which none caused an xrun. So the race condition does either not exist on the good PC or is at least far less probable than on the bad PC. -- Ticket URL: <http://subversion.ffado.org/ticket/242#comment:58> FFADO <http://subversion.ffado.org/index.fcgi> Free Firewire Audio Drivers |
From: FFADO <ffa...@ff...> - 2012-07-16 12:28:54
|
#242: ntp interferes with ffado ------------------------+--------------------------------------------------- Reporter: ppalmers | Owner: ppalmers Type: bug | Status: new Priority: major | Milestone: FFADO 2.x Component: | Version: FFADO 2.0-rc2 (1.999.42) Resolution: | Keywords: Device_name: all | ------------------------+--------------------------------------------------- Comment (by jwoithe): Thanks again for the testing. I'm a little unsure of how to proceed from here. As far as I can tell, unmodified r2178 (and presumedly later) works fine on the PC with a good RTC - right? Then on the bad PC, unmodified r2178 fails to start while the modified variant starts but still gives xruns and the like. Please correct where I've misunderstood. Assuming the above is vaguely accurate it does mirror what I'm seeing here. On my test PC (which has a reasonable RTC) I am seeing no practical difference between occasions where CLOCK_MONOTONIC and CLOCK_MONOTONIC_RAW are used. This doesn't surprise me because as far as I can tell both clock sources should be more than capable of supporting FFADO start up. My understanding of these two sources is that for FFADO CLOCK_MONOTONIC_RAW is probably the appropriate one to use. In any case, I am puzzled why with a bad RTC there is such a dramatic difference between the two. My present theory is that perhaps CLOCK_MONOTONIC_RAW - being uncorrected in every way - may be so far off that the values used to initialise the DLLs result in the DLLs not locking. I would have expected them to eventually, but maybe we don't wait long enough for the cases where there's a dramatic difference between theory and practice. Or perhaps it doesn't have a regular period due to some hardware problem (which is possibly related to the inaccurate RTC timekeeping). The start up fail log you attached produces a lot of negative time steps in the cycle timer helper. I find the existence of these interesting but don't know what to make of it yet. However, I do suspect that these negative time steps may be at least partly to blame for the "reconstructed CTR counter discrepancy" messages you get in abundance later on. -- Ticket URL: <http://subversion.ffado.org/ticket/242#comment:59> FFADO <http://subversion.ffado.org/index.fcgi> Free Firewire Audio Drivers |
From: FFADO <ffa...@ff...> - 2012-07-17 06:44:57
|
#242: ntp interferes with ffado ------------------------+--------------------------------------------------- Reporter: ppalmers | Owner: ppalmers Type: bug | Status: new Priority: major | Milestone: FFADO 2.x Component: | Version: FFADO 2.0-rc2 (1.999.42) Resolution: | Keywords: Device_name: all | ------------------------+--------------------------------------------------- Comment (by jwoithe): stefan: I've done some more thinking about all this. Regarding the difference between CLOCK_MONOTONIC and CLOCK_MONOTONIC_RAW, I think I've identified one possible way where the behaviour could be different on your "bad-RTC" PC. On this PC, CLOCK_MONOTONIC and CLOCK_MONOTONIC_RAW clearly do run at different rates, and the difference is significant. In FFADO's timing code, a request to sleep when CLOCK_MONOTONIC_RAW is in use will in fact use CLOCK_MONOTONIC (see [comment:20 comment 20]). For most systems the speed difference between the two clocks will be very small, so the distinction is largely irrelevant. However, on systems like your bad-RTC PC, I can imagine that the speed differences could cause some issues. I'm not convinced it's the sole reason for the negative time steps we've noted in the logs, but it could at least have something to do with the overall stability issues and explain the apparent differences in behaviour we see when the different clocks are used. In light of this I intend to run some more tests myself in the next few days. Hopefully I'll have a fresh revision to test in reasonably short order. -- Ticket URL: <http://subversion.ffado.org/ticket/242#comment:60> FFADO <http://subversion.ffado.org/index.fcgi> Free Firewire Audio Drivers |
From: FFADO <ffa...@ff...> - 2012-07-17 13:06:25
|
#242: ntp interferes with ffado ------------------------+--------------------------------------------------- Reporter: ppalmers | Owner: ppalmers Type: bug | Status: new Priority: major | Milestone: FFADO 2.x Component: | Version: FFADO 2.0-rc2 (1.999.42) Resolution: | Keywords: Device_name: all | ------------------------+--------------------------------------------------- Comment (by jwoithe): Further to [comment:60 comment 60], I've checked in r2186. In light of the realisation that rate differences between CLOCK_MONOTONIC and CLOCK_MONOTONIC_RAW could cause inconsistent sleeps when times are taken from the latter but the former is used for sleep timing, we'll try using CLOCK_MONOTONIC for everything. The ABS_TIMER mode is also utilised where appropriate if flagged as available at compile time. Stefan: based on tests here and earlier ones done by yourself I expect that this change should permit correct operation on your "good-RTC" PC. The behaviour on the "bad-RTC" PC is unlikely to be different to that which you detailed in [comment:58 comment 58]. However I think we're approaching a solution which may allow this ticket to be closed off. Regarding the xruns flagged on the bad-RTC PC, I am beginning to think that we've done almost all we can do to deal with PCs whose RTCs are so badly out of whack. While you're still getting xruns flagged, it seems that FFADO is at least able to continue now. As to why these xruns are triggered I'm pretty much out of fresh ideas. A race condition during the setting of the clock (as you suggested) seems to be the most likely candidate at the moment, and if that's the case there's probably not a whole lot more FFADO can do. As always, thoughts and test results are welcome. -- Ticket URL: <http://subversion.ffado.org/ticket/242#comment:61> FFADO <http://subversion.ffado.org/index.fcgi> Free Firewire Audio Drivers |
From: FFADO <ffa...@ff...> - 2012-07-18 14:04:14
|
#242: ntp interferes with ffado ------------------------+--------------------------------------------------- Reporter: ppalmers | Owner: ppalmers Type: bug | Status: new Priority: major | Milestone: FFADO 2.x Component: | Version: FFADO 2.0-rc2 (1.999.42) Resolution: | Keywords: Device_name: all | ------------------------+--------------------------------------------------- Comment (by stefanr): I have r2186 up and running for 18 hours now. 1st PC with bad RTC, "jackd -dfirewire -p128 -n3 -v3", DICE driver: - starts and runs stable - backwards jumps of CLOCK_REALTIME (by date -s or the likes): no xrun, no audible glitch - forwards jumps of CLOCK_REALTIME: cause handled xrun with audible gap with a probability of circa 90% 2nd PC with good RTC, "jackd -dfirewire -p128 -n3 -v3", BeBoB driver: - starts and runs stable - backwards or forwards jumps of CLOCK_REALTIME: no xrun, no audible glitch (Note to self: Have yet to test the PC with bad RTC booted without ntpd, so that CLOCK_MONOTONIC and CLOCK_MONOTONIC_RAW are theoretically identical, and then with ntpd so that the former gets faster than the latter but still hs got only an offset in the order of seconds rather than minutes. But I guess these remaining handled xruns are more likely connected to the speed difference rather than the offset between the monotonic clocks. Even if this will be confirmed, it still wouldn't quite explain why the clock jump related race only affects the bad PC, not the good PC.) Jonathan, thanks much for your work on this. Folks with sane hardware can now set their clock without crashing jackd. Folks with bad hardware like mine now don't get mysterious unhandled xruns = crashes anymore (whose connection to ntpd would not be obvious because it seemed to become standard for ntpd to do clock resets without syslog message, unlike in earlier times when I noticed the problem). Instead, there are now handled xruns in regular intervals (here: periods of a bit more than 15 minutes), which at least point users' or supporters' suspicion toward system services like ntpd, so that the respective workaround to disable the service is more likely to be found. -- Ticket URL: <http://subversion.ffado.org/ticket/242#comment:62> FFADO <http://subversion.ffado.org/index.fcgi> Free Firewire Audio Drivers |
From: FFADO <ffa...@ff...> - 2012-07-19 03:11:51
|
#242: ntp interferes with ffado ------------------------+--------------------------------------------------- Reporter: ppalmers | Owner: ppalmers Type: bug | Status: new Priority: major | Milestone: FFADO 2.x Component: | Version: FFADO 2.0-rc2 (1.999.42) Resolution: | Keywords: Device_name: all | ------------------------+--------------------------------------------------- Comment (by jwoithe): Thanks for this latest round of testing. I think we're converging on a solution. Note that the current code is using CLOCK_MONOTONIC for everything at present. The relative speeds of CLOCK_MONOTONIC and CLOCK_MONOTONIC_RAW are therefore not important because the latter isn't being used. Your suggested test would still be worth doing but only if ieee1394service was hacked to use CLOCK_MONOTONIC_RAW for timekeeping. This test would then give us an indication as tp whether the stability issues with CLOCK_MONOTONIC_RAW in use come about due to clock speed differences (this being the current theory). At present I can't quite see where the remaining xruns are coming from. With CLOCK_MONOTONIC in use everywhere they can't be blamed on clock speed differences, and as you said the issue is only occuring on the PC with the bad RTC. About the only thing I can think of is that the two systems (the "good-RTC" and "bad-RTC" PCs) are using different clock sources at the kernel level (for example: one using TSC and one using hpet), and that the two systems therefore respond slightly differently to forward time jumps. With these caveats I'm inclined to suggest that this ticket can be closed as "fixed". Does anyone see a need for it to remain open now? -- Ticket URL: <http://subversion.ffado.org/ticket/242#comment:63> FFADO <http://subversion.ffado.org/index.fcgi> Free Firewire Audio Drivers |
From: FFADO <ffa...@ff...> - 2012-07-19 18:02:49
|
#242: ntp interferes with ffado ------------------------+--------------------------------------------------- Reporter: ppalmers | Owner: ppalmers Type: bug | Status: new Priority: major | Milestone: FFADO 2.x Component: | Version: FFADO 2.0-rc2 (1.999.42) Resolution: | Keywords: Device_name: all | ------------------------+--------------------------------------------------- Comment (by stefanr): Replying to [comment:63 jwoithe]: > Note that the current code is using CLOCK_MONOTONIC for everything at present. Yep. > The relative speeds of CLOCK_MONOTONIC and CLOCK_MONOTONIC_RAW are therefore > not important because the latter isn't being used. Well, I thought it ''shouldn't'' be important because it ''shouldn't'' be used. :-) > At present I can't quite see where the remaining xruns are coming from. > With CLOCK_MONOTONIC in use everywhere they can't be blamed on clock speed > differences, and as you said the issue is only occuring on the PC with the > bad RTC. About the only thing I can think of is that the two systems (the "good-RTC" and "bad-RTC" PCs) are using different clock sources at the kernel > level (for example: one using TSC and one using hpet), and that the two > systems therefore respond slightly differently to forward time jumps. O, right. I forgot that there may be quite a few other factors. Regarding clock sources: {{{ # bad PC /sys/devices/system/clocksource/clocksource0/available_clocksource:tsc hpet acpi_pm /sys/devices/system/clocksource/clocksource0/current_clocksource:tsc # good PC /sys/devices/system/clocksource/clocksource0/available_clocksource:hpet acpi_pm /sys/devices/system/clocksource/clocksource0/current_clocksource:hpet }}} So I should test the bad PC with hpet as clock source. > With these caveats I'm inclined to suggest that this ticket can be closed as > "fixed". I for one agree. -- Ticket URL: <http://subversion.ffado.org/ticket/242#comment:64> FFADO <http://subversion.ffado.org/index.fcgi> Free Firewire Audio Drivers |
From: FFADO <ffa...@ff...> - 2012-07-20 06:25:29
|
#242: ntp interferes with ffado ------------------------+--------------------------------------------------- Reporter: ppalmers | Owner: ppalmers Type: bug | Status: closed Priority: major | Milestone: FFADO 2.x Component: | Version: FFADO 2.0-rc2 (1.999.42) Resolution: fixed | Keywords: Device_name: all | ------------------------+--------------------------------------------------- Changes (by jwoithe): * status: new => closed * resolution: => fixed Comment: Thanks for the exhaustive rounds of testing Stefan. For the purposes of bug tracking I'll close this as "fixed" but comments can still be added after closure. It would certainly be interesting to know whether changing the clock source on the "bad" PC has any effect on the remaining xruns (or on any other aspects of the clocks operation). -- Ticket URL: <http://subversion.ffado.org/ticket/242#comment:65> FFADO <http://subversion.ffado.org/index.fcgi> Free Firewire Audio Drivers |
From: FFADO <ffa...@ff...> - 2012-09-10 18:02:44
|
#242: ntp interferes with ffado ------------------------+--------------------------------------------------- Reporter: ppalmers | Owner: ppalmers Type: bug | Status: closed Priority: major | Milestone: FFADO 2.x Component: | Version: FFADO 2.0-rc2 (1.999.42) Resolution: fixed | Keywords: Device_name: all | ------------------------+--------------------------------------------------- Comment (by stefanr): A minor update: As you may remember, my "bad" PC went from unrecoverable xrun at clock reset to recoverable xrun at clock reset into the future (and no xrun at clock reset into the past). This was observed while the bad PC had already gathered an uptime in the order of multiple weeks. Meanwhile the PC was power-cycled once and rebooted a few times since then. Now the bad PC behaves like intended and like my "good" PC behaves: No xrun, neither at clock resets into the future nor into the past. -- Ticket URL: <http://subversion.ffado.org/ticket/242#comment:66> FFADO <http://subversion.ffado.org/index.fcgi> Free Firewire Audio Drivers |