From: Jonathan W. <jw...@ju...> - 2013-08-22 01:04:08
|
On Wed, Aug 21, 2013 at 04:48:29PM +0200, Stefan Richter wrote: > On Aug 19 Jonathan Woithe wrote: > > On Sun, Aug 18, 2013 at 09:18:37PM +0400, ???????? wrote: > [...] > > > Here is one of outputs with revision 2374: http://pastebin.com/NVdLzb2d > > > > Thanks for making this available. I note that this is actually for r2376 > > (see line 12) so that means you don't need to provide me with another one at > > this stage. > > > > From this we can confirm several things. Firstly, as you said, the hardware > > sample rate does appear to be in sync with what jackd expects now, so the > > logic changes in r2376 have done their job. FFADO is also able to extract > > some realistic timestamps out of the data sent by the FF800. For example, > > at line 946 we see that FFADO has estimated the actual rate (relative to the > > firewire clock) to be 48114 Hz, which is about what I'd expect. A few lines > > later (at line 957) we see something very interesting: > > > > > Warning (IsoHandlerManager.cpp)[1701] getPacket: > > > (0xb6ddd0) dropped < 1 (-2), cycle: 2622, last_cycle: 2623, dropped: 0, > > > skipped: 0 > > > Debug (StreamProcessor.cpp)[ 641] getPacket: dropped packets xrun > > > (4294967294) > > > > What makes this interesting is that it appears that the cycle timer has gone > > backwards - something which should not happen. The problem is noted in the > > transmit stream - not the receive - which means it has nothing to do with > > the data coming from the FF800 (or if it does I can't see how). Instead, > > the system has requested audio data for cycle 2623 and it has been > > delivered. Some time later the system requests the next block of audio > > data, but the cycle number in the request is now 2622. > > > > We have seen this once before but I don't think we were able to resolve the > > cause (from memory the reporter changed something hardware related and the > > problem went away). > > > > Was anything interesting reported at the end of "dmesg" at around the same > > time that you saw this? > > > > Stefan: here's that crazy backwards-moving cycle timer thing again. Any > > ideas? As noted above it seems to be associated with the iso transmit side > > of things: the kernel is requesting data for cycle 2623 before it asks for > > 2622. > > Actually the kernel simply reports the cycle stamps which the OHCI put into > the transmit DMA completion events (by means of a status return field in the > DMA program, see the OHCI specification, clause 9.1.4 and 9.1.5.) > > Either the controller writes bogus data there. > > Or the kernel drivers have a bug which brings the IT DMA events out of order. > (That's hard for me to believe, but I am biased.) > > Or the userland stack (libraw1394, FFADO) has a bug which brings the IT DMA > out of order. I agree with these suggestions. However, from my (albeit brief reading of the source) I didn't think there was any scope for userland (libraw1394 or FFADO) to change the order of the IT callbacks (which is essentially what it would appear like in userspace). FFADO runs its IT code when libraw1394 calls the callback FFADO has configured, and libraw1394 is basically just passing the kernel requests straight through whenever they come up. > Is FFADO accessing one and the same raw1394handle_t instance from more than > a single thread? That's a good point. I don't believe so but I will check. As far as I know the transmit and receive threads each call raw1394_new_handle() to obtain a handle for their own exclusive use. A quick grep through the FFADO source code (mostly in libieee1394) seems to confirm this (see IsoHandlerManager.cpp, IsoHandlerManager::IsoHandler::enable()). > Seems we need a test tool for this phenomenon, similar to the test tool > which repeatedly reads the OHCI's Cycle Timer register and checks for > monotony. Could be. The trick would be to work out under what conditions the problem occurs. Is it sufficient to simply set up the transfers, or (while seemingly unlikely) is it dependent on other hardware being present. > BTW, I am wondering about another part of the log from August 18, from line > 801 onwards: > > 1376845672933649: Debug (CycleTimerHelper.cpp)[ 405] Execute: (0xb34d60) have to retry CTR read, diff unrealistic: diff: 3115, max: +/- 3072 (try: 10) 587148227 > 1376845672933751: Debug (CycleTimerHelper.cpp)[ 405] Execute: (0xb34d60) have to retry CTR read, diff unrealistic: diff: 3130, max: +/- 3072 (try: 9) 587150783 > 1376845672933853: Debug (CycleTimerHelper.cpp)[ 405] Execute: (0xb34d60) have to retry CTR read, diff unrealistic: diff: 3120, max: +/- 3072 (try: 8) 587153290 > 1376845672933936: Debug (CycleTimerHelper.cpp)[ 405] Execute: (0xb34d60) have to retry CTR read, diff unrealistic: diff: 3125, max: +/- 3072 (try: 7) 587155329 > 1376845672934020: Debug (CycleTimerHelper.cpp)[ 405] Execute: (0xb34d60) have to retry CTR read, diff unrealistic: diff: 3113, max: +/- 3072 (try: 6) 587157418 > 1376845672934100: Debug (CycleTimerHelper.cpp)[ 405] Execute: (0xb34d60) have to retry CTR read, diff unrealistic: diff: 3126, max: +/- 3072 (try: 5) 587159384 > 1376845672934184: Debug (CycleTimerHelper.cpp)[ 405] Execute: (0xb34d60) have to retry CTR read, diff unrealistic: diff: 3123, max: +/- 3072 (try: 4) 587161449 > > Any thoughts about this? I'd missed these - good catch. I have no immediate thoughts about them except to note that they could be related to the issues we're seeing. I'm guessing that after try 4 there were no further messages which means that presumedly Execute() received a value it was happy with. But this really shouldn't happen, should it? Or at the very least, if there was one failure (due to non-atomicity somewhere) you wouldn't expect to see it on 7 successive read attempts - would you? I'm just trying to work out what the above messages actually mean. Execute() appears to be estimating the CTR value to expect based on the local time and then bailing if the value read isn't within one cycle of this. The estimated CTR value comes via a rate estimate (in getCycleTimerTicks()) which I assume is tied up with a DLL. If the streaming system has not been running long it's entirely possible that the rate estimate is still settling. In that case we could conceivably get some phantom notes like those above. Having said that, I didn't write the streaming infrastructure so I don't know if this makes sense or not. In summary, I guess the above messages could be a natural part of the startup sequence if they occured close to the start of streaming. If they occurred a significant time later then perhaps they indicate that something funny is going on somewhere. Regards jonathan |