From: Michel <mi...@da...> - 2003-04-27 00:34:14
On Sun, 2003-04-27 at 00:43, Felix Kühling wrote:
> On 26 Apr 2003 19:13:15 +0200
> Michel Dänzer <mi...@da...> wrote:
>
> > On Sat, 2003-04-26 at 18:32, Felix Kühling wrote:
> > > On 26 Apr 2003 16:14:34 +0200
> > > Michel Dänzer <mi...@da...> wrote:
> [...]
> > > > something scribbles over RADEON_SCRATCH_ADDR or RADEON_SCRATCH_UMSK, can
> > > > you verify that?
> > >
> > > Hmm, I don't think it's a random corruption of the scratch register.
> > > With gdb I saw that the value returned by the last-frame-ioctl was
> > > exactly sarea->last_frame-1. I checked this only once, but it would be
> > > quite a coincidence.
> >
> > Sure, but the registers I mentioned aren't the actual scratch registers
> > but those that control which of them get written back to memory and
> > where. Still, if other clients work after the hang:
>
> Ok, I answered before reading the code. I admit, I still don't quite
> understand what this writeback is all about.

The chip writes the contents of the scratch registers to RAM so we don't
have to poll them via PCI bus cycles.

> I did verify that RADEON_SCRATCH_ADDR or RADEON_SCRATCH_UMSK don't get
> modified with the following code radeon_cp_getparam:

Okay, so much for that theory. :)

> > > It seems as if one scratch register write got lost.
> >
> > This sounds like the most plausible explanation indeed. We should
> > probably read the scratch registers directly every now and then, say
> > after ten iterations of doing it differently.
>
> With RADEON_NO_IRQS I could debug the problem safely. Then the client
> only uses radeonGetLastFrame for frame throttling and does a usleep
> without holding the lock. So the Xserver stays responsive. I added some
> debugging output in radeonWaitForFrameCompletion in radeon_ioctl.c:
>
>    while (radeonGetLastFrame (rmesa) < sarea->last_frame) {
>       UNLOCK_HARDWARE( rmesa );
>       if (rmesa->do_usleeps)
>          do_usleep(1, __FUNCTION__);
>       fprintf (stderr, "%d < %d\n", radeonGetLastFrame (rmesa),
>                sarea->last_frame);
>       LOCK_HARDWARE( rmesa );
>    }
>
> With glxgears I get output like this:
>
> 191140 < 1800
> 195066 < 1802
> 1808 < 1808
> 1810 < 1810
> 214500 < 1812
> 218398 < 1814
> 198424 < 1816
> 1818 < 1818
> 1820 < 1820
> ...
>
> So there is random corruption,

Weird. Does anyone have any idea how this can happen? Bad interaction
between the chip writing and the CPU reading?

Can you try hacking the code such that after about ten calls to the
GETPARAM ioctl, it reads the scratch register directly with INREG, and
see if that works?

> > > I'll try (later tonight) to selectively back out your ring read pointer
> > > change to verify if this is really the problem.
> >
> > I doubt it's the problem per se, but the hangs may disappear when you
> > revert it because the writeback won't work.
>
> Aha, I understand.

It might still be interesting to see what you get with that change
reverted, though.

-- 
Earthling Michel Dänzer   \  Debian (powerpc), XFree86 and DRI developer
Software libre enthusiast  \     http://svcs.affero.net/rm.php?r=daenzer
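
For illustration only, the "read the register directly every tenth call" idea from the message above could look roughly like the sketch below. This is not the actual DRM code: `struct frame_source`, `get_last_frame`, and all of the field names are hypothetical stand-ins, and the two pointers merely simulate the writeback copy in RAM and the scratch register that an INREG-style read would return.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in, not a real radeon DRM structure. */
struct frame_source {
    volatile uint32_t *writeback;   /* copy of the scratch register written back to RAM */
    volatile uint32_t *scratch_reg; /* the register itself (what a direct read would see) */
    unsigned calls;                 /* number of lookups so far */
    unsigned direct_every;          /* every Nth lookup bypasses the writeback copy */
};

/*
 * Return the last completed frame. Most lookups use the cheap
 * writeback copy; every direct_every-th lookup reads the register
 * directly, so a lost or corrupted writeback cannot stall the
 * caller indefinitely.
 */
static uint32_t get_last_frame(struct frame_source *src)
{
    assert(src != 0 && src->direct_every > 0);

    src->calls++;
    if (src->calls % src->direct_every == 0)
        return *src->scratch_reg;   /* slow path: direct register read */

    return *src->writeback;         /* fast path: memory copy */
}
```

With `direct_every` set to 10, nine consecutive lookups return whatever is in the writeback copy and the tenth returns the register value, which is the point where a bogus writeback value would be caught.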