Re: [Dri-devel] Radeon 7500 lockup

SourceForge Headquarters 225 Broadway Suite 1600 San Diego, CA 92101 +1 (858) 454-5900

On Wednesday 29 May 2002 11:26 am, Keith Whitwell scribed numinously:"
> Tim Smith wrote:
> > On Tuesday 28 May 2002 1:13 pm, Keith Whitwell scribed numinously:"
> >
> >>Tim Smith wrote:
> >>>On Tuesday 28 May 2002 8:40 am, Keith Whitwell scribed numinously:"
> >>>
> >>>>Tim Smith wrote:
> >>>>>I've done some more digging and it would appear to send an enormous
> >>>>>amount of commands via radeon_cp_cmdbuf, all ending up going via
> >>>>>radeon_emit_packet3_cliprect. Actually I don't think it's such a
> >>>>> huge number compared to other calls (32 commands), except that this
> >>>>> time the loop gets called with nbox=4 so it emits everything 4
> >>>>> times, and runs out of FIFO.
> >>>>
> >>>>If it's really a fifo issue, I may have a solution for this in terms
> >>>> of only writing the ring tail register once per ioctl.
> >>>
> >>>Is it in a testable state? If you mail me a patch I'll test it tonight
> >>>and report.
> >>>
> >>>Assuming that RADEON_CP_RB_WPTR is the ring tail register, I count at
> >>>least 20 writes to it on this one call, and 16 on the previous "clear"
> >>>(also with nbox=4). And there are 13 consecutive calls to
> >>>radeon_emit_packet3_cliprect with nbox=4 before it locks, making a
> >>>total of 260 writes from one call to radeon_cp_cmdbuf.
> >>
> >>It's committed on the tcl branch now (it's also a minor optimization).
> >
> > No so minor actually; the difference between the two may be small, but
> > there is a noticeable improvement in interactivity when wandering
> > around the 3D map, at least in my application.
>
> Thats good to know - I didn't get a very large improvement here.

Not sure exactly how long writing to a card register takes, but I'd suspect 
it's quite a lot of CPU cycles on an Athlon 1200 :-) When there's a lot of 
2D stuff going on as well there are quite a lot of calls to 
radeonEmitState, which previously caused quite a lot of such writes per 
call. I wouldn't say there was a frame rate increase (wouldn't have 
expected one), but the rest of the app's interface seems "snappier".

> > It doesn't fix this lockup though, so I guess it's not a FIFO issue.
> >
> > The big storm of stuff during which the lockup occurs appears to be
> > generated by radeonEmitState ('lost context'), so maybe there's
> > something unpleasant in that lot. It's been called a lot of times
> > before, without causing trouble though. The only thing I can think of
> > is that this time there's still a menu window slap bang in the middle
> > of the 3D area (simply popping up a menu and then taking it away again
> > will not produce the lockup; I have to pop up a submenu and then let
> > the submenu go away while its parent is still there to make it lock).
> >
> > Can anyone give me an idea of sequences which it's known will lock the
> > card, so I can attempt to determine if any of them is occurring? I'm a
> > "bottom-up" kind of guy when it comes to bug hunting, especially when
> > I'm trying to understand the code as I go.
>
> The trouble with that approach is there are probably an infinite number
> of bad sequences...  The classic one is providing a "bad" number of
> vertices or indices for a render packet, but there are thousands of ways
> to lock these cards, it seems.

Oh. Bugger :-)

> What I'm doing at the moment is building a sanity checker for the stream
> that looks at the command headers and verifies that they make sense. 
> After that we could do some value tracking to see if this is the first
> time a given register has been set to a particular value, or the first of
> a number of other occurances.  This should draw the eye to unusual
> things.
>
> As Linus has pointed out, that patch reduces the number of writes to the
> ring register, but doesn't necessarily prevent overflows.  It could yet
> be a fifo problem.

Hmmm. The following may be useful information:

I've added some traces to read the free space in the FIFO each time 
COMMIT_RING() happens, and also in radeon_do_wait_for_fifo() to trace out 
the min/max number of FIFO slots it found in the loop if it failed.

After the hefty call to radeon_cp_cmdbuf() mentioned in a previous post, 
here's the trace I get:

<7>[drm:radeon_ioctl] pid=1600, cmd=0x40106450, nr=0x50, dev 0xe200, auth=1
<7>[drm:radeon_cp_cmdbuf] radeon_cp_cmdbuf: pid=1600
<7>[drm:radeon_emit_packet3] radeon_emit_packet3

.....lots and lots of stuff....

<7>[drm:radeon_emit_packet3] radeon_emit_packet3
<7>[drm:radeon_emit_packet3_cliprect] radeon_emit_packet3_cliprect
<7>[drm:radeon_emit_clip_rect]    box:  x1=209 y1=74  x2=573 y2=284
<7>[drm:radeon_emit_clip_rect]    box:  x1=209 y1=284  x2=325 y2=361
<7>[drm:radeon_emit_clip_rect]    box:  x1=451 y1=284  x2=573 y2=361
<7>[drm:radeon_emit_clip_rect]    box:  x1=209 y1=361  x2=573 y2=452
<7>[drm:radeon_cp_cmdbuf] commit ring
<7>[drm:radeon_cp_cmdbuf] fifo slots: 64

So it was OK before it told the card about that lot....

<7>[drm:radeon_ioctl] pid=1600, cmd=0x6444, nr=0x44, dev 0xe200, auth=1
<7>[drm:radeon_cp_idle] radeon_cp_idle
<7>[drm:radeon_do_cp_idle] radeon_do_cp_idle
<7>[drm:radeon_do_cp_idle] commit ring
<7>[drm:radeon_do_cp_idle] fifo slots: 0

...it's rather busy now...

<7>[drm:radeon_do_wait_for_fifo] slots: min=6, max=14, delay=100000

...but not completely dead!

<3>[drm:radeon_do_wait_for_fifo] *ERROR* failed!
<4>radeon_status:
<4>DONE AGE = 15168
<4>RBBM_STATUS = 0x80116100
<4>CP_RB_RTPR = 0x000096c2
<4>CP_RB_WTPR = 0x0000981e
<4>AIC_CNTL = 0x00000000
<4>AIC_STAT = 0x00000004
<4>AIC_PT_BASE = 0x00000000
<4>TLB_ADDR = 0x00000000
<4>TLB_DATA = 0x00000000
<7>[drm:radeon_ioctl] pid=1594, cmd=0x4008642a, nr=0x2a, dev 0xe200, auth=1
<7>[drm:radeon_lock] 1 (pid 1594) requests lock (0x80000006), flags = 
0x00000000
<7>[drm:radeon_ioctl] pid=1600, cmd=0x6444, nr=0x44, dev 0xe200, auth=1
<7>[drm:radeon_cp_idle] radeon_cp_idle
<7>[drm:radeon_do_cp_idle] radeon_do_cp_idle
<7>[drm:radeon_do_cp_idle] commit ring
<7>[drm:radeon_do_cp_idle] fifo slots: 0
<7>[drm:radeon_do_wait_for_fifo] slots: min=0, max=0, delay=100000
<3>[drm:radeon_do_wait_for_fifo] *ERROR* failed!
<4>radeon_status:
<4>DONE AGE = 15168
<4>RBBM_STATUS = 0x80116100
<4>CP_RB_RTPR = 0x000096c2
<4>CP_RB_WTPR = 0x00009824
<4>AIC_CNTL = 0x00000000
<4>AIC_STAT = 0x00000004
<4>AIC_PT_BASE = 0x00000000
<4>TLB_ADDR = 0x00000000
<4>TLB_DATA = 0x00000000

...Splat.

During the call to radeon_do_wait_for_fifo it processed 8 commands in 
100,000usec, or 1 command every 12.5ms. That certainly seems rather slow, 
but maybe they're just commands that take a while. Is is possible to send 
texture uploads through that route?. Note this is on the maximum 
"CPusecTimeout" of 100,000usec. If I leave that at the default of 10,000, I 
get a trace like

<7>[drm:radeon_do_wait_for_fifo] slots: min=4, max=4, delay=10000

I can also get

<7>[drm:radeon_do_wait_for_fifo] slots: min=4, max=5, delay=20000
 and
<7>[drm:radeon_do_wait_for_fifo] slots: min=5, max=7, delay=30000

which is consistent at least, with the 12.5ms command rate.

A thought: will reading the value of the RADEON_RBBM_STATUS register every 
microsecond somewhat impede the card's ability to do the catching up of 
command buffers that the routine is waiting for?  If so, maybe such a short 
delay is counterproductive. ISTR reading a hardware register over PCI takes 
about 1.5 microseconds, though that could be a racial memory from a 
previous life as a toad.

I've attempted some rather pathetic rate-adaption to make everything slow 
down when the FIFO gets full. It utterly murders performance but it did 
prevent the lockup from occurring. I modified ADVANCE_RING to take a 
parameter containing a Wild-Ass-Guess of how many commands were being 
added, then to call COMMIT_RING if it got over the maximum. COMMIT_RING 
then read the number of free FIFO slots and adjusted the maximum up or down 
depending on how well we were doing, and started to hang around for a bit 
if things were getting full.

-- 
Tim Smith (ti...@el...)
"Once I took me moolies / and set them in a snare / 'Twas then I spied
a scroper's man / a-whurdling a hare..."
  -- Rambling Sid Rumpo