From: Tim S. <ti...@el...> - 2002-05-30 00:08:34
|
On Wednesday 29 May 2002 11:26 am, Keith Whitwell scribed numinously:" > Tim Smith wrote: > > On Tuesday 28 May 2002 1:13 pm, Keith Whitwell scribed numinously:" > > > >>Tim Smith wrote: > >>>On Tuesday 28 May 2002 8:40 am, Keith Whitwell scribed numinously:" > >>> > >>>>Tim Smith wrote: > >>>>>I've done some more digging and it would appear to send an enormous > >>>>>amount of commands via radeon_cp_cmdbuf, all ending up going via > >>>>>radeon_emit_packet3_cliprect. Actually I don't think it's such a > >>>>> huge number compared to other calls (32 commands), except that this > >>>>> time the loop gets called with nbox=4 so it emits everything 4 > >>>>> times, and runs out of FIFO. > >>>> > >>>>If it's really a fifo issue, I may have a solution for this in terms > >>>> of only writing the ring tail register once per ioctl. > >>> > >>>Is it in a testable state? If you mail me a patch I'll test it tonight > >>>and report. > >>> > >>>Assuming that RADEON_CP_RB_WPTR is the ring tail register, I count at > >>>least 20 writes to it on this one call, and 16 on the previous "clear" > >>>(also with nbox=4). And there are 13 consecutive calls to > >>>radeon_emit_packet3_cliprect with nbox=4 before it locks, making a > >>>total of 260 writes from one call to radeon_cp_cmdbuf. > >> > >>It's committed on the tcl branch now (it's also a minor optimization). > > > > No so minor actually; the difference between the two may be small, but > > there is a noticeable improvement in interactivity when wandering > > around the 3D map, at least in my application. > > Thats good to know - I didn't get a very large improvement here. Not sure exactly how long writing to a card register takes, but I'd suspect it's quite a lot of CPU cycles on an Athlon 1200 :-) When there's a lot of 2D stuff going on as well there are quite a lot of calls to radeonEmitState, which previously caused quite a lot of such writes per call. I wouldn't say there was a frame rate increase (wouldn't have expected one), but the rest of the app's interface seems "snappier". > > It doesn't fix this lockup though, so I guess it's not a FIFO issue. > > > > The big storm of stuff during which the lockup occurs appears to be > > generated by radeonEmitState ('lost context'), so maybe there's > > something unpleasant in that lot. It's been called a lot of times > > before, without causing trouble though. The only thing I can think of > > is that this time there's still a menu window slap bang in the middle > > of the 3D area (simply popping up a menu and then taking it away again > > will not produce the lockup; I have to pop up a submenu and then let > > the submenu go away while its parent is still there to make it lock). > > > > Can anyone give me an idea of sequences which it's known will lock the > > card, so I can attempt to determine if any of them is occurring? I'm a > > "bottom-up" kind of guy when it comes to bug hunting, especially when > > I'm trying to understand the code as I go. > > The trouble with that approach is there are probably an infinite number > of bad sequences... The classic one is providing a "bad" number of > vertices or indices for a render packet, but there are thousands of ways > to lock these cards, it seems. Oh. Bugger :-) > What I'm doing at the moment is building a sanity checker for the stream > that looks at the command headers and verifies that they make sense. > After that we could do some value tracking to see if this is the first > time a given register has been set to a particular value, or the first of > a number of other occurances. This should draw the eye to unusual > things. > > As Linus has pointed out, that patch reduces the number of writes to the > ring register, but doesn't necessarily prevent overflows. It could yet > be a fifo problem. Hmmm. The following may be useful information: I've added some traces to read the free space in the FIFO each time COMMIT_RING() happens, and also in radeon_do_wait_for_fifo() to trace out the min/max number of FIFO slots it found in the loop if it failed. After the hefty call to radeon_cp_cmdbuf() mentioned in a previous post, here's the trace I get: <7>[drm:radeon_ioctl] pid=1600, cmd=0x40106450, nr=0x50, dev 0xe200, auth=1 <7>[drm:radeon_cp_cmdbuf] radeon_cp_cmdbuf: pid=1600 <7>[drm:radeon_emit_packet3] radeon_emit_packet3 .....lots and lots of stuff.... <7>[drm:radeon_emit_packet3] radeon_emit_packet3 <7>[drm:radeon_emit_packet3_cliprect] radeon_emit_packet3_cliprect <7>[drm:radeon_emit_clip_rect] box: x1=209 y1=74 x2=573 y2=284 <7>[drm:radeon_emit_clip_rect] box: x1=209 y1=284 x2=325 y2=361 <7>[drm:radeon_emit_clip_rect] box: x1=451 y1=284 x2=573 y2=361 <7>[drm:radeon_emit_clip_rect] box: x1=209 y1=361 x2=573 y2=452 <7>[drm:radeon_cp_cmdbuf] commit ring <7>[drm:radeon_cp_cmdbuf] fifo slots: 64 So it was OK before it told the card about that lot.... <7>[drm:radeon_ioctl] pid=1600, cmd=0x6444, nr=0x44, dev 0xe200, auth=1 <7>[drm:radeon_cp_idle] radeon_cp_idle <7>[drm:radeon_do_cp_idle] radeon_do_cp_idle <7>[drm:radeon_do_cp_idle] commit ring <7>[drm:radeon_do_cp_idle] fifo slots: 0 ...it's rather busy now... <7>[drm:radeon_do_wait_for_fifo] slots: min=6, max=14, delay=100000 ...but not completely dead! <3>[drm:radeon_do_wait_for_fifo] *ERROR* failed! <4>radeon_status: <4>DONE AGE = 15168 <4>RBBM_STATUS = 0x80116100 <4>CP_RB_RTPR = 0x000096c2 <4>CP_RB_WTPR = 0x0000981e <4>AIC_CNTL = 0x00000000 <4>AIC_STAT = 0x00000004 <4>AIC_PT_BASE = 0x00000000 <4>TLB_ADDR = 0x00000000 <4>TLB_DATA = 0x00000000 <7>[drm:radeon_ioctl] pid=1594, cmd=0x4008642a, nr=0x2a, dev 0xe200, auth=1 <7>[drm:radeon_lock] 1 (pid 1594) requests lock (0x80000006), flags = 0x00000000 <7>[drm:radeon_ioctl] pid=1600, cmd=0x6444, nr=0x44, dev 0xe200, auth=1 <7>[drm:radeon_cp_idle] radeon_cp_idle <7>[drm:radeon_do_cp_idle] radeon_do_cp_idle <7>[drm:radeon_do_cp_idle] commit ring <7>[drm:radeon_do_cp_idle] fifo slots: 0 <7>[drm:radeon_do_wait_for_fifo] slots: min=0, max=0, delay=100000 <3>[drm:radeon_do_wait_for_fifo] *ERROR* failed! <4>radeon_status: <4>DONE AGE = 15168 <4>RBBM_STATUS = 0x80116100 <4>CP_RB_RTPR = 0x000096c2 <4>CP_RB_WTPR = 0x00009824 <4>AIC_CNTL = 0x00000000 <4>AIC_STAT = 0x00000004 <4>AIC_PT_BASE = 0x00000000 <4>TLB_ADDR = 0x00000000 <4>TLB_DATA = 0x00000000 ...Splat. During the call to radeon_do_wait_for_fifo it processed 8 commands in 100,000usec, or 1 command every 12.5ms. That certainly seems rather slow, but maybe they're just commands that take a while. Is is possible to send texture uploads through that route?. Note this is on the maximum "CPusecTimeout" of 100,000usec. If I leave that at the default of 10,000, I get a trace like <7>[drm:radeon_do_wait_for_fifo] slots: min=4, max=4, delay=10000 I can also get <7>[drm:radeon_do_wait_for_fifo] slots: min=4, max=5, delay=20000 and <7>[drm:radeon_do_wait_for_fifo] slots: min=5, max=7, delay=30000 which is consistent at least, with the 12.5ms command rate. A thought: will reading the value of the RADEON_RBBM_STATUS register every microsecond somewhat impede the card's ability to do the catching up of command buffers that the routine is waiting for? If so, maybe such a short delay is counterproductive. ISTR reading a hardware register over PCI takes about 1.5 microseconds, though that could be a racial memory from a previous life as a toad. I've attempted some rather pathetic rate-adaption to make everything slow down when the FIFO gets full. It utterly murders performance but it did prevent the lockup from occurring. I modified ADVANCE_RING to take a parameter containing a Wild-Ass-Guess of how many commands were being added, then to call COMMIT_RING if it got over the maximum. COMMIT_RING then read the number of free FIFO slots and adjusted the maximum up or down depending on how well we were doing, and started to hang around for a bit if things were getting full. -- Tim Smith (ti...@el...) "Once I took me moolies / and set them in a snare / 'Twas then I spied a scroper's man / a-whurdling a hare..." -- Rambling Sid Rumpo |