On Tue, 11 Jun 2002, Jos=E9 Fonseca wrote:
> On 2002.06.11 18:24 Leif Delgass wrote:
> > On Tue, 11 Jun 2002, Jos=E9 Fonseca wrote:
> > > Leif,
> > >
> > > I've got it. After adding checks and checks to everything I finally
> > > discovered it - in some circunstances (such as a regular wait_for_i=
> > > with no previous buffer submiting as an example) we were firing GUI
> > master
> > > without enabling the BM. It's a miracle the card even worked at all=
> > Now I understand why I didn't experience the lockup. I forgot to men=
> > that in my tree, I renamed your do_dma_flush to do_wait_for_dma (whic=
> > was
> > still called from do_dma_idle), and I made do_dma_flush call dma_star=
> > do_dma_flush is called by the flush ioctl, which is called before wai=
> > for idle by the DDX and Mesa driver, so this ensured that busmasterin=
> > was
> > on before waiting for idle. We still need to sort this out, because =
> > DDFlush in the Mesa driver should flush the ring, ensuring all buffer=
> > will be completed by the hardware in finite time, but shouldn't wait =
> > idle (only DDFinish needs to wait for idle).
> In that case we just need to ensure that the head >=3D the offset of th=
> last ring submited.
But that means essentially waiting for idle. The ring tail is moved ever=
time we add a buffer to the ring, so the "last ring submitted" is always=20
close to the new tail and could be separated from the head by many=20
descriptors. We just need to make sure that the card isn't going to stop=
so the flush needs to restart DMA if the card is idle and ensure that it=20
_won't_ stop if the card is active.=20
> > A couple of other comments on your change: I think we could just
> Although there are some things you can try simplify this is _very_ tric=
> as you gonna see below...
> > unconditionally turn on busmastering in SRC_CNTL if the engine is idl=
> No. The engine could be active but not doing BM, e.g., if X submited=20
> something to FIFO.
Notice that I said if the engine is _idle_. The engine can't be active i=
it's idle, right? And it doesn't make sense to turn on busmastering whil=
the card is active. This is assuming if GUI_ACTIVE is 0, then the FIFO i=
also empty. The docs say that a state of idleness (GUI_ACTIVE=3D0) impli=
an empty fifo, so the FIFO check in wait_for_idle is really redundant. I'=
saying we move the call to dma_start into the block in UPDATE_RING_HEAD
where we already know the engine is idle from GUI_STAT, and move the two
writes to kick off the transfer into dma_start as well (just to group all
the writes that start a transfer in one place).
> > UPDATE_RING_HEAD. One write is likely to use fewer cycles than one r=
> > and certainly less than one read followed by one write. Plus, if the
Actually, I realized there are two writes (one to SRC_CNTL and one to=20
BUS_CNTL), so reading SRC_CNTL could be better. As you say, we could kee=
track of this with a variable, as long as we account for the possibility=20
of the card disabling bus mastering in the case of a lockup.
> As I said in the code, what we can do is have a variable indicating=20
> whether the busmaster is ON or OFF:
> > engine is active, there shouldn't be a need to check if bus mastering=
> > on and enable it. That also makes the do_wait_for_idle in dma_start
> The Mach64 SDK sample code calls wait_for_idle before enabling the BM, =
> we need to do that too.
See above. We'd already know the engine is idle when enabling
> > unnecessary. The flush ioctl could just call UPDATE_RING_HEAD, and
> > possibly wait to see the head advance a couple of descriptors to make
> > sure the engine won't stop.
> What's the point in waiting for anything? I though we both had agree th=
> we shouldn't wait for anything when submiting a buffer... (after all we=
> both drawn the same conclusions in that matter in two distinct times).
As I mentioned above, we just need to ensure that the card won't stop
because it has already read an old end of list flag. We could have added
several descriptors to the ring while the card is still active on an
earlier descriptor which the card sees as the end of the ring. So
restarting the card if it's idle doesn't ensure that it won't stop if it'=
active. I was just thinking that waiting to see the card advance a coupl=
of descriptors (but not waiting for idle) might be enough to ensure that=20
it isn't going to stop.
> Note that I'm quite reluctant in making optimizations where you have to=
> make more assumptions to gain a few cycles. We already assume alot, and=
> it's very dificult to determine when our assumptions are wrong. If anot=
> bug like this one (which took me a whole week to solve) happens on the=20
> hands of a user we'll be forever trying to debug it.
I think the assumption that the card is not active when it's idle is a=20
safe one. ;)
> I also plan to leave all assumptions that we already do to be expressed=
> the code (with MACH64_EXTRA_CHECKING or a better name perhaps) so that =
> can debug these kind of problems more easily when they happen.
> > > I'm really relieved for the problem was in the code and in no way
> > > associated with the card misbehavior on slow computers!
> > >
> > > I think that now we can make some code cleaning. If you have no
> > > objections, I would to remove some of the code associated with the
> > BUFFER
> > > AGING=3D0, and/or when MACH64_NO_BATCH_DISPATCH=3D0.
> > I think that's probably fine, but I think we should see if we can sal=
> > the pseudo-dma path. We should also consider what to do about frame
> > aging. Also, the interrupt path is probably pretty broken now, we sh=
> > see if there's any way to make that a workable alternative or remove =
> I'll leave both those paths as they are for now - they are fairly=20
> contained anyway. Later on we'll decide what to do with that code.
> Regarding the frame aging we can do that by pointing down the ring offs=
> of the last buffer of each frame.
> > > The UPDATE_RING_HEAD macro is getting monolithical but after giving
> > some
> > > thoughs about it I've come to the conclusion that that code must re=
> > be
> > > there. The only liberty we have to either leave in a macro, inline =
> > or
> > > put in a seperate directories. What I'll do it leave the most used =
> > > path in the macro, and define subroutines for doing the less used
> > paths.
> > Maybe, if you make the changes I suggested above, all the work done i=
> > the
> > block where the engine is idle in UPDATE_RING_HEAD could be moved int=
> > dma_start which could be made an inline function. At the moment, tha=
> > the only place dma_start is called, right?
> Yep. But after thinking better and it's probably better do all in=20
> UPDATE_RING... there isn't much point in putting 2 output instructions =
> a seperate function where the call overhead takes more instructions tha=
> that, especially if they are called only from a single function.
But doesn't inlining the function remove the overhead? I was suggesting=20
this mainly for readability, it's not really necessary if dma_start isn't=
going to be used anywhere else.
I'm attaching a diff so you can see what I'm talking about. Hopefully=20
this will remove any confusion. :)