On Sun, 12 May 2002, José Fonseca wrote:
> On 2002.05.12 19:15 Leif Delgass wrote:
> > Jose,
> > I've been experimenting with this too, and was able to get things going
> > with state being emitted either from the client or the drm, though I'm
> > still having lockups and things are generally a bit buggy and unstable
> > still. To try client side context emits, I basically went back to having
> > each primitive emit state into the vertex buffer before adding the vertex
> > data, like the original hack with MMIO. This works, but may be emitting
> > state when it's not necessary.
> I don't see how that would happen: only the dirty context was updated.
It didn't really make sense to me as I was writing this, to tell the
truth. :) I just had it in my head that this way was a hack. I guess it
was just the client-side register programming that made it "evil" before.
At any rate, as you say, I think doing this in the drm is probably better.
> > Now I'm trying state emits in the drm, and
> I think that doing the emits in the DRM gives us more flexibility than in
> the client.
> > to do that I'm just grabbing a buffer from the freelist and adding it to
> > the queue before the vertex buffer, so things are in the correct order in
> > the queue. The downside of this is that buffer space is wasted, since the
> > state emit uses a small portion of a buffer, but putting state in a
> > separate buffer from vertex data allows the proper ordering in the queue.
> Is it a requirement that the addresses stored in the descriptor tables
> must be aligned on some boundary? If not, we could use a single buffer to
> hold successive context emits, and the first entry of each descriptor table
> would point to a section of this buffer. This way there wouldn't be any
> waste of space, and a single buffer would suffice for a big number of DMA
> buffers.
I think the data tables need to be aligned on a 4K boundary, since that's
the maximum size, but I'm not positive. I know for sure that the
descriptor table has to be aligned to its size.
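Since both constraints are power-of-two alignments, they're cheap to check or enforce with bit masks. A minimal sketch (the helper names are mine, not anything in the driver):

```c
#include <stdint.h>

/* Check whether addr sits on a power-of-two boundary, e.g. the 4K data
 * table boundary or a descriptor table's own size. */
static int is_aligned(uintptr_t addr, uintptr_t boundary)
{
    return (addr & (boundary - 1)) == 0;
}

/* Round addr up to the next such boundary. */
static uintptr_t align_up(uintptr_t addr, uintptr_t boundary)
{
    return (addr + boundary - 1) & ~(boundary - 1);
}
```

Both assume the boundary is a power of two, which holds for the 4K and table-size cases discussed above.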
> > Perhaps we could use a private set of smaller buffers for this. At any
> > rate, I've done the same for clears and swaps, so I have asynchronous DMA
> > (minus blits) working with gears at least.
> This is another way too. I don't know if we are limited to the kernel
> memory allocation granularity, so unless this is already done by the pci
> API we might need to split buffers into smaller sizes.
The pci_pool interface is intended for this sort of small buffer, I
think. We just tell it to give us 4K buffers and allocate as many as we
need with pci_pool_alloc. That would give us buffers one quarter the size
of a full vertex buffer and still satisfy alignment constraints. This
would also be more secure, since these buffers would be private to the
drm. We could use these to terminate each DMA pass as well. That's one
thing that needs more investigation, what registers need to be reset at
the end of a DMA pass? Right now I'm only writing src_cntl to disable the
bus mastering bit. Bus_cntl isn't fifo-ed, so it doesn't make sense to use
DMA to set it, even though the Utah driver did. The only drawback to using
private buffers is that it complicates the freelist.
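To sketch what I mean, assuming the 2.4-era pci_pool API (only the pci_pool_* calls and SLAB_KERNEL are real; the mach64_* names and the static pool variable are made up for illustration):

```c
#include <linux/pci.h>

#define MACH64_SMALL_BUF_SIZE 4096  /* 4K: quarter of a vertex buffer */

static struct pci_pool *small_pool;

/* Create a pool of 4K buffers; size == align keeps every buffer on a
 * 4K boundary, satisfying the data table alignment constraint. */
int mach64_small_bufs_init(struct pci_dev *pdev)
{
    small_pool = pci_pool_create("mach64_small", pdev,
                                 MACH64_SMALL_BUF_SIZE,
                                 MACH64_SMALL_BUF_SIZE,
                                 0, SLAB_KERNEL);
    return small_pool ? 0 : -ENOMEM;
}

/* Grab one private buffer; bus_addr receives the address to put in the
 * descriptor table. */
void *mach64_small_buf_alloc(dma_addr_t *bus_addr)
{
    return pci_pool_alloc(small_pool, SLAB_KERNEL, bus_addr);
}
```

This is untested, of course; the point is just that the pool hands back consistently sized, consistently aligned buffers that never leave the drm.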
> > I'm still getting lockups with
> > anything more complicated and there are still some state problems. The
> > good news is that I'm finally seeing an increase in frame rate, so
> > there's light at the end of the tunnel.
> My time is limited, and I can't spend more than 3 hrs per day on this, but
> I think that after the meeting tomorrow we should try to keep the cvs in
> sync, even if it's less stable - it's a development branch after all and
> its stability is not as important as making progress.
OK, I'll try to check in more often. I've been trying a lot of different
things, so I just need to clean things up a bit to minimize the cruft. I
don't want to check in failed experiments. ;) For a while the branch is
likely to cause frequent lockups. I'm trying to at least get pseudo-DMA
working.
> > Right now I'm using 1MB (half the buffers) as the high water mark, so
> > there should always be plenty of available buffers for the drm. To get
> > this working, I've used buffer aging rather than interrupts.
> Which register do you use to keep track of the buffers' age?
I'm using the PAT_REG[0,1] registers since they aren't needed for 3D. As
long as we make sure that DMA is idle and the register contents are
saved/restored when switching contexts between 2D/3D, I think this should
work. The DDX only uses them for mono pattern fills in the XAA routine,
and it saves and restores them, so we need to do the same. I've done that
in the Enter/LeaveServer in atidri.c. We should probably also modify the
DDX's Sync routine for XAA to use the drm idle ioctl. I think we'll need
to make sure that the DMA queue is flushed before checking for engine
idle. At the moment I'm calling the idle ioctl from EnterServer in
atidri.c, but I haven't touched the XAA Sync function.
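The aging test itself is just a wrapping comparison: each queued buffer gets stamped with a monotonically increasing age, the engine writes the age of the last completed buffer into the scratch register (PAT_REG0/1 in our case), and a buffer is free once the engine's age has passed its stamp. Something like this (names are illustrative, not the actual driver code):

```c
#include <stdint.h>

/* Returns nonzero if the buffer stamped with buf_age has completed,
 * given the age the engine last wrote back.  The signed view of the
 * unsigned difference handles 32-bit counter wraparound correctly as
 * long as the two ages are within 2^31 of each other. */
static int mach64_buffer_done(uint32_t engine_age, uint32_t buf_age)
{
    return (int32_t)(engine_age - buf_age) >= 0;
}
```

Without the wraparound-safe comparison, a freelist scan would wrongly hold every buffer once the age counter rolled over.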
> > What I
> > realized with interrupts is that there doesn't appear to be an interrupt
> > that can poll fast enough to keep up, since a VBLANK is tied to the
> > vertical refresh -- which is relatively infrequent. I'm thinking that it
> > might be best to start out without interrupts and to use GUI masters for
> > blits and then investigate using interrupts, at least for blits.
> That had crossed my mind before. I think it may be a good idea too.
I'm keeping Frank's code, so we can return to this, but I've commented out
the call to handle_dma. I think I'll just disable the interrupt handler
for now to eliminate that as a source of problems.
> > Anyway,
> > I have an implementation of the freelist and other queues that's
> > functional, though it might require some locks here and there.
> > I'll try to stabilize things more and send a patch for you to look at.
> Looking forward to that.
> > I've also played around some more with AGP textures. I have hacked up
> > the performance boxes client-side with clear ioctls, and this helps to see
> > what's going on. I'll try to clean that up so I can commit it. I've
> > found some problems with the global LRU and texture aging that I'm trying
> > to fix as well. I'll post a more detailed summary of that soon.
> What can I say? Great work Leaf! =)
> > BTW, as to your question about multiple clients and state: I think this
> > is handled when acquiring the lock. If the context stamp in the SAREA
> > doesn't match the current context after getting the lock, everything is
> > marked as dirty to force the current context to emit all its state.
> > Emitting state to the SAREA is always done while holding the lock.
> I hadn't realized that before. Thanks for the info.
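To sketch what that check looks like (a simplified toy, with made-up names rather than the real SAREA structures and lock ioctl):

```c
/* After acquiring the hardware lock, compare the SAREA's record of the
 * last context that held it with our own.  If another client ran in
 * between, our notion of the hardware state is stale, so mark every
 * state group dirty and claim the stamp. */
#define ALL_STATE_DIRTY 0xffffffffu

struct toy_sarea {
    unsigned int last_ctx;   /* stamp of the last lock holder */
};

static unsigned int ctx_check_after_lock(struct toy_sarea *sarea,
                                         unsigned int my_ctx,
                                         unsigned int dirty)
{
    if (sarea->last_ctx != my_ctx) {
        dirty = ALL_STATE_DIRTY;   /* force a full state emit */
        sarea->last_ctx = my_ctx;
    }
    return dirty;
}
```

The real code naturally does this under the lock, which is why emitting state to the SAREA while holding the lock is safe.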
> José Fonseca