From: José F. <j_r...@ya...> - 2002-05-12 16:29:49
|
As it becomes more clear that in the mach64 the best solution is to fill DMA buffers with the context state and the vertex buffers, I've been trying to understand how this can be done and how the Gamma driver (which has this same model) does it.

The context state is available right at the beginning of running a pipeline, and usually DDUpdateHWState is called at the beginning of RunPipeline. The problem is that although all state information is available, we don't know which part should be uploaded, since other clients could dirty the hardware registers in the meanwhile.

I don't fully understand how the Gamma driver overcomes this. Its behavior regarding this is controlled by a macro definition, named DO_VALIDATE, that enables a series of VALIDATE_* macros, which in turn I couldn't understand what they do. Another thing that caught my attention was the "HACK" comment in gammaDDUpdateHWState before gammaEmitHwState - it resembles a similar comment in mach64, which makes one think that the author had in mind a better way to do it. Alan, could you shed some light on these two issues please?

Before I started this little research I had already given some thought to how I would do it. One idea that crossed my mind was to reserve some space in the DMA buffer to put the context state in before submitting the buffer. Of course there would be some DMA buffer waste, but it wouldn't be that much since there is a fairly low number of context registers. One thing that holds me back is that I still don't understand how multiple clients avoid each other: what is done in parallel, and what is done in series...

I would also appreciate any ideas regarding this. This is surely an issue I would like to discuss further at the next meeting.

Regards, José Fonseca |
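A minimal user-space sketch of the reserved-header idea described above: the first dwords of each DMA buffer are set aside for (register, value) pairs, vertex data is appended after them, and the dirty context registers are patched into the header just before submission. All sizes, names, and structures here are invented for illustration and are not the actual mach64 driver code.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical sizes -- the real register count and buffer size would
 * come from the driver; these are placeholders. */
#define CTX_REG_COUNT 16                    /* reserved (reg, value) pairs */
#define CTX_RESERVED  (CTX_REG_COUNT * 2)   /* dwords of reserved header   */
#define BUF_DWORDS    1024

typedef struct {
    uint32_t dw[BUF_DWORDS];
    int ctx_used;   /* dwords of the reserved header actually filled */
    int vtx_used;   /* dwords of vertex data appended                */
} dma_buffer_t;

static void buf_init(dma_buffer_t *b) { memset(b, 0, sizeof(*b)); }

/* Vertex data always starts after the reserved context area. */
static void buf_add_vertex_dword(dma_buffer_t *b, uint32_t v)
{
    b->dw[CTX_RESERVED + b->vtx_used++] = v;
}

/* Just before submission, patch the dirty context registers into the
 * reserved header so the engine processes them ahead of the vertices. */
static void buf_emit_context(dma_buffer_t *b, const uint32_t *regs,
                             const uint32_t *vals, int n)
{
    int i;
    for (i = 0; i < n && i < CTX_REG_COUNT; i++) {
        b->dw[2 * i]     = regs[i];
        b->dw[2 * i + 1] = vals[i];
    }
    b->ctx_used = 2 * i;
}
```

The waste is bounded by CTX_RESERVED dwords per buffer, which matches the observation that a fairly low number of context registers keeps the overhead small.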
From: Alan H. <al...@fa...> - 2002-05-12 16:35:50
|
Jose,

I'd certainly forget using the gamma driver as any kind of template for any work. There are many unimplemented features, and multiple clients just don't work. Purely a lack of time thing.

Alan.

On Sun, May 12, 2002 at 05:27:26 +0100, José Fonseca wrote:
> As it becomes more clear that in the mach64 the best solution is to fill
> DMA buffers with the context state and the vertex buffers, I've been trying
> to understand how this can be done and how the Gamma driver (which has
> this same model) does it.
>
> The context state is available right at the beginning of running a
> pipeline, and usually DDUpdateHWState is called at the beginning of
> RunPipeline. The problem is that although all state information is
> available, we don't know which part should be uploaded, since other clients
> could dirty the hardware registers in the meanwhile.
>
> I don't fully understand how the Gamma driver overcomes this. Its
> behavior regarding this is controlled by a macro definition, named
> DO_VALIDATE, that enables a series of VALIDATE_* macros, which in turn I
> couldn't understand what they do. Another thing that caught my attention
> was the "HACK" comment in gammaDDUpdateHWState before gammaEmitHwState -
> it resembles a similar comment in mach64, which makes one think that the
> author had in mind a better way to do it. Alan, could you shed some
> light on these two issues please?
>
> Before I started this little research I had already given some thought to how
> I would do it. One idea that crossed my mind was to reserve some space in
> the DMA buffer to put the context state in before submitting the buffer. Of
> course there would be some DMA buffer waste, but it wouldn't be that much
> since there is a fairly low number of context registers. One thing that
> holds me back is that I still don't understand how multiple clients avoid
> each other: what is done in parallel, and what is done in series...
>
> I would also appreciate any ideas regarding this. This is surely an issue
> I would like to discuss further at the next meeting.
>
> Regards,
>
> José Fonseca |
From: José F. <j_r...@ya...> - 2002-05-12 16:42:39
|
Alan, On 2002.05.12 17:35 Alan Hourihane wrote: > Jose, > > I'd certainly forget using the gamma driver as any kind of template for > any work. > > There are many unimplemented features, and multiple clients just don't > work. > Ok. I wasn't aware of this. > Purely a lack of time thing. > > Alan. > Thanks, José Fonseca |
From: Leif D. <lde...@re...> - 2002-05-12 18:15:28
|
Jose,

I've been experimenting with this too, and was able to get things going with state being emitted either from the client or the drm, though I'm still having lockups and things are generally a bit buggy and unstable still. To try client side context emits, I basically went back to having each primitive emit state into the vertex buffer before adding the vertex data, like the original hack with MMIO. This works, but may be emitting state when it's not necessary. Now I'm trying state emits in the drm, and to do that I'm just grabbing a buffer from the freelist and adding it to the queue before the vertex buffer, so things are in the correct order in the queue. The downside of this is that buffer space is wasted, since the state emit uses a small portion of a buffer, but putting state in a separate buffer from vertex data allows the proper ordering in the queue. Perhaps we could use a private set of smaller buffers for this. At any rate, I've done the same for clears and swaps, so I have asynchronous DMA (minus blits) working with gears at least. I'm still getting lockups with anything more complicated and there are still some state problems. The good news is that I'm finally seeing an increase in frame rate, so there's light at the end of the tunnel.

Right now I'm using 1MB (half the buffers) as the high water mark, so there should always be plenty of available buffers for the drm. To get this working, I've used buffer aging rather than interrupts. What I realized with interrupts is that there doesn't appear to be an interrupt that can poll fast enough to keep up, since a VBLANK is tied to the vertical refresh -- which is relatively infrequent. I'm thinking that it might be best to start out without interrupts and to use GUI masters for blits and then investigate using interrupts, at least for blits. Anyway, I have an implementation of the freelist and other queues that's functional, though it might require some locks here and there.
I'll try to stabilize things more and send a patch for you to look at.

I've also played around some more with AGP textures. I have hacked up the performance boxes client-side with clear ioctls, and this helps to see what's going on. I'll try to clean that up so I can commit it. I've found some problems with the global LRU and texture aging that I'm trying to fix as well. I'll post a more detailed summary of that soon.

BTW, as to your question about multiple clients and state: I think this is handled when acquiring the lock. If the context stamp in the SAREA doesn't match the current context after getting the lock, everything is marked as dirty to force the current context to emit all its state. Emitting state to the SAREA is always done while holding the lock.

Regards,
Leif

On Sun, 12 May 2002, José Fonseca wrote:
> As it becomes more clear that in the mach64 the best solution is to fill
> DMA buffers with the context state and the vertex buffers, I've been trying
> to understand how this can be done and how the Gamma driver (which has
> this same model) does it.
>
> The context state is available right at the beginning of running a
> pipeline, and usually DDUpdateHWState is called at the beginning of
> RunPipeline. The problem is that although all state information is
> available, we don't know which part should be uploaded, since other clients
> could dirty the hardware registers in the meanwhile.
>
> I don't fully understand how the Gamma driver overcomes this. Its
> behavior regarding this is controlled by a macro definition, named
> DO_VALIDATE, that enables a series of VALIDATE_* macros, which in turn I
> couldn't understand what they do. Another thing that caught my attention
> was the "HACK" comment in gammaDDUpdateHWState before gammaEmitHwState -
> it resembles a similar comment in mach64, which makes one think that the
> author had in mind a better way to do it. Alan, could you shed some
> light on these two issues please?
>
> Before I started this little research I had already given some thought to how
> I would do it. One idea that crossed my mind was to reserve some space in
> the DMA buffer to put the context state in before submitting the buffer. Of
> course there would be some DMA buffer waste, but it wouldn't be that much
> since there is a fairly low number of context registers. One thing that
> holds me back is that I still don't understand how multiple clients avoid
> each other: what is done in parallel, and what is done in series...
>
> I would also appreciate any ideas regarding this. This is surely an issue
> I would like to discuss further at the next meeting.
>
> Regards,
>
> José Fonseca

--
Leif Delgass
http://www.retinalburn.net |
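Leif's drm-side ordering trick (pull a small state buffer off the freelist and enqueue it immediately ahead of the vertex buffer, so the engine sees the register writes before the primitives they apply to) can be sketched with a toy queue. The struct and function names here are invented for illustration, not the actual mach64 drm code.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal singly linked queue standing in for the drm's buffer queue. */
typedef struct buf {
    struct buf *next;
    int is_state;   /* 1 = small state-emit buffer, 0 = vertex buffer */
} buf_t;

typedef struct { buf_t *head, *tail; } queue_t;

static void q_init(queue_t *q) { q->head = q->tail = NULL; }

static void q_push(queue_t *q, buf_t *b)
{
    b->next = NULL;
    if (q->tail) q->tail->next = b; else q->head = b;
    q->tail = b;
}

/* Enqueue the state buffer immediately before its vertex buffer, so
 * buffers leave the queue in exactly the order the engine must see. */
static void q_submit_vertex(queue_t *q, buf_t *state, buf_t *vertex)
{
    if (state) q_push(q, state);
    q_push(q, vertex);
}
```

The cost Leif mentions is visible here: the state buffer is a full freelist buffer even though the emit fills only a small portion of it, which is what motivates the private pool of smaller buffers discussed later in the thread.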
From: José F. <j_r...@ya...> - 2002-05-12 20:32:54
|
Leif,

On 2002.05.12 19:15 Leif Delgass wrote:
> Jose,
>
> I've been experimenting with this too, and was able to get things going
> with state being emitted either from the client or the drm, though I'm
> still having lockups and things are generally a bit buggy and unstable
> still. To try client side context emits, I basically went back to having
> each primitive emit state into the vertex buffer before adding the vertex
> data, like the original hack with MMIO. This works, but may be emitting
> state when it's not necessary.

I don't see how that would happen: only the dirty context was updated before.

> Now I'm trying state emits in the drm, and

I think that doing the emits in the DRM gives us more flexibility than in the client.

> to do that I'm just grabbing a buffer from the freelist and adding it to
> the queue before the vertex buffer, so things are in the correct order in
> the queue. The downside of this is that buffer space is wasted, since the
> state emit uses a small portion of a buffer, but putting state in a
> separate buffer from vertex data allows the proper ordering in the queue.

Is it a requirement that the addresses stored in the descriptor tables must be aligned on some boundary? If not, we could use a single buffer to hold successive context emits, and the first entry of each descriptor table would point to a section of this buffer. This way there wouldn't be any waste of space, and a single buffer would suffice for a big number of DMA buffers.

> Perhaps we could use a private set of smaller buffers for this. At any
> rate, I've done the same for clears and swaps, so I have asynchronous DMA
> (minus blits) working with gears at least.

This is another way too. I don't know if we are limited to the kernel memory allocation granularity, so unless this is already done by the pci_* API we might need to split buffers into smaller sizes.

> I'm still getting lockups with
> anything more complicated and there are still some state problems. The
> good news is that I'm finally seeing an increase in frame rate, so there's
> light at the end of the tunnel.

My time is limited, and I can't spend more than 3 hrs per day on this, but I think that after the meeting tomorrow we should try to keep the cvs in sync, even if it's less stable - it's a development branch after all, and its stability is not as important as making progress.

> Right now I'm using 1MB (half the buffers) as the high water mark, so
> there should always be plenty of available buffers for the drm. To get
> this working, I've used buffer aging rather than interrupts.

Which register do you use to keep track of the buffers' age?

> What I
> realized with interrupts is that there doesn't appear to be an interrupt
> that can poll fast enough to keep up, since a VBLANK is tied to the
> vertical refresh -- which is relatively infrequent. I'm thinking that it
> might be best to start out without interrupts and to use GUI masters for
> blits and then investigate using interrupts, at least for blits.

That had crossed my mind before too. I think it may be a good idea.

> Anyway,
> I have an implementation of the freelist and other queues that's
> functional, though it might require some locks here and there.
> I'll try to stabilize things more and send a patch for you to look at.

Looking forward to that.

> I've also played around some more with AGP textures. I have hacked up the
> performance boxes client-side with clear ioctls, and this helps to see
> what's going on. I'll try to clean that up so I can commit it. I've
> found some problems with the global LRU and texture aging that I'm trying
> to fix as well. I'll post a more detailed summary of that soon.

What can I say? Great work Leif! =)

> BTW, as to your question about multiple clients and state: I think this
> is handled when acquiring the lock. If the context stamp in the SAREA
> doesn't match the current context after getting the lock, everything is
> marked as dirty to force the current context to emit all its state.
> Emitting state to the SAREA is always done while holding the lock.

I hadn't realized that before. Thanks for the info.

Regards,

José Fonseca |
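The context-stamp check Leif describes (after taking the hardware lock, compare your context id against the stamp the previous holder left in the shared area; on mismatch, mark everything dirty so the next emit uploads the full state) is a small piece of logic worth pinning down. This is an illustrative user-space sketch; the struct and field names are invented, not the actual DRI/SAREA definitions.

```c
#include <assert.h>

#define DIRTY_ALL 0xffffffffu

typedef struct { unsigned last_ctx; } sarea_t;       /* shared between clients */
typedef struct { unsigned ctx_id; unsigned dirty; } context_t;

/* Called with the hardware lock held.  If another context ran since we
 * last held the lock, it may have clobbered any hardware register, so
 * the only safe response is to mark all of our state dirty. */
static void validate_after_lock(sarea_t *sarea, context_t *ctx)
{
    if (sarea->last_ctx != ctx->ctx_id) {
        ctx->dirty = DIRTY_ALL;
        sarea->last_ctx = ctx->ctx_id;
    }
}
```

This is also why fine-grained dirty flags are only meaningful within one client: across clients, the stamp mismatch collapses everything to "dirty".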
From: Leif D. <lde...@re...> - 2002-05-12 22:43:25
|
On Sun, 12 May 2002, José Fonseca wrote:
> Leif,
>
> On 2002.05.12 19:15 Leif Delgass wrote:
> > Jose,
> >
> > I've been experimenting with this too, and was able to get things going
> > with state being emitted either from the client or the drm, though I'm
> > still having lockups and things are generally a bit buggy and unstable
> > still. To try client side context emits, I basically went back to having
> > each primitive emit state into the vertex buffer before adding the vertex
> > data, like the original hack with MMIO. This works, but may be emitting
> > state when it's not necessary.
>
> I don't see how that would happen: only the dirty context was updated
> before.

It didn't really make sense to me as I was writing this, to tell the truth. :) I just had it in my head that this way was a hack. I guess it was just the client-side register programming that made it "evil" before. At any rate, as you say, I think doing this in the drm is probably better anyway.

> > Now I'm trying state emits in the drm, and
>
> I think that doing the emits in the DRM gives us more flexibility than in
> the client.
>
> > to do that I'm just grabbing a buffer from the freelist and adding it to
> > the queue before the vertex buffer, so things are in the correct order in
> > the queue. The downside of this is that buffer space is wasted, since the
> > state emit uses a small portion of a buffer, but putting state in a
> > separate buffer from vertex data allows the proper ordering in the queue.
>
> Is it a requirement that the addresses stored in the descriptor tables
> must be aligned on some boundary? If not, we could use a single buffer to
> hold successive context emits, and the first entry of each descriptor table
> would point to a section of this buffer. This way there wouldn't be any
> waste of space, and a single buffer would suffice for a big number of DMA
> buffers.

I think the data tables need to be aligned on a 4K boundary, since that's the maximum size, but I'm not positive. I know for sure that the descriptor table has to be aligned to its size.

> > Perhaps we could use a private set of smaller buffers for this. At any
> > rate, I've done the same for clears and swaps, so I have asynchronous DMA
> > (minus blits) working with gears at least.
>
> This is another way too. I don't know if we are limited to the kernel
> memory allocation granularity, so unless this is already done by the pci_*
> API we might need to split buffers into smaller sizes.

The pci_pool interface is intended for this sort of small buffer, I think. We just tell it to give us 4K buffers and allocate as many as we need with pci_pool_alloc. That would give us buffers one quarter the size of a full vertex buffer and still satisfy alignment constraints. This would also be more secure, since these buffers would be private to the drm. We could use these to terminate each DMA pass as well. That's one thing that needs more investigation: what registers need to be reset at the end of a DMA pass? Right now I'm only writing src_cntl to disable the bus mastering bit. Bus_cntl isn't fifo-ed, so it doesn't make sense to me to set it, even though the utah driver did. The only drawback to using private buffers is that it complicates the freelist.

> > I'm still getting lockups with
> > anything more complicated and there are still some state problems. The
> > good news is that I'm finally seeing an increase in frame rate, so there's
> > light at the end of the tunnel.
>
> My time is limited, and I can't spend more than 3 hrs per day on this, but
> I think that after the meeting tomorrow we should try to keep the cvs in
> sync, even if it's less stable - it's a development branch after all, and
> its stability is not as important as making progress.

OK, I'll try to check in more often. I've been trying a lot of different things, so I just need to clean things up a bit to minimize the cruft. I don't want to check in failed experiments. ;) For a while the branch is likely to cause frequent lockups. I'm trying to at least get pseudo-DMA stable again.

> > Right now I'm using 1MB (half the buffers) as the high water mark, so
> > there should always be plenty of available buffers for the drm. To get
> > this working, I've used buffer aging rather than interrupts.
>
> Which register do you use to keep track of the buffers' age?

I'm using the PAT_REG[0,1] registers since they aren't needed for 3D. As long as we make sure that DMA is idle and the register contents are saved/restored when switching contexts between 2D/3D, I think this should work. The DDX only uses them for mono pattern fills in the XAA routine, and it saves and restores them, so we need to do the same. I've done that in the Enter/LeaveServer in atidri.c. We should probably also modify the DDX's Sync routine for XAA to use the drm idle ioctl. I think we'll need to make sure that the DMA queue is flushed before checking for engine idle. At the moment I'm calling the idle ioctl from EnterServer in atidri.c, but I haven't touched the XAA Sync function.

> > What I
> > realized with interrupts is that there doesn't appear to be an interrupt
> > that can poll fast enough to keep up, since a VBLANK is tied to the
> > vertical refresh -- which is relatively infrequent. I'm thinking that it
> > might be best to start out without interrupts and to use GUI masters for
> > blits and then investigate using interrupts, at least for blits.
>
> That had crossed my mind before too. I think it may be a good idea.

I'm keeping Frank's code, so we can return to this, but I've commented out the call to handle_dma. I think I'll just disable the interrupt handler for now to eliminate that as a source of problems.

> > Anyway,
> > I have an implementation of the freelist and other queues that's
> > functional, though it might require some locks here and there.
> > I'll try to stabilize things more and send a patch for you to look at.
>
> Looking forward to that.
>
> > I've also played around some more with AGP textures. I have hacked up the
> > performance boxes client-side with clear ioctls, and this helps to see
> > what's going on. I'll try to clean that up so I can commit it. I've
> > found some problems with the global LRU and texture aging that I'm trying
> > to fix as well. I'll post a more detailed summary of that soon.
>
> What can I say? Great work Leif! =)
>
> > BTW, as to your question about multiple clients and state: I think this
> > is handled when acquiring the lock. If the context stamp in the SAREA
> > doesn't match the current context after getting the lock, everything is
> > marked as dirty to force the current context to emit all its state.
> > Emitting state to the SAREA is always done while holding the lock.
>
> I hadn't realized that before. Thanks for the info.
>
> Regards,
>
> José Fonseca

--
Leif Delgass
http://www.retinalburn.net |
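The buffer-aging scheme Leif describes (stamp each submitted buffer with an increasing age, have the engine write the same value to a scratch register such as PAT_REG as it completes, and treat a buffer as free once the register has caught up to its stamp) hinges on one detail: the comparison must survive counter wraparound. A signed difference of unsigned ages handles this; the function below is an illustrative sketch, not driver code.

```c
#include <assert.h>
#include <stdint.h>

/* A buffer whose age stamp is buf_age is done once the engine's
 * completed-age scratch register has reached or passed it.  Casting the
 * unsigned difference to int32_t keeps the test correct even after the
 * 32-bit age counter wraps around. */
static int age_done(uint32_t completed_age, uint32_t buf_age)
{
    return (int32_t)(completed_age - buf_age) >= 0;
}
```

With a naive `completed_age >= buf_age` test, a buffer stamped just before wraparound would look pending forever once the counter restarted near zero.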
From: Frank C. E. <fe...@ai...> - 2002-05-13 04:43:29
|
On Sunday 12 May 2002 01:15 pm, Leif Delgass wrote:
> this working, I've used buffer aging rather than interrupts. What I
> realized with interrupts is that there doesn't appear to be an interrupt
> that can poll fast enough to keep up, since a VBLANK is tied to the
> vertical refresh -- which is relatively infrequent.

Depends on what you're trying to do with it. If you're polling for completion of a pass for a given caller, it may be problematic. Should we be doing that with this chip, though?

I had envisioned a scheme in which the clients didn't drive DMA; they simply submitted things to be rendered out by the chip's DMA, and the DRM managed the details of keeping track of completion, etc. The first pass of the code I was working towards was going to rely on a lack of free buffers to throttle the clients accordingly. If that didn't work as I had hoped, I was looking to use async notifications to tell clients the request had been processed. In that model, the only things needing to deal with locks are the DRM engine code submitting the DMAable buffers or blits (run by a separate group of code...) and any clients/code directly manipulating the chip. All DMA engine clients do is ask for buffers, fill them, and then queue them up for submission to the chip's gui-master engine. The interrupt handler takes care of the rest. In that picture, you're not submitting one group of buffers for one client to the chip, you're submitting as many buffers as you think you can get away with at 60+ Hz (something like 1-2MB, from prior experience with the Utah-GLX code...) from the queue submitted by all clients. The DRM lock scheme would give enough flexibility to allow the X server to squeeze in against the DRM DMA handler and vice-versa so that screen updates and other stuff could be managed.

Unfortunately, my lack of available time precluded me from coding in more than the base framework for that scheme. -- Frank Earl |
From: José F. <j_r...@ya...> - 2002-05-12 19:33:38
|
Daryll,

On 2002.05.12 19:11 Daryll Strauss wrote:
> On Sun, May 12, 2002 at 05:27:26PM +0100, José Fonseca wrote:
> > I would also appreciate any ideas regarding this. This is surely an issue
> > I would like to discuss further at the next meeting.
>
> You're right, there's no automatic way to know what state has become
> dirty. You need to keep some flags that tell you what state has
> changed when you change clients. Since it is work to keep these flags up
> to date, you have to decide what granularity to keep.
>
> Any time you don't immediately get the hardware lock you have to check
> your flags to see what changed. In the tdfx driver I kept 3 flags. One
> was that the fifo had changed. That basically meant some other client (X
> server, or 3D app) had written data to the card. I had to resynchronize
> the fifo in that case. The second said that the 3D state was dirty. That
> would only occur when a second 3D app ran (the X server never touched
> the same state) and required that I reset all the 3D parameters. Finally
> there was a texture dirty flag which meant that I had to reload the
> textures on the card.
>
> The rationale for that breakdown was that X server context switches
> would be common. It has to do input handling, for example. So I wanted a
> cheap way to say that the X server had done stuff, but only the fifo
> changed. Next I argued that texture swaps were really expensive. So, if
> two 3D apps were running, but not using textures, it would be nice to
> avoid paging what could be multiple megabytes of textures. Finally, that
> meant 3D state was everything else. It wasn't that much data to force to
> a known state, so it wasn't worth breaking that into smaller chunks.
>
> The three flags were stored in the SAREA, and the first time a client
> changed each of the areas it would store its number into the
> appropriate flag of the SAREA.

I've been snooping in the tdfx driver source. You're referring to the TDFXSAREAPriv.*Owner flags in tdfx_context.h and the tdfxGetLock function in tdfx_lock.c, in the tdfx's Mesa driver.

So let's see if I got this straight (these notes are not specific to 3dfx):

- The SAREA writes are made with the lock held (as Leif pointed out), and the SAREA reflects the state left by the last client that grabbed it.
- All those "dirty"/"UPDATE" flags are only meaningful within a client. Whenever another client got in the way, the word is: upload whatever changed - "what" in particular depends on the granularity gauge you mentioned.
- For this to work with DMA too, the buffers must be sent exactly in the same order they are received by the DRM.
- The DRM knows nothing about this: it's up to the client to make sure that the information in the SAREA is up to date. (The 3dfx is an exception since there is no DMA, so the state is sent to the card without DRM intervention, via the Glide library.)

> Just a small expansion on this. The texture management solution is
> weak. If two clients each had a small texture, it would be quite
> possible that they both would have fit in texture memory and no texture
> swapping would be required. Doing that would have required more advanced
> texture management that realized certain regions were in use by one
> client or another. We still don't have that yet. In a grand scheme,
> regions of texture memory would be shared between 2D and multiple 3D
> clients.

This would mean that the texture management would have to be done by the DRM or, perhaps even better, X. Surely something to look into in the future.

Thanks for your reply. It was very informative.

José Fonseca |
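Daryll's three-flag scheme can be condensed into a small ownership check run after each lock acquisition: each shared resource records the id of the last client that touched it, and a client revalidates only the resources whose owner changed. The code below is an illustrative sketch loosely modeled on the TDFXSAREAPriv *Owner flags; the exact names and types are invented.

```c
#include <assert.h>

/* Last-owner stamps for the three resources Daryll describes: the
 * command fifo, the 3D state, and the on-card textures. */
typedef struct {
    int fifoOwner;
    int ctxOwner;
    int texOwner;
} sarea_owners_t;

enum {
    REVALIDATE_FIFO = 1,   /* resynchronize the fifo            */
    REVALIDATE_3D   = 2,   /* reset all 3D parameters           */
    REVALIDATE_TEX  = 4    /* reload textures (most expensive)  */
};

/* Called with the hardware lock held: returns a bitmask of the work
 * this client must redo, and claims ownership of what it will touch. */
static int revalidate_after_lock(sarea_owners_t *s, int me)
{
    int todo = 0;
    if (s->fifoOwner != me) { todo |= REVALIDATE_FIFO; s->fifoOwner = me; }
    if (s->ctxOwner  != me) { todo |= REVALIDATE_3D;   s->ctxOwner  = me; }
    if (s->texOwner  != me) { todo |= REVALIDATE_TEX;  s->texOwner  = me; }
    return todo;
}
```

This captures the rationale in the message: an X-server interruption dirties only the fifo owner, so the common case pays only the cheap fifo resync rather than a multi-megabyte texture reload.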
From: Keith W. <ke...@tu...> - 2002-05-12 21:49:33
|
"José Fonseca" wrote:
> Daryll,
>
> On 2002.05.12 19:11 Daryll Strauss wrote:
> > On Sun, May 12, 2002 at 05:27:26PM +0100, José Fonseca wrote:
> > > I would also appreciate any ideas regarding this. This is surely an issue
> > > I would like to discuss further at the next meeting.
> >
> > You're right, there's no automatic way to know what state has become
> > dirty. You need to keep some flags that tell you what state has
> > changed when you change clients. Since it is work to keep these flags up
> > to date, you have to decide what granularity to keep.
> >
> > Any time you don't immediately get the hardware lock you have to check
> > your flags to see what changed. In the tdfx driver I kept 3 flags. One
> > was that the fifo had changed. That basically meant some other client (X
> > server, or 3D app) had written data to the card. I had to resynchronize
> > the fifo in that case. The second said that the 3D state was dirty. That
> > would only occur when a second 3D app ran (the X server never touched
> > the same state) and required that I reset all the 3D parameters. Finally
> > there was a texture dirty flag which meant that I had to reload the
> > textures on the card.
> >
> > The rationale for that breakdown was that X server context switches
> > would be common. It has to do input handling, for example. So I wanted a
> > cheap way to say that the X server had done stuff, but only the fifo
> > changed. Next I argued that texture swaps were really expensive. So, if
> > two 3D apps were running, but not using textures, it would be nice to
> > avoid paging what could be multiple megabytes of textures. Finally, that
> > meant 3D state was everything else. It wasn't that much data to force to
> > a known state, so it wasn't worth breaking that into smaller chunks.
> >
> > The three flags were stored in the SAREA, and the first time a client
> > changed each of the areas it would store its number into the
> > appropriate flag of the SAREA.
>
> I've been snooping in the tdfx driver source. You're referring to the
> TDFXSAREAPriv.*Owner flags in tdfx_context.h and the tdfxGetLock function
> in tdfx_lock.c, in the tdfx's Mesa driver.
>
> So let's see if I got this straight (these notes are not specific to 3dfx):
>
> - The SAREA writes are made with the lock held (as Leif pointed out), and
> the SAREA reflects the state left by the last client that grabbed it.
> - All those "dirty"/"UPDATE" flags are only meaningful within a client.
> Whenever another client got in the way, the word is: upload whatever
> changed - "what" in particular depends on the granularity gauge you
> mentioned.
> - For this to work with DMA too, the buffers must be sent exactly in the
> same order they are received by the DRM.
> - The DRM knows nothing about this: it's up to the client to make sure
> that the information in the SAREA is up to date. (The 3dfx is an exception
> since there is no DMA, so the state is sent to the card without DRM
> intervention, via the Glide library.)
>
> > Just a small expansion on this. The texture management solution is
> > weak. If two clients each had a small texture, it would be quite
> > possible that they both would have fit in texture memory and no texture
> > swapping would be required. Doing that would have required more advanced
> > texture management that realized certain regions were in use by one
> > client or another. We still don't have that yet. In a grand scheme,
> > regions of texture memory would be shared between 2D and multiple 3D
> > clients.

We have this for other drivers - there's a linked list of regions in the sarea that lets drivers know, for each 'chunk' of texture memory, whether it has been stolen by another client or not.

Keith |
From: Leif D. <lde...@re...> - 2002-05-12 22:19:09
|
On Sun, 12 May 2002, Keith Whitwell wrote: > > > Just a small expansion on this. The texture management solution is > > > weak. If two clients each had a small texture, it would be quite > > > possible that they both would have fit in texture memory and no texture > > > swapping would be required. Doing that would have required more advanced > > > texture management that realized certain regions were in use by one > > > client or another. We still don't have that yet. In a grand scheme > > > regions of texture memory would be shared between 2D and multiple 3D > > > clients. > > We have this for other drivers - there's a linked list of regions in the sarea > that let drivers know for each 'chunk' of texture memory whether it has been > stolen by another client or not. > Good timing, I was just composing a message about this. Maybe you can help me... In working on AGP texturing for mach64, I'm starting from the Rage128 code, which seems to have some problems (though the texture aging problem could affect other drivers). My understanding is that textures in the global LRU are marked as "used" and aged so that placeholders can be inserted in a context's local LRU when another context steals its texture memory. The problem is that nowhere are these texture regions released by the context using them. The global LRU is only reset when the heap is full. So the heap has to fill up before placeholders begin to get swapped out. I've seen this when running multiple contexts at once, or repeatedly starting, stopping, and restarting a single app. This isn't a huge problem with a single heap, but with an AGP heap it means that card memory is effectively leaked. Once the card memory global LRU is nearly filled in the sarea with regions marked as "used", newly started apps will start out only using AGP mem (with the r128 algorithm). Only if the app uses enough mem. to fill AGP will it start to swap out the placeholders from the local LRU and use card memory. 
One possible solution I'm playing with would be to use a context identifier on texture regions in the global LRU rather than a boolean "in_use" (similar to the ctxOwner identifier used for marking the last owner of the sarea's state information). Then when a context swaps out or destroys textures, it can free regions that it owns from the global LRU and age them so that other contexts will swap out their corresponding placeholders. The downside is an increased penalty for swapping textures. Another problem is how to reclaim "leaked" regions when an app doesn't exit normally? I've also found what looks to me like a bug in the Rage128 driver in UploadTexImages. The beginning of the function does this: /* Choose the heap appropriately */ heap = t->heap = R128_CARD_HEAP; if ( !rmesa->r128Screen->IsPCI && t->totalSize > rmesa->r128Screen->texSize[heap] ) { heap = t->heap = R128_AGP_HEAP; } /* Do we need to eject LRU texture objects? */ if ( !t->memBlock ) { Find a memBlock, swapping and/or changing heaps if necessary... } Update LRU and upload dirty images... The problem I see here is that setting t->heap before checking for an existing memBlock could potentially lead to a situation where t->heap != t->memBlock->heap. So in my code I've deferred changing t->heap = heap to inside the 'if' block where we know there is no memBlock. Again this situation can only occur if there is an AGP heap. Is there a reason for this behavior in the Rage128 code that I'm missing, or is this a bug? -- Leif Delgass http://www.retinalburn.net |
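The owner-id scheme Leif describes above might look something like the following sketch. All names here (`tex_region_t`, `release_owned_regions`, the field layout) are illustrative, not the actual sarea structures; the point is just replacing the boolean `in_use` with a context id so regions can be freed eagerly.

```c
#include <assert.h>

#define TEX_REGION_FREE 0  /* owner id 0 means the region is unclaimed */

/* Illustrative global-LRU region: the boolean in_use is replaced by an
 * owner field holding a context id, similar to the sarea's ctxOwner. */
typedef struct {
    int owner;  /* context id of the last user, TEX_REGION_FREE if free */
    int age;    /* bumped whenever the region changes hands */
} tex_region_t;

/* Called when a context swaps out or destroys its textures: free every
 * region it owns and age it, so other contexts notice the change and
 * drop their corresponding placeholders. Returns the number freed. */
int release_owned_regions(tex_region_t *regions, int nregions,
                          int ctx_id, int new_age)
{
    int freed = 0;
    for (int i = 0; i < nregions; i++) {
        if (regions[i].owner == ctx_id) {
            regions[i].owner = TEX_REGION_FREE;
            regions[i].age = new_age;
            freed++;
        }
    }
    return freed;
}
```

With this, a cleanly exiting context can return its card-memory regions immediately instead of waiting for the heap to fill; the abnormal-exit case Leif raises is still open.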
From: Ian R. <id...@us...> - 2002-05-13 17:01:39
|
On Sun, May 12, 2002 at 06:18:58PM -0400, Leif Delgass wrote: > In working on AGP texturing for mach64, I'm starting from the Rage128 > code, which seems to have some problems (though the texture aging problem > could affect other drivers). My understanding is that textures in the > global LRU are marked as "used" and aged so that placeholders can be > inserted in a context's local LRU when another context steals its texture > memory. The problem is that nowhere are these texture regions released by > the context using them. The global LRU is only reset when the heap is > full. So the heap has to fill up before placeholders begin to get swapped > out. I've seen this when running multiple contexts at once, or repeatedly > starting, stopping, and restarting a single app. This isn't a huge > problem with a single heap, but with an AGP heap it means that card memory > is effectively leaked. Once the card memory global LRU is nearly filled > in the sarea with regions marked as "used", newly started apps will start > out only using AGP mem (with the r128 algorithm). Only if the app uses > enough mem. to fill AGP will it start to swap out the placeholders from > the local LRU and use card memory. If this is true, then I believe it is a bug in the r128 driver. IIRC, all of the space in the global texture heaps is freed when the context is destroyed. This is the way that the Radeon and MGA drivers work. The r128 driver follows the same model so it is /intended/ to work the same. > One possible solution I'm playing with would be to use a context > identifier on texture regions in the global LRU rather than a boolean > "in_use" (similar to the ctxOwner identifier used for marking the last > owner of the sarea's state information). Then when a context swaps out or > destroys textures, it can free regions that it owns from the global LRU > and age them so that other contexts will swap out their corresponding > placeholders. 
The downside is an increased penalty for swapping textures. > Another problem is how to reclaim "leaked" regions when an app doesn't > exit normally? This is not needed. When texture space is freed in the global heap by one context, the other contexts will see that the state of those blocks in the global heap has changed from owned to free. > I've also found what looks to me like a bug in the Rage128 driver in > UploadTexImages. The beginning of the function does this: > > /* Choose the heap appropriately */ > heap = t->heap = R128_CARD_HEAP; > if ( !rmesa->r128Screen->IsPCI && > t->totalSize > rmesa->r128Screen->texSize[heap] ) { > heap = t->heap = R128_AGP_HEAP; > } > > /* Do we need to eject LRU texture objects? */ > if ( !t->memBlock ) { > > Find a memBlock, swapping and/or changing heaps if necessary... > > } > > Update LRU and upload dirty images... > > The problem I see here is that setting t->heap before checking for an > existing memBlock could potentially lead to a situation where t->heap != > t->memBlock->heap. So in my code I've deferred changing t->heap = heap to > inside the 'if' block where we know there is no memBlock. Again this > situation can only occur if there is an AGP heap. Is there a reason for > this behavior in the Rage128 code that I'm missing, or is this a bug? The MGA driver does something similar. The idea is sound, but there may be implementation bugs. Basically, the driver will start with heap 0 (i.e., the on-card memory) and then work through the remaining heaps. The if-block in the above code, basically, tells the driver to not bother checking on-card memory and start with the AGP heap. Here it looks like it's telling the driver to not try to allocate from on-card memory if the texture is larger than would fit on the card even with all of its memory available. This is a good heuristic. Without it, trying to allocate a very large texture would cause ALL of the other textures to get kicked out of on-card memory AND still have the allocation FAIL. 
:) -- Tell that to the Marines! |
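Ian's heuristic, restated as a sketch (hypothetical names; the real r128 code keys off `rmesa->r128Screen->IsPCI` and `texSize[]`):

```c
#include <assert.h>

enum { CARD_HEAP = 0, AGP_HEAP = 1 };

/* Pick the heap to try first for a texture of total_size bytes.
 * If the texture could never fit in card memory even when the heap is
 * completely empty, start with AGP right away: otherwise the driver
 * would evict every other texture from card memory and still fail the
 * allocation. */
int choose_first_heap(int has_agp, long total_size, long card_heap_size)
{
    if (has_agp && total_size > card_heap_size)
        return AGP_HEAP;
    return CARD_HEAP;
}
```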
From: Leif D. <lde...@re...> - 2002-05-13 20:07:14
|
On Mon, 13 May 2002, Ian Romanick wrote: > On Sun, May 12, 2002 at 06:18:58PM -0400, Leif Delgass wrote: > > > In working on AGP texturing for mach64, I'm starting from the Rage128 > > code, which seems to have some problems (though the texture aging problem > > could affect other drivers). My understanding is that textures in the > > global LRU are marked as "used" and aged so that placeholders can be > > inserted in a context's local LRU when another context steals its texture > > memory. The problem is that nowhere are these texture regions released by > > the context using them. The global LRU is only reset when the heap is > > full. So the heap has to fill up before placeholders begin to get swapped > > out. I've seen this when running multiple contexts at once, or repeatedly > > starting, stopping, and restarting a single app. This isn't a huge > > problem with a single heap, but with an AGP heap it means that card memory > > is effectively leaked. Once the card memory global LRU is nearly filled > > in the sarea with regions marked as "used", newly started apps will start > > out only using AGP mem (with the r128 algorithm). Only if the app uses > > enough mem. to fill AGP will it start to swap out the placeholders from > > the local LRU and use card memory. > > If this is true, then I believe it is a bug in the r128 driver. IIRC, all > of the space in the global texture heaps is freed when the context is > destroyed. This is the way that the Radeon and MGA drivers work. The r128 > driver follows the same model so it is /intended/ to work the same. It looks to me like this is a bug in all the drivers. Try grepping for 'in_use' in the Mesa drivers. I don't see anywhere in _any_ of the drivers where a context sets in_use to zero in a texture region in the global heap. The local heap is destroyed when the context is destroyed, but it doesn't touch the global heap. There are only two places where the global LRU is modified. 
One is in UpdateTexLRU where the region is marked as in_use, aged, and moved to the head of the list. The second is in ResetGlobalLRU, where the list is (re)built and ages are reset. The Reset function is only called from AgeTextures when the heap is full (which it can detect because the Reset function leaves out one region when rebuilding the list), which also forces TexturesGone to swap out everything in the heap, including all of the placeholders. To see this happening, you can enable the Print[Global,Local]LRU functions in UpdateTexLRU, restart X, then repeatedly start, stop and restart the texenv Mesa demo. Each time you restart, you'll see a new placeholder in the local LRU if the problem is there. If I'm right, this should be happening in all the drivers. Can someone test this on the Radeon driver? > > One possible solution I'm playing with would be to use a context > > identifier on texture regions in the global LRU rather than a boolean > > "in_use" (similar to the ctxOwner identifier used for marking the last > > owner of the sarea's state information). Then when a context swaps out or > > destroys textures, it can free regions that it owns from the global LRU > > and age them so that other contexts will swap out their corresponding > > placeholders. The downside is an increased penalty for swapping textures. > > Another problem is how to reclaim "leaked" regions when an app doesn't > > exit normally? > > This is not needed. When texture space is freed in the global heap by one > context, the other contexts will see that the state of those blocks in the > global heap has changed from owned to free. This is assuming that the space is actually freed. Maybe it wouldn't be necessary if you only swap or destroy textures when holding the lock. Is DestroyContext called while holding the lock? > > I've also found what looks to me like a bug in the Rage128 driver in > > UploadTexImages. 
The beginning of the function does this: > > > > /* Choose the heap appropriately */ > > heap = t->heap = R128_CARD_HEAP; > > if ( !rmesa->r128Screen->IsPCI && > > t->totalSize > rmesa->r128Screen->texSize[heap] ) { > > heap = t->heap = R128_AGP_HEAP; > > } > > > > /* Do we need to eject LRU texture objects? */ > > if ( !t->memBlock ) { > > > > Find a memBlock, swapping and/or changing heaps if necessary... > > > > } > > > > Update LRU and upload dirty images... > > > > The problem I see here is that setting t->heap before checking for an > > existing memBlock could potentially lead to a situation where t->heap != > > t->memBlock->heap. So in my code I've deferred changing t->heap = heap to > > inside the 'if' block where we know there is no memBlock. Again this > > situation can only occur if there is an AGP heap. Is there a reason for > > this behavior in the Rage128 code that I'm missing, or is this a bug? > > The MGA driver does something similar. The idea is sound, but there may be > implementation bugs. Basically, the driver will start with heap 0 (i.e., the > on-card memory) and all the remaining heaps. The if-block in the above > code, basically, tells the drive to not bother checking on-card memory and > start with the AGP heap. > > Here it looks like it's telling the driver to not try to allocate from > on-card memory if the texture is larger than will fit on the card if all of > the memory is available. This is a good heuristic. Without it, trying to > allocate a very large texture would cause ALL of the other textures to get > kicked out of on-card memory AND still have the allocation FAIL. :) I understand the algorithm, that's not the problem -- it's the _second_ if block. My point is that if, at the beginning of the function, the texture is already allocated on the AGP heap (t->heap == t->memBlock->heap == AGP) and is small enough to fit card memory, t->heap is changed to the CARD heap unconditionally by the first line. 
However, since the texture already has a memBlock in AGP, the second if block is not entered and memBlock remains on the AGP heap, so t->heap == CARD, but t->memBlock->heap == AGP. I'm saying that t->heap should only be changed to the CARD heap if it doesn't already have a memBlock (i.e. it's new or has been swapped out). -- Leif Delgass http://www.retinalburn.net |
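Leif's proposed fix could be sketched as follows: defer the heap reassignment until we know there is no memBlock, so a resident texture's heap field always matches its block. The names mirror the quoted r128 code but this is an illustration, not the actual driver source.

```c
#include <assert.h>
#include <stddef.h>

enum { CARD_HEAP = 0, AGP_HEAP = 1 };

typedef struct { int heap; } mem_block_t;
typedef struct {
    int heap;
    long total_size;
    mem_block_t *mem_block;  /* NULL if the texture is not resident */
} texture_t;

/* Only reassign t->heap when the texture has no block yet; an already
 * resident texture keeps heap == mem_block->heap, avoiding the
 * mismatch Leif describes. */
void select_heap(texture_t *t, int has_agp, long card_heap_size)
{
    if (t->mem_block == NULL) {
        int heap = CARD_HEAP;
        if (has_agp && t->total_size > card_heap_size)
            heap = AGP_HEAP;
        t->heap = heap;
        /* ...then find a memBlock, swapping and/or changing heaps
         * if necessary, as in the original function... */
    }
}
```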
From: Keith W. <ke...@tu...> - 2002-05-13 20:19:03
|
Leif Delgass wrote: > > On Mon, 13 May 2002, Ian Romanick wrote: > > > On Sun, May 12, 2002 at 06:18:58PM -0400, Leif Delgass wrote: > > > > > In working on AGP texturing for mach64, I'm starting from the Rage128 > > > code, which seems to have some problems (though the texture aging problem > > > could affect other drivers). My understanding is that textures in the > > > global LRU are marked as "used" and aged so that placeholders can be > > > inserted in a context's local LRU when another context steals its texture > > > memory. The problem is that nowhere are these texture regions released by > > > the context using them. The global LRU is only reset when the heap is > > > full. So the heap has to fill up before placeholders begin to get swapped > > > out. I've seen this when running multiple contexts at once, or repeatedly > > > starting, stopping, and restarting a single app. This isn't a huge > > > problem with a single heap, but with an AGP heap it means that card memory > > > is effectively leaked. Once the card memory global LRU is nearly filled > > > in the sarea with regions marked as "used", newly started apps will start > > > out only using AGP mem (with the r128 algorithm). Only if the app uses > > > enough mem. to fill AGP will it start to swap out the placeholders from > > > the local LRU and use card memory. > > > > If this is true, then I believe it is a bug in the r128 driver. IIRC, all > > of the space in the global texture heaps is freed when the context is > > destroyed. This is the way that the Radeon and MGA drivers work. The r128 > > driver follows the same model so it is /intended/ to work the same. > > It looks to me like this is a bug in all the drivers. Try grepping for > 'in_use' in the Mesa drivers. I don't see anywhere in _any_ of the > drivers where a context sets in_use to zero in a texture region in the > global heap. Sorry to come to this late. The 'in_use' flag isn't used, you are right. But it's not really needed, either. 
Basically the key is that a driver must always be able to cope with having all of its textures ripped out from under it (in the current scheme). When it grabs the lock and finds it has been contended, it looks at the global LRU to see if any or all of its textures have been clobbered. This is basically a refinement of the tdfx situation, where one flag indicates that all of the textures are gone. Keith |
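The pattern Keith describes might be sketched like this (all names hypothetical): after taking the lock and finding it was contended, a context walks its textures and invalidates any whose global-LRU region has aged past the value it last saw.

```c
#include <assert.h>

/* Per-context view of a texture: the age of its region the last time
 * this context looked, and whether its on-card copy is still valid. */
typedef struct { int last_seen_age; int valid; } ctx_texture_t;

/* Invalidate every texture whose region age advanced behind our back,
 * i.e. whose memory was clobbered by another client while we did not
 * hold the lock. Returns the number of clobbered textures. */
int revalidate_textures(ctx_texture_t *textures, const int *region_age,
                        int n)
{
    int clobbered = 0;
    for (int i = 0; i < n; i++) {
        if (textures[i].valid &&
            region_age[i] > textures[i].last_seen_age) {
            textures[i].valid = 0;  /* must re-upload before next use */
            clobbered++;
        }
    }
    return clobbered;
}
```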
From: Keith W. <ke...@tu...> - 2002-05-13 20:19:55
|
Leif Delgass wrote: > > On Mon, 13 May 2002, Ian Romanick wrote: > > > On Sun, May 12, 2002 at 06:18:58PM -0400, Leif Delgass wrote: > > > > > In working on AGP texturing for mach64, I'm starting from the Rage128 > > > code, which seems to have some problems (though the texture aging problem > > > could affect other drivers). My understanding is that textures in the > > > global LRU are marked as "used" and aged so that placeholders can be > > > inserted in a context's local LRU when another context steals its texture > > > memory. The problem is that nowhere are these texture regions released by > > > the context using them. The global LRU is only reset when the heap is > > > full. So the heap has to fill up before placeholders begin to get swapped > > > out. I've seen this when running multiple contexts at once, or repeatedly > > > starting, stopping, and restarting a single app. This isn't a huge > > > problem with a single heap, but with an AGP heap it means that card memory > > > is effectively leaked. Once the card memory global LRU is nearly filled > > > in the sarea with regions marked as "used", newly started apps will start > > > out only using AGP mem (with the r128 algorithm). Only if the app uses > > > enough mem. to fill AGP will it start to swap out the placeholders from > > > the local LRU and use card memory. > > > > If this is true, then I believe it is a bug in the r128 driver. IIRC, all > > of the space in the global texture heaps is freed when the context is > > destroyed. This is the way that the Radeon and MGA drivers work. The r128 > > driver follows the same model so it is /intended/ to work the same. > > It looks to me like this is a bug in all the drivers. Try grepping for > 'in_use' in the Mesa drivers. I don't see anywhere in _any_ of the > drivers where a context sets in_use to zero in a texture region in the > global heap. The local heap is destroyed when the context is destroyed, > but it doesn't touch the global heap. 
There are only two places where the > global LRU is modified. One is in UpdateTexLRU where the region is marked > as in_use, aged, and moved to the head of the list. The second is in > ResetGlobalLRU, where the list is (re)built and ages are reset. The Reset > function is only called from AgeTextures when the heap is full (which it > can detect because the Reset function leaves out one region when > rebuilding the list), which also forces TexturesGone to swap out > everything in the heap, including all of the placeholders. To see this > happening, you can enable the Print[Global,Local]LRU functions in > UpdateTexLRU, restart X, then repeatedly start, stop and restart the > texenv Mesa demo. Each time you restart, you'll see a new placeholder in > the local LRU if the problem is there. If I'm right, this should be > happening in all the drivers. Can someone test this on the Radeon driver? > > > > One possible solution I'm playing with would be to use a context > > > identifier on texture regions in the global LRU rather than a boolean > > > "in_use" (similar to the ctxOwner identifier used for marking the last > > > owner of the sarea's state information). Then when a context swaps out or > > > destroys textures, it can free regions that it owns from the global LRU > > > and age them so that other contexts will swap out their corresponding > > > placeholders. The downside is an increased penalty for swapping textures. > > > Another problem is how to reclaim "leaked" regions when an app doesn't > > > exit normally? > > > > This is not needed. When texture space is freed in the global heap by one > > context, the other contexts will see that the state of those blocks in the > > global heap has changed from owned to free. > > This is assuming that the space is actually freed. Maybe it wouldn't be > necessary if you only swap or destroy textures when holding the lock. Is > DestroyContext called while holding the lock? 
Also note that the space can be taken even if it isn't freed, for the reasons I described in the previous email. Keith |
From: Leif D. <lde...@re...> - 2002-05-13 20:57:54
|
On Mon, 13 May 2002, Keith Whitwell wrote: > Leif Delgass wrote: > > > > On Mon, 13 May 2002, Ian Romanick wrote: > > > > > On Sun, May 12, 2002 at 06:18:58PM -0400, Leif Delgass wrote: > > > > > > > In working on AGP texturing for mach64, I'm starting from the Rage128 > > > > code, which seems to have some problems (though the texture aging problem > > > > could affect other drivers). My understanding is that textures in the > > > > global LRU are marked as "used" and aged so that placeholders can be > > > > inserted in a context's local LRU when another context steals its texture > > > > memory. The problem is that nowhere are these texture regions released by > > > > the context using them. The global LRU is only reset when the heap is > > > > full. So the heap has to fill up before placeholders begin to get swapped > > > > out. I've seen this when running multiple contexts at once, or repeatedly > > > > starting, stopping, and restarting a single app. This isn't a huge > > > > problem with a single heap, but with an AGP heap it means that card memory > > > > is effectively leaked. Once the card memory global LRU is nearly filled > > > > in the sarea with regions marked as "used", newly started apps will start > > > > out only using AGP mem (with the r128 algorithm). Only if the app uses > > > > enough mem. to fill AGP will it start to swap out the placeholders from > > > > the local LRU and use card memory. > > > > > > If this is true, then I believe it is a bug in the r128 driver. IIRC, all > > > of the space in the global texture heaps is freed when the context is > > > destroyed. This is the way that the Radeon and MGA drivers work. The r128 > > > driver follows the same model so it is /intended/ to work the same. > > > > It looks to me like this is a bug in all the drivers. Try grepping for > > 'in_use' in the Mesa drivers. I don't see anywhere in _any_ of the > > drivers where a context sets in_use to zero in a texture region in the > > global heap. 
The local heap is destroyed when the context is destroyed, > > but it doesn't touch the global heap. There are only two places where the > > global LRU is modified. One is in UpdateTexLRU where the region is marked > > as in_use, aged, and moved to the head of the list. The second is in > > ResetGlobalLRU, where the list is (re)built and ages are reset. The Reset > > function is only called from AgeTextures when the heap is full (which it > > can detect because the Reset function leaves out one region when > > rebuilding the list), which also forces TexturesGone to swap out > > everything in the heap, including all of the placeholders. To see this > > happening, you can enable the Print[Global,Local]LRU functions in > > UpdateTexLRU, restart X, then repeatedly start, stop and restart the > > texenv Mesa demo. Each time you restart, you'll see a new placeholder in > > the local LRU if the problem is there. If I'm right, this should be > > happening in all the drivers. Can someone test this on the Radeon driver? > > > > > > One possible solution I'm playing with would be to use a context > > > > identifier on texture regions in the global LRU rather than a boolean > > > > "in_use" (similar to the ctxOwner identifier used for marking the last > > > > owner of the sarea's state information). Then when a context swaps out or > > > > destroys textures, it can free regions that it owns from the global LRU > > > > and age them so that other contexts will swap out their corresponding > > > > placeholders. The downside is an increased penalty for swapping textures. > > > > Another problem is how to reclaim "leaked" regions when an app doesn't > > > > exit normally? > > > > > > This is not needed. When texture space is freed in the global heap by one > > > context, the other contexts will see that the state of those blocks in the > > > global heap has changed from owned to free. > > > > This is assuming that the space is actually freed. 
Maybe it wouldn't be > > necessary if you only swap or destroy textures when holding the lock. Is > > DestroyContext called while holding the lock? > > Also note that the space can be taken even if it isn't freed, for the reasons > I described in the previous email. > I understand that, but my issue was that it's only freed when the heap fills up. If the global LRU card heap is full except for the last region, but the texture we're uploading is larger than that, we could end up switching to AGP memory, even if there was space in card memory that was no longer in use by another context. Depending on the size of the AGP aperture and the texture requirements of the app, you might end up never freeing the card memory until an app runs that has a texture that can go into the last region. If you know that the space is free as soon as the other context stops using it, there would be less potential for wasted card memory. -- Leif Delgass http://www.retinalburn.net |
From: Keith W. <ke...@tu...> - 2002-05-13 21:03:22
|
> > I understand that, but my issue was that it's only freed when the heap > fills up. If the global LRU card heap is full except for the last region, > but the texture we're uploading is larger than that, we could end up > switching to AGP memory, even if there was space no longer in use by > another context in card memory. Depending on the size of the AGP aperture > and the texture requirements of the app, you might end up never freeing > the card memory until an app runs that has a texture that can go into the > last region. If you know that the space is free as soon as the other > context stops using it, there would be less potential for wasted card > memory. Right - that would probably require some changes then. You also have to cope with the case where clients are killed rather than exit cleanly, although I guess it doesn't matter too much as the behaviour in that case will just be suboptimal. Keith |
From: Jens O. <je...@tu...> - 2002-05-14 13:03:57
|
Keith Whitwell wrote: > You also have to cope with the case where clients are killed rather than exit > cleanly, although I guess it doesn't matter too much as the behaviour in that > case will just be suboptimal. Is this something the Server should clean up? We do that for 3D contexts, drawables and full screen functionality... -- /\ Jens Owen / \/\ _ je...@tu... / \ \ \ Steamboat Springs, Colorado |
From: Keith W. <ke...@tu...> - 2002-05-14 14:06:41
|
Jens Owen wrote: > > Keith Whitwell wrote: > > > You also have to cope with the case where clients are killed rather than exit > > cleanly, although I guess it doesn't matter too much as the behaviour in that > > case will just be suboptimal. > > Is this something the Server should clean up? We do that for 3D > contexts, drawables and full screen functionality... I think a closer analogy would be dma buffers, which the kernel cleans up. Keith |
From: Ian R. <id...@us...> - 2002-05-13 23:53:57
|
On Mon, May 13, 2002 at 04:06:53PM -0400, Leif Delgass wrote: > On Mon, 13 May 2002, Ian Romanick wrote: > > > On Sun, May 12, 2002 at 06:18:58PM -0400, Leif Delgass wrote: > > > > > In working on AGP texturing for mach64, I'm starting from the Rage128 > > > code, which seems to have some problems (though the texture aging problem > > > could affect other drivers). My understanding is that textures in the > > > global LRU are marked as "used" and aged so that placeholders can be > > > inserted in a context's local LRU when another context steals its texture > > > memory. The problem is that nowhere are these texture regions released by > > > the context using them. The global LRU is only reset when the heap is > > > full. So the heap has to fill up before placeholders begin to get swapped > > > out. I've seen this when running multiple contexts at once, or repeatedly > > > starting, stopping, and restarting a single app. This isn't a huge > > > problem with a single heap, but with an AGP heap it means that card memory > > > is effectively leaked. Once the card memory global LRU is nearly filled > > > in the sarea with regions marked as "used", newly started apps will start > > > out only using AGP mem (with the r128 algorithm). Only if the app uses > > > enough mem. to fill AGP will it start to swap out the placeholders from > > > the local LRU and use card memory. > > > > If this is true, then I believe it is a bug in the r128 driver. IIRC, all > > of the space in the global texture heaps is freed when the context is > > destroyed. This is the way that the Radeon and MGA drivers work. The r128 > > driver follows the same model so it is /intended/ to work the same. > > It looks to me like this is a bug in all the drivers. Try grepping for > 'in_use' in the Mesa drivers. I don't see anywhere in _any_ of the > drivers where a context sets in_use to zero in a texture region in the > global heap. Ah yes, in_use. That field is defunct. 
Texture regions are stolen on a pure priority basis. No distinction is made between regions owned by the current context or other contexts. > The local heap is destroyed when the context is destroyed, > but it doesn't touch the global heap. There are only two places where the > global LRU is modified. One is in UpdateTexLRU where the region is marked > as in_use, aged, and moved to the head of the list. The second is in > ResetGlobalLRU, where the list is (re)built and ages are reset. The Reset > function is only called from AgeTextures when the heap is full (which it > can detect because the Reset function leaves out one region when > rebuilding the list), which also forces TexturesGone to swap out > everything in the heap, including all of the placeholders. Hrm...ok. I'll look into this a bit tomorrow and get back to you. Your reasoning seems good, but for some reason it doesn't grok with what I remember. It's been a while since I've looked at this code, so my memory might not be so good. :) > > > One possible solution I'm playing with would be to use a context > > > identifier on texture regions in the global LRU rather than a boolean > > > "in_use" (similar to the ctxOwner identifier used for marking the last > > > owner of the sarea's state information). Then when a context swaps out or > > > destroys textures, it can free regions that it owns from the global LRU > > > and age them so that other contexts will swap out their corresponding > > > placeholders. The downside is an increased penalty for swapping textures. > > > Another problem is how to reclaim "leaked" regions when an app doesn't > > > exit normally? > > > > This is not needed. When texture space is freed in the global heap by one > > context, the other contexts will see that the state of those blocks in the > > global heap has changed from owned to free. > > This is assuming that the space is actually freed. 
Maybe it wouldn't be > necessary if you only swap or destroy textures when holding the lock. Is > DestroyContext called while holding the lock? The DestroyContext function in the driver is. It mucks around with the SAREA, so it had better be! :) [snip] > I understand the algorithm, that's not the problem -- it's the _second_ if > block. My point is that if, at the beginning of the function, the texture > is already allocated on the AGP heap (t->heap == t->memBlock->heap == AGP) > and is small enough to fit card memory, t->heap is changed to the CARD > heap unconditionally by the first line. However, since the texture > already has a memBlock in AGP, the second if block is not entered and > memBlock remains on the AGP heap, so t->heap == CARD, but > t->memBlock->heap == AGP. I'm saying that t->heap should only be changed > to the CARD heap if it doesn't already have a memBlock (i.e. it's new or > has been swapped out). Right. Currently, there is no promotion of textures from AGP to on-card memory. This is one piece of work that, after getting my initial patches in (sigh...), I had intended to do. -- Tell that to the Marines! |