From: José F. <j_r...@ya...> - 2002-05-12 16:29:49
|
As it becomes more clear that in the mach64 the best solution is to fill DMA buffers with the context state and the vertex buffers, I've been trying to understand how this can be done and how the Gamma driver (which has this same model) does it.

The context state is available right at the beginning of running a pipeline, and usually DDUpdateHWState is called at the beginning of RunPipeline. The problem is that although all state information is available, we don't know which part should be uploaded, since other clients could dirty the hardware registers in the meanwhile.

I don't fully understand how the Gamma driver overcomes this. Its behavior regarding this is controlled by a macro definition, named DO_VALIDATE, that enables a series of VALIDATE_* macros, which in turn I couldn't understand what they do. Another thing that caught my attention was the "HACK" comment in gammaDDUpdateHWState before gammaEmitHwState - it resembles a similar comment in mach64, which makes one think that the author had in mind a better way to do it. Alan, could you shed some light on these two issues please?

Before I started this little research I had already given some thought to how I would do it. One idea that crossed my mind was to reserve some space in the DMA buffer to put the context state in before submitting the buffer. Of course there would be some DMA buffer waste, but it wouldn't be that much since there is a fairly low number of context registers. One thing that holds me back is that I still don't understand how multiple clients avoid each other: what is done in parallel, and what is done in series...

I would also appreciate any ideas regarding this. This is surely an issue I would like to discuss further at the next meeting.

Regards, José Fonseca |
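A minimal user-space sketch of the reserved-header idea described above: the first dwords of each DMA buffer are set aside for (register, value) pairs, vertex data is appended after them, and the dirty context registers are patched into the header just before submission. All sizes, names, and structures here are invented for illustration and are not the actual mach64 driver code.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical sizes -- the real register count and buffer size would
 * come from the driver; these are placeholders. */
#define CTX_REG_COUNT 16                    /* reserved (reg, value) pairs */
#define CTX_RESERVED  (CTX_REG_COUNT * 2)   /* dwords of reserved header   */
#define BUF_DWORDS    1024

typedef struct {
    uint32_t dw[BUF_DWORDS];
    int ctx_used;   /* dwords of the reserved header actually filled */
    int vtx_used;   /* dwords of vertex data appended                */
} dma_buffer_t;

static void buf_init(dma_buffer_t *b) { memset(b, 0, sizeof(*b)); }

/* Vertex data always starts after the reserved context area. */
static void buf_add_vertex_dword(dma_buffer_t *b, uint32_t v)
{
    b->dw[CTX_RESERVED + b->vtx_used++] = v;
}

/* Just before submission, patch the dirty context registers into the
 * reserved header so the engine processes them ahead of the vertices. */
static void buf_emit_context(dma_buffer_t *b, const uint32_t *regs,
                             const uint32_t *vals, int n)
{
    int i;
    for (i = 0; i < n && i < CTX_REG_COUNT; i++) {
        b->dw[2 * i]     = regs[i];
        b->dw[2 * i + 1] = vals[i];
    }
    b->ctx_used = 2 * i;
}
```

The waste is bounded by CTX_RESERVED dwords per buffer, which matches the observation that a fairly low number of context registers keeps the overhead small.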
From: Alan H. <al...@fa...> - 2002-05-12 16:35:50
|
Jose,

I'd certainly forget using the gamma driver as any kind of template for any work. There are many unimplemented features, and multiple clients just don't work. Purely a lack of time thing.

Alan.

On Sun, May 12, 2002 at 05:27:26 +0100, José Fonseca wrote:
> As it becomes more clear that in the mach64 the best solution is to fill
> DMA buffers with the context state and the vertex buffers, I've been trying
> to understand how this can be done and how the Gamma driver (which has
> this same model) does it.
>
> The context state is available right at the beginning of running a
> pipeline, and usually DDUpdateHWState is called at the beginning of
> RunPipeline. The problem is that although all state information is
> available, we don't know which part should be uploaded, since other clients
> could dirty the hardware registers in the meanwhile.
>
> I don't fully understand how the Gamma driver overcomes this. Its
> behavior regarding this is controlled by a macro definition, named
> DO_VALIDATE, that enables a series of VALIDATE_* macros, which in turn I
> couldn't understand what they do. Another thing that caught my attention
> was the "HACK" comment in gammaDDUpdateHWState before gammaEmitHwState -
> it resembles a similar comment in mach64, which makes one think that the
> author had in mind a better way to do it. Alan, could you shed some
> light on these two issues please?
>
> Before I started this little research I had already given some thought to how
> I would do it. One idea that crossed my mind was to reserve some space in
> the DMA buffer to put the context state in before submitting the buffer. Of
> course there would be some DMA buffer waste, but it wouldn't be that much
> since there is a fairly low number of context registers. One thing that
> holds me back is that I still don't understand how multiple clients avoid
> each other: what is done in parallel, and what is done in series...
>
> I would also appreciate any ideas regarding this. This is surely an issue
> I would like to discuss further at the next meeting.
>
> Regards,
>
> José Fonseca |
From: José F. <j_r...@ya...> - 2002-05-12 16:42:39
|
Alan, On 2002.05.12 17:35 Alan Hourihane wrote: > Jose, > > I'd certainly forget using the gamma driver as any kind of template for > any work. > > There are many unimplemented features, and multiple clients just don't > work. > Ok. I wasn't aware of this. > Purely a lack of time thing. > > Alan. > Thanks, José Fonseca |
From: Leif D. <lde...@re...> - 2002-05-12 18:15:28
|
Jose,

I've been experimenting with this too, and was able to get things going with state being emitted either from the client or the drm, though I'm still having lockups and things are generally a bit buggy and unstable still. To try client side context emits, I basically went back to having each primitive emit state into the vertex buffer before adding the vertex data, like the original hack with MMIO. This works, but may be emitting state when it's not necessary. Now I'm trying state emits in the drm, and to do that I'm just grabbing a buffer from the freelist and adding it to the queue before the vertex buffer, so things are in the correct order in the queue. The downside of this is that buffer space is wasted, since the state emit uses a small portion of a buffer, but putting state in a separate buffer from vertex data allows the proper ordering in the queue. Perhaps we could use a private set of smaller buffers for this. At any rate, I've done the same for clears and swaps, so I have asynchronous DMA (minus blits) working with gears at least. I'm still getting lockups with anything more complicated and there are still some state problems. The good news is that I'm finally seeing an increase in frame rate, so there's light at the end of the tunnel.

Right now I'm using 1MB (half the buffers) as the high water mark, so there should always be plenty of available buffers for the drm. To get this working, I've used buffer aging rather than interrupts. What I realized with interrupts is that there doesn't appear to be an interrupt that can poll fast enough to keep up, since a VBLANK is tied to the vertical refresh -- which is relatively infrequent. I'm thinking that it might be best to start out without interrupts and to use GUI masters for blits and then investigate using interrupts, at least for blits. Anyway, I have an implementation of the freelist and other queues that's functional, though it might require some locks here and there.
I'll try to stabilize things more and send a patch for you to look at.

I've also played around some more with AGP textures. I have hacked up the performance boxes client-side with clear ioctls, and this helps to see what's going on. I'll try to clean that up so I can commit it. I've found some problems with the global LRU and texture aging that I'm trying to fix as well. I'll post a more detailed summary of that soon.

BTW, as to your question about multiple clients and state: I think this is handled when acquiring the lock. If the context stamp in the SAREA doesn't match the current context after getting the lock, everything is marked as dirty to force the current context to emit all its state. Emitting state to the SAREA is always done while holding the lock.

Regards,
Leif

On Sun, 12 May 2002, José Fonseca wrote:
> As it becomes more clear that in the mach64 the best solution is to fill
> DMA buffers with the context state and the vertex buffers, I've been trying
> to understand how this can be done and how the Gamma driver (which has
> this same model) does it.
>
> The context state is available right at the beginning of running a
> pipeline, and usually DDUpdateHWState is called at the beginning of
> RunPipeline. The problem is that although all state information is
> available, we don't know which part should be uploaded, since other clients
> could dirty the hardware registers in the meanwhile.
>
> I don't fully understand how the Gamma driver overcomes this. Its
> behavior regarding this is controlled by a macro definition, named
> DO_VALIDATE, that enables a series of VALIDATE_* macros, which in turn I
> couldn't understand what they do. Another thing that caught my attention
> was the "HACK" comment in gammaDDUpdateHWState before gammaEmitHwState -
> it resembles a similar comment in mach64, which makes one think that the
> author had in mind a better way to do it. Alan, could you shed some
> light on these two issues please?
>
> Before I started this little research I had already given some thought to how
> I would do it. One idea that crossed my mind was to reserve some space in
> the DMA buffer to put the context state in before submitting the buffer. Of
> course there would be some DMA buffer waste, but it wouldn't be that much
> since there is a fairly low number of context registers. One thing that
> holds me back is that I still don't understand how multiple clients avoid
> each other: what is done in parallel, and what is done in series...
>
> I would also appreciate any ideas regarding this. This is surely an issue
> I would like to discuss further at the next meeting.
>
> Regards,
>
> José Fonseca

--
Leif Delgass
http://www.retinalburn.net |
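Leif's drm-side ordering trick (pull a small state buffer off the freelist and enqueue it immediately ahead of the vertex buffer, so the engine sees the register writes before the primitives they apply to) can be sketched with a toy queue. The struct and function names here are invented for illustration, not the actual mach64 drm code.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal singly linked queue standing in for the drm's buffer queue. */
typedef struct buf {
    struct buf *next;
    int is_state;   /* 1 = small state-emit buffer, 0 = vertex buffer */
} buf_t;

typedef struct { buf_t *head, *tail; } queue_t;

static void q_init(queue_t *q) { q->head = q->tail = NULL; }

static void q_push(queue_t *q, buf_t *b)
{
    b->next = NULL;
    if (q->tail) q->tail->next = b; else q->head = b;
    q->tail = b;
}

/* Enqueue the state buffer immediately before its vertex buffer, so
 * buffers leave the queue in exactly the order the engine must see. */
static void q_submit_vertex(queue_t *q, buf_t *state, buf_t *vertex)
{
    if (state) q_push(q, state);
    q_push(q, vertex);
}
```

The cost Leif mentions is visible here: the state buffer is a full freelist buffer even though the emit fills only a small portion of it, which is what motivates the private pool of smaller buffers discussed later in the thread.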
From: José F. <j_r...@ya...> - 2002-05-12 20:32:54
|
Leif,

On 2002.05.12 19:15 Leif Delgass wrote:
> Jose,
>
> I've been experimenting with this too, and was able to get things going
> with state being emitted either from the client or the drm, though I'm
> still having lockups and things are generally a bit buggy and unstable
> still. To try client side context emits, I basically went back to having
> each primitive emit state into the vertex buffer before adding the vertex
> data, like the original hack with MMIO. This works, but may be emitting
> state when it's not necessary.

I don't see how that would happen: only the dirty context was updated before.

> Now I'm trying state emits in the drm, and

I think that doing the emits in the DRM gives us more flexibility than in the client.

> to do that I'm just grabbing a buffer from the freelist and adding it to
> the queue before the vertex buffer, so things are in the correct order in
> the queue. The downside of this is that buffer space is wasted, since the
> state emit uses a small portion of a buffer, but putting state in a
> separate buffer from vertex data allows the proper ordering in the queue.

Is it a requirement that the addresses stored in the descriptor tables must be aligned on some boundary? If not, we could use a single buffer to hold successive context emits, and the first entry of each descriptor table would point to a section of this buffer. This way there wouldn't be any waste of space, and a single buffer would suffice for a big number of DMA buffers.

> Perhaps we could use a private set of smaller buffers for this. At any
> rate, I've done the same for clears and swaps, so I have asynchronous DMA
> (minus blits) working with gears at least.

This is another way too. I don't know if we are limited to the kernel memory allocation granularity, so unless this is already done by the pci_* API we might need to split buffers into smaller sizes.

> I'm still getting lockups with
> anything more complicated and there are still some state problems. The
> good news is that I'm finally seeing an increase in frame rate, so there's
> light at the end of the tunnel.

My time is limited, and I can't spend more than 3 hrs per day on this, but I think that after the meeting tomorrow we should try to keep the cvs in sync, even if it's less stable - it's a development branch after all, and its stability is not as important as making progress.

> Right now I'm using 1MB (half the buffers) as the high water mark, so
> there should always be plenty of available buffers for the drm. To get
> this working, I've used buffer aging rather than interrupts.

Which register do you use to keep track of the buffers' age?

> What I
> realized with interrupts is that there doesn't appear to be an interrupt
> that can poll fast enough to keep up, since a VBLANK is tied to the
> vertical refresh -- which is relatively infrequent. I'm thinking that it
> might be best to start out without interrupts and to use GUI masters for
> blits and then investigate using interrupts, at least for blits.

That had crossed my mind before too. I think it may be a good idea.

> Anyway,
> I have an implementation of the freelist and other queues that's
> functional, though it might require some locks here and there.
> I'll try to stabilize things more and send a patch for you to look at.

Looking forward to that.

> I've also played around some more with AGP textures. I have hacked up the
> performance boxes client-side with clear ioctls, and this helps to see
> what's going on. I'll try to clean that up so I can commit it. I've
> found some problems with the global LRU and texture aging that I'm trying
> to fix as well. I'll post a more detailed summary of that soon.

What can I say? Great work Leif! =)

> BTW, as to your question about multiple clients and state: I think this
> is handled when acquiring the lock. If the context stamp in the SAREA
> doesn't match the current context after getting the lock, everything is
> marked as dirty to force the current context to emit all its state.
> Emitting state to the SAREA is always done while holding the lock.

I hadn't realized that before. Thanks for the info.

Regards,

José Fonseca |
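The context-stamp check Leif describes (after taking the hardware lock, compare your context id against the stamp the previous holder left in the shared area; on mismatch, mark everything dirty so the next emit uploads the full state) is a small piece of logic worth pinning down. This is an illustrative user-space sketch; the struct and field names are invented, not the actual DRI/SAREA definitions.

```c
#include <assert.h>

#define DIRTY_ALL 0xffffffffu

typedef struct { unsigned last_ctx; } sarea_t;       /* shared between clients */
typedef struct { unsigned ctx_id; unsigned dirty; } context_t;

/* Called with the hardware lock held.  If another context ran since we
 * last held the lock, it may have clobbered any hardware register, so
 * the only safe response is to mark all of our state dirty. */
static void validate_after_lock(sarea_t *sarea, context_t *ctx)
{
    if (sarea->last_ctx != ctx->ctx_id) {
        ctx->dirty = DIRTY_ALL;
        sarea->last_ctx = ctx->ctx_id;
    }
}
```

This is also why fine-grained dirty flags are only meaningful within one client: across clients, the stamp mismatch collapses everything to "dirty".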
From: Leif D. <lde...@re...> - 2002-05-12 22:43:25
|
On Sun, 12 May 2002, José Fonseca wrote:
> Leif,
>
> On 2002.05.12 19:15 Leif Delgass wrote:
> > Jose,
> >
> > I've been experimenting with this too, and was able to get things going
> > with state being emitted either from the client or the drm, though I'm
> > still having lockups and things are generally a bit buggy and unstable
> > still. To try client side context emits, I basically went back to having
> > each primitive emit state into the vertex buffer before adding the vertex
> > data, like the original hack with MMIO. This works, but may be emitting
> > state when it's not necessary.
>
> I don't see how that would happen: only the dirty context was updated
> before.

It didn't really make sense to me as I was writing this, to tell the truth. :) I just had it in my head that this way was a hack. I guess it was just the client-side register programming that made it "evil" before. At any rate, as you say, I think doing this in the drm is probably better anyway.

> > Now I'm trying state emits in the drm, and
>
> I think that doing the emits in the DRM gives us more flexibility than in
> the client.
>
> > to do that I'm just grabbing a buffer from the freelist and adding it to
> > the queue before the vertex buffer, so things are in the correct order in
> > the queue. The downside of this is that buffer space is wasted, since the
> > state emit uses a small portion of a buffer, but putting state in a
> > separate buffer from vertex data allows the proper ordering in the queue.
>
> Is it a requirement that the addresses stored in the descriptor tables
> must be aligned on some boundary? If not, we could use a single buffer to
> hold successive context emits, and the first entry of each descriptor table
> would point to a section of this buffer. This way there wouldn't be any
> waste of space, and a single buffer would suffice for a big number of DMA
> buffers.

I think the data tables need to be aligned on a 4K boundary, since that's the maximum size, but I'm not positive. I know for sure that the descriptor table has to be aligned to its size.

> > Perhaps we could use a private set of smaller buffers for this. At any
> > rate, I've done the same for clears and swaps, so I have asynchronous DMA
> > (minus blits) working with gears at least.
>
> This is another way too. I don't know if we are limited to the kernel
> memory allocation granularity, so unless this is already done by the pci_*
> API we might need to split buffers into smaller sizes.

The pci_pool interface is intended for this sort of small buffer, I think. We just tell it to give us 4K buffers and allocate as many as we need with pci_pool_alloc. That would give us buffers one quarter the size of a full vertex buffer and still satisfy alignment constraints. This would also be more secure, since these buffers would be private to the drm. We could use these to terminate each DMA pass as well. That's one thing that needs more investigation: what registers need to be reset at the end of a DMA pass? Right now I'm only writing src_cntl to disable the bus mastering bit. Bus_cntl isn't fifo-ed, so it doesn't make sense to me to set it, even though the utah driver did. The only drawback to using private buffers is that it complicates the freelist.

> > I'm still getting lockups with
> > anything more complicated and there are still some state problems. The
> > good news is that I'm finally seeing an increase in frame rate, so there's
> > light at the end of the tunnel.
>
> My time is limited, and I can't spend more than 3 hrs per day on this, but
> I think that after the meeting tomorrow we should try to keep the cvs in
> sync, even if it's less stable - it's a development branch after all, and
> its stability is not as important as making progress.

OK, I'll try to check in more often. I've been trying a lot of different things, so I just need to clean things up a bit to minimize the cruft. I don't want to check in failed experiments. ;) For a while the branch is likely to cause frequent lockups. I'm trying to at least get pseudo-DMA stable again.

> > Right now I'm using 1MB (half the buffers) as the high water mark, so
> > there should always be plenty of available buffers for the drm. To get
> > this working, I've used buffer aging rather than interrupts.
>
> Which register do you use to keep track of the buffers' age?

I'm using the PAT_REG[0,1] registers since they aren't needed for 3D. As long as we make sure that DMA is idle and the register contents are saved/restored when switching contexts between 2D/3D, I think this should work. The DDX only uses them for mono pattern fills in the XAA routine, and it saves and restores them, so we need to do the same. I've done that in the Enter/LeaveServer in atidri.c. We should probably also modify the DDX's Sync routine for XAA to use the drm idle ioctl. I think we'll need to make sure that the DMA queue is flushed before checking for engine idle. At the moment I'm calling the idle ioctl from EnterServer in atidri.c, but I haven't touched the XAA Sync function.

> > What I
> > realized with interrupts is that there doesn't appear to be an interrupt
> > that can poll fast enough to keep up, since a VBLANK is tied to the
> > vertical refresh -- which is relatively infrequent. I'm thinking that it
> > might be best to start out without interrupts and to use GUI masters for
> > blits and then investigate using interrupts, at least for blits.
>
> That had crossed my mind before too. I think it may be a good idea.

I'm keeping Frank's code, so we can return to this, but I've commented out the call to handle_dma. I think I'll just disable the interrupt handler for now to eliminate that as a source of problems.

> > Anyway,
> > I have an implementation of the freelist and other queues that's
> > functional, though it might require some locks here and there.
> > I'll try to stabilize things more and send a patch for you to look at.
>
> Looking forward to that.
>
> > I've also played around some more with AGP textures. I have hacked up the
> > performance boxes client-side with clear ioctls, and this helps to see
> > what's going on. I'll try to clean that up so I can commit it. I've
> > found some problems with the global LRU and texture aging that I'm trying
> > to fix as well. I'll post a more detailed summary of that soon.
>
> What can I say? Great work Leif! =)
>
> > BTW, as to your question about multiple clients and state: I think this
> > is handled when acquiring the lock. If the context stamp in the SAREA
> > doesn't match the current context after getting the lock, everything is
> > marked as dirty to force the current context to emit all its state.
> > Emitting state to the SAREA is always done while holding the lock.
>
> I hadn't realized that before. Thanks for the info.
>
> Regards,
>
> José Fonseca

--
Leif Delgass
http://www.retinalburn.net |
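The buffer-aging scheme Leif describes (stamp each submitted buffer with an increasing age, have the engine write the same value to a scratch register such as PAT_REG as it completes, and treat a buffer as free once the register has caught up to its stamp) hinges on one detail: the comparison must survive counter wraparound. A signed difference of unsigned ages handles this; the function below is an illustrative sketch, not driver code.

```c
#include <assert.h>
#include <stdint.h>

/* A buffer whose age stamp is buf_age is done once the engine's
 * completed-age scratch register has reached or passed it.  Casting the
 * unsigned difference to int32_t keeps the test correct even after the
 * 32-bit age counter wraps around. */
static int age_done(uint32_t completed_age, uint32_t buf_age)
{
    return (int32_t)(completed_age - buf_age) >= 0;
}
```

With a naive `completed_age >= buf_age` test, a buffer stamped just before wraparound would look pending forever once the counter restarted near zero.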
From: Frank C. E. <fe...@ai...> - 2002-05-13 04:43:29
|
On Sunday 12 May 2002 01:15 pm, Leif Delgass wrote:
> this working, I've used buffer aging rather than interrupts. What I
> realized with interrupts is that there doesn't appear to be an interrupt
> that can poll fast enough to keep up, since a VBLANK is tied to the
> vertical refresh -- which is relatively infrequent.

Depends on what you're trying to do with it. If you're polling for completion of a pass for a given caller, it may be problematic. Should we be doing that with this chip, though?

I had envisioned a scheme in which the clients didn't drive DMA; they simply submitted things to be rendered out by the chip's DMA, and the DRM managed the details of keeping track of completion, etc. The first pass of the code I was working towards was going to rely on a lack of free buffers to throttle the clients accordingly. If that didn't work as I had hoped, I was looking to use async notifications to tell clients the request had been processed. In that model, the only things needing to deal with locks are the DRM engine code submitting the DMAable buffers or blits (run by a separate group of code...) and any clients/code directly manipulating the chip. All DMA engine clients do is ask for buffers, fill them, and then queue them up for submission to the chip's gui-master engine. The interrupt handler takes care of the rest. In that picture, you're not submitting one group of buffers for one client to the chip, you're submitting as many buffers as you think you can get away with at 60+ Hz (something like 1-2MB, from prior experience with the Utah-GLX code...) from the queue submitted by all clients. The DRM lock scheme would give enough flexibility to allow the X server to squeeze in against the DRM DMA handler and vice-versa so that screen updates and other stuff could be managed.

Unfortunately, my lack of available time precluded me from coding in more than the base framework for that scheme. -- Frank Earl |
From: José F. <j_r...@ya...> - 2002-05-12 19:33:38
|
Daryll,

On 2002.05.12 19:11 Daryll Strauss wrote:
> On Sun, May 12, 2002 at 05:27:26PM +0100, José Fonseca wrote:
> > I would also appreciate any ideas regarding this. This is surely an issue
> > I would like to discuss further at the next meeting.
>
> You're right, there's no automatic way to know what state has become
> dirty. You need to keep some flags that tell you what state has
> changed when you change clients. Since it is work to keep these flags up
> to date, you have to decide what granularity to keep.
>
> Any time you don't immediately get the hardware lock you have to check
> your flags to see what changed. In the tdfx driver I kept 3 flags. One
> was that the fifo had changed. That basically meant some other client (X
> server, or 3D app) had written data to the card. I had to resynchronize
> the fifo in that case. The second said that the 3D state was dirty. That
> would only occur when a second 3D app ran (the X server never touched
> the same state) and required that I reset all the 3D parameters. Finally
> there was a texture dirty flag which meant that I had to reload the
> textures on the card.
>
> The rationale for that breakdown was that X server context switches
> would be common. It has to do input handling, for example. So I wanted a
> cheap way to say that the X server had done stuff, but only the fifo
> changed. Next I argued that texture swaps were really expensive. So, if
> two 3D apps were running, but not using textures, it would be nice to
> avoid paging what could be multiple megabytes of textures. Finally, that
> meant 3D state was everything else. It wasn't that much data to force to
> a known state, so it wasn't worth breaking that into smaller chunks.
>
> The three flags were stored in the SAREA, and the first time a client
> changed each of the areas it would store its number into the
> appropriate flag of the SAREA.

I've been snooping in the tdfx driver source. You're referring to the TDFXSAREAPriv.*Owner flags in tdfx_context.h and the tdfxGetLock function in tdfx_lock.c, in the tdfx's Mesa driver.

So let's see if I got this straight (these notes are not specific to 3dfx):

- The SAREA writes are made with the lock held (as Leif pointed out), and the SAREA reflects the state left by the last client that grabbed it.
- All those "dirty"/"UPDATE" flags are only meaningful within a client. Whenever another client got in the way, the word is: upload whatever changed - "what" in particular depends on the granularity gauge you mentioned.
- For this to work with DMA too, the buffers must be sent exactly in the same order they are received by the DRM.
- The DRM knows nothing about this: it's up to the client to make sure that the information in the SAREA is up to date. (The 3dfx is an exception since there is no DMA, so the state is sent to the card without DRM intervention, via the Glide library.)

> Just a small expansion on this. The texture management solution is
> weak. If two clients each had a small texture, it would be quite
> possible that they both would have fit in texture memory and no texture
> swapping would be required. Doing that would have required more advanced
> texture management that realized certain regions were in use by one
> client or another. We still don't have that yet. In a grand scheme,
> regions of texture memory would be shared between 2D and multiple 3D
> clients.

This would mean that the texture management would have to be done by the DRM or, perhaps even better, X. Surely something to look into in the future.

Thanks for your reply. It was very informative.

José Fonseca |
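Daryll's three-flag scheme can be condensed into a small ownership check run after each lock acquisition: each shared resource records the id of the last client that touched it, and a client revalidates only the resources whose owner changed. The code below is an illustrative sketch loosely modeled on the TDFXSAREAPriv *Owner flags; the exact names and types are invented.

```c
#include <assert.h>

/* Last-owner stamps for the three resources Daryll describes: the
 * command fifo, the 3D state, and the on-card textures. */
typedef struct {
    int fifoOwner;
    int ctxOwner;
    int texOwner;
} sarea_owners_t;

enum {
    REVALIDATE_FIFO = 1,   /* resynchronize the fifo            */
    REVALIDATE_3D   = 2,   /* reset all 3D parameters           */
    REVALIDATE_TEX  = 4    /* reload textures (most expensive)  */
};

/* Called with the hardware lock held: returns a bitmask of the work
 * this client must redo, and claims ownership of what it will touch. */
static int revalidate_after_lock(sarea_owners_t *s, int me)
{
    int todo = 0;
    if (s->fifoOwner != me) { todo |= REVALIDATE_FIFO; s->fifoOwner = me; }
    if (s->ctxOwner  != me) { todo |= REVALIDATE_3D;   s->ctxOwner  = me; }
    if (s->texOwner  != me) { todo |= REVALIDATE_TEX;  s->texOwner  = me; }
    return todo;
}
```

This captures the rationale in the message: an X-server interruption dirties only the fifo owner, so the common case pays only the cheap fifo resync rather than a multi-megabyte texture reload.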
From: Keith W. <ke...@tu...> - 2002-05-12 21:49:33
|
"José Fonseca" wrote:
> Daryll,
>
> On 2002.05.12 19:11 Daryll Strauss wrote:
> > On Sun, May 12, 2002 at 05:27:26PM +0100, José Fonseca wrote:
> > > I would also appreciate any ideas regarding this. This is surely an issue
> > > I would like to discuss further at the next meeting.
> >
> > You're right, there's no automatic way to know what state has become
> > dirty. You need to keep some flags that tell you what state has
> > changed when you change clients. Since it is work to keep these flags up
> > to date, you have to decide what granularity to keep.
> >
> > Any time you don't immediately get the hardware lock you have to check
> > your flags to see what changed. In the tdfx driver I kept 3 flags. One
> > was that the fifo had changed. That basically meant some other client (X
> > server, or 3D app) had written data to the card. I had to resynchronize
> > the fifo in that case. The second said that the 3D state was dirty. That
> > would only occur when a second 3D app ran (the X server never touched
> > the same state) and required that I reset all the 3D parameters. Finally
> > there was a texture dirty flag which meant that I had to reload the
> > textures on the card.
> >
> > The rationale for that breakdown was that X server context switches
> > would be common. It has to do input handling, for example. So I wanted a
> > cheap way to say that the X server had done stuff, but only the fifo
> > changed. Next I argued that texture swaps were really expensive. So, if
> > two 3D apps were running, but not using textures, it would be nice to
> > avoid paging what could be multiple megabytes of textures. Finally, that
> > meant 3D state was everything else. It wasn't that much data to force to
> > a known state, so it wasn't worth breaking that into smaller chunks.
> >
> > The three flags were stored in the SAREA, and the first time a client
> > changed each of the areas it would store its number into the
> > appropriate flag of the SAREA.
>
> I've been snooping in the tdfx driver source. You're referring to the
> TDFXSAREAPriv.*Owner flags in tdfx_context.h and the tdfxGetLock function
> in tdfx_lock.c, in the tdfx's Mesa driver.
>
> So let's see if I got this straight (these notes are not specific to 3dfx):
>
> - The SAREA writes are made with the lock held (as Leif pointed out), and
> the SAREA reflects the state left by the last client that grabbed it.
> - All those "dirty"/"UPDATE" flags are only meaningful within a client.
> Whenever another client got in the way, the word is: upload whatever
> changed - "what" in particular depends on the granularity gauge you
> mentioned.
> - For this to work with DMA too, the buffers must be sent exactly in the
> same order they are received by the DRM.
> - The DRM knows nothing about this: it's up to the client to make sure
> that the information in the SAREA is up to date. (The 3dfx is an exception
> since there is no DMA, so the state is sent to the card without DRM
> intervention, via the Glide library.)
>
> > Just a small expansion on this. The texture management solution is
> > weak. If two clients each had a small texture, it would be quite
> > possible that they both would have fit in texture memory and no texture
> > swapping would be required. Doing that would have required more advanced
> > texture management that realized certain regions were in use by one
> > client or another. We still don't have that yet. In a grand scheme,
> > regions of texture memory would be shared between 2D and multiple 3D
> > clients.

We have this for other drivers - there's a linked list of regions in the sarea that lets drivers know, for each 'chunk' of texture memory, whether it has been stolen by another client or not.

Keith |
From: Leif D. <lde...@re...> - 2002-05-12 22:19:09
|
On Sun, 12 May 2002, Keith Whitwell wrote: > > > Just a small expansion on this. The texture management solution is > > > weak. If two clients each had a small texture, it would be quite > > > possible that they both would have fit in texture memory and no texture > > > swapping would be required. Doing that would have required more advanced > > > texture management that realized certain regions were in use by one > > > client or another. We still don't have that yet. In a grand scheme > > > regions of texture memory would be shared between 2D and multiple 3D > > > clients. > > We have this for other drivers - there's a linked list of regions in the sarea > that let drivers know for each 'chunk' of texture memory whether it has been > stolen by another client or not. > Good timing, I was just composing a message about this. Maybe you can help me... In working on AGP texturing for mach64, I'm starting from the Rage128 code, which seems to have some problems (though the texture aging problem could affect other drivers). My understanding is that textures in the global LRU are marked as "used" and aged so that placeholders can be inserted in a context's local LRU when another context steals its texture memory. The problem is that nowhere are these texture regions released by the context using them. The global LRU is only reset when the heap is full. So the heap has to fill up before placeholders begin to get swapped out. I've seen this when running multiple contexts at once, or repeatedly starting, stopping, and restarting a single app. This isn't a huge problem with a single heap, but with an AGP heap it means that card memory is effectively leaked. Once the card memory global LRU is nearly filled in the sarea with regions marked as "used", newly started apps will start out only using AGP mem (with the r128 algorithm). Only if the app uses enough mem. to fill AGP will it start to swap out the placeholders from the local LRU and use card memory. 
One possible solution I'm playing with would be to use a context identifier on texture regions in the global LRU rather than a boolean "in_use" (similar to the ctxOwner identifier used for marking the last owner of the sarea's state information). Then when a context swaps out or destroys textures, it can free regions that it owns from the global LRU and age them so that other contexts will swap out their corresponding placeholders. The downside is an increased penalty for swapping textures. Another problem is how to reclaim "leaked" regions when an app doesn't exit normally? I've also found what looks to me like a bug in the Rage128 driver in UploadTexImages. The beginning of the function does this: /* Choose the heap appropriately */ heap = t->heap = R128_CARD_HEAP; if ( !rmesa->r128Screen->IsPCI && t->totalSize > rmesa->r128Screen->texSize[heap] ) { heap = t->heap = R128_AGP_HEAP; } /* Do we need to eject LRU texture objects? */ if ( !t->memBlock ) { Find a memBlock, swapping and/or changing heaps if necessary... } Update LRU and upload dirty images... The problem I see here is that setting t->heap before checking for an existing memBlock could potentially lead to a situation where t->heap != t->memBlock->heap. So in my code I've deferred changing t->heap = heap to inside the 'if' block where we know there is no memBlock. Again this situation can only occur if there is an AGP heap. Is there a reason for this behavior in the Rage128 code that I'm missing, or is this a bug? -- Leif Delgass http://www.retinalburn.net |
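The owner-id scheme Leif describes above might look something like the following sketch. All names here (`tex_region_t`, `release_owned_regions`, the field layout) are illustrative, not the actual sarea structures; the point is just replacing the boolean `in_use` with a context id so regions can be freed eagerly.

```c
#include <assert.h>

#define TEX_REGION_FREE 0  /* owner id 0 means the region is unclaimed */

/* Illustrative global-LRU region: the boolean in_use is replaced by an
 * owner field holding a context id, similar to the sarea's ctxOwner. */
typedef struct {
    int owner;  /* context id of the last user, TEX_REGION_FREE if free */
    int age;    /* bumped whenever the region changes hands */
} tex_region_t;

/* Called when a context swaps out or destroys its textures: free every
 * region it owns and age it, so other contexts notice the change and
 * drop their corresponding placeholders. Returns the number freed. */
int release_owned_regions(tex_region_t *regions, int nregions,
                          int ctx_id, int new_age)
{
    int freed = 0;
    for (int i = 0; i < nregions; i++) {
        if (regions[i].owner == ctx_id) {
            regions[i].owner = TEX_REGION_FREE;
            regions[i].age = new_age;
            freed++;
        }
    }
    return freed;
}
```

With this, a cleanly exiting context can return its card-memory regions immediately instead of waiting for the heap to fill; the abnormal-exit case Leif raises is still open.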
From: Ian R. <id...@us...> - 2002-05-13 17:01:39
|
On Sun, May 12, 2002 at 06:18:58PM -0400, Leif Delgass wrote: > In working on AGP texturing for mach64, I'm starting from the Rage128 > code, which seems to have some problems (though the texture aging problem > could affect other drivers). My understanding is that textures in the > global LRU are marked as "used" and aged so that placeholders can be > inserted in a context's local LRU when another context steals its texture > memory. The problem is that nowhere are these texture regions released by > the context using them. The global LRU is only reset when the heap is > full. So the heap has to fill up before placeholders begin to get swapped > out. I've seen this when running multiple contexts at once, or repeatedly > starting, stopping, and restarting a single app. This isn't a huge > problem with a single heap, but with an AGP heap it means that card memory > is effectively leaked. Once the card memory global LRU is nearly filled > in the sarea with regions marked as "used", newly started apps will start > out only using AGP mem (with the r128 algorithm). Only if the app uses > enough mem. to fill AGP will it start to swap out the placeholders from > the local LRU and use card memory. If this is true, then I believe it is a bug in the r128 driver. IIRC, all of the space in the global texture heaps is freed when the context is destroyed. This is the way that the Radeon and MGA drivers work. The r128 driver follows the same model so it is /intended/ to work the same. > One possible solution I'm playing with would be to use a context > identifier on texture regions in the global LRU rather than a boolean > "in_use" (similar to the ctxOwner identifier used for marking the last > owner of the sarea's state information). Then when a context swaps out or > destroys textures, it can free regions that it owns from the global LRU > and age them so that other contexts will swap out their corresponding > placeholders. 
The downside is an increased penalty for swapping textures. > Another problem is how to reclaim "leaked" regions when an app doesn't > exit normally? This is not needed. When texture space is freed in the global heap by one context, the other contexts will see that the state of those blocks in the global heap has changed from owned to free. > I've also found what looks to me like a bug in the Rage128 driver in > UploadTexImages. The beginning of the function does this: > > /* Choose the heap appropriately */ > heap = t->heap = R128_CARD_HEAP; > if ( !rmesa->r128Screen->IsPCI && > t->totalSize > rmesa->r128Screen->texSize[heap] ) { > heap = t->heap = R128_AGP_HEAP; > } > > /* Do we need to eject LRU texture objects? */ > if ( !t->memBlock ) { > > Find a memBlock, swapping and/or changing heaps if necessary... > > } > > Update LRU and upload dirty images... > > The problem I see here is that setting t->heap before checking for an > existing memBlock could potentially lead to a situation where t->heap != > t->memBlock->heap. So in my code I've deferred changing t->heap = heap to > inside the 'if' block where we know there is no memBlock. Again this > situation can only occur if there is an AGP heap. Is there a reason for > this behavior in the Rage128 code that I'm missing, or is this a bug? The MGA driver does something similar. The idea is sound, but there may be implementation bugs. Basically, the driver will start with heap 0 (i.e., the on-card memory) and then work through the remaining heaps. The if-block in the above code, basically, tells the driver to not bother checking on-card memory and start with the AGP heap. Here it looks like it's telling the driver to not try to allocate from on-card memory if the texture is larger than would fit on the card even with all of its memory available. This is a good heuristic. Without it, trying to allocate a very large texture would cause ALL of the other textures to get kicked out of on-card memory AND still have the allocation FAIL. 
:) -- Tell that to the Marines! |
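Ian's heuristic, restated as a sketch (hypothetical names; the real r128 code keys off `rmesa->r128Screen->IsPCI` and `texSize[]`):

```c
#include <assert.h>

enum { CARD_HEAP = 0, AGP_HEAP = 1 };

/* Pick the heap to try first for a texture of total_size bytes.
 * If the texture could never fit in card memory even when the heap is
 * completely empty, start with AGP right away: otherwise the driver
 * would evict every other texture from card memory and still fail the
 * allocation. */
int choose_first_heap(int has_agp, long total_size, long card_heap_size)
{
    if (has_agp && total_size > card_heap_size)
        return AGP_HEAP;
    return CARD_HEAP;
}
```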
From: Leif D. <lde...@re...> - 2002-05-13 20:07:14
|
On Mon, 13 May 2002, Ian Romanick wrote: > On Sun, May 12, 2002 at 06:18:58PM -0400, Leif Delgass wrote: > > > In working on AGP texturing for mach64, I'm starting from the Rage128 > > code, which seems to have some problems (though the texture aging problem > > could affect other drivers). My understanding is that textures in the > > global LRU are marked as "used" and aged so that placeholders can be > > inserted in a context's local LRU when another context steals its texture > > memory. The problem is that nowhere are these texture regions released by > > the context using them. The global LRU is only reset when the heap is > > full. So the heap has to fill up before placeholders begin to get swapped > > out. I've seen this when running multiple contexts at once, or repeatedly > > starting, stopping, and restarting a single app. This isn't a huge > > problem with a single heap, but with an AGP heap it means that card memory > > is effectively leaked. Once the card memory global LRU is nearly filled > > in the sarea with regions marked as "used", newly started apps will start > > out only using AGP mem (with the r128 algorithm). Only if the app uses > > enough mem. to fill AGP will it start to swap out the placeholders from > > the local LRU and use card memory. > > If this is true, then I believe it is a bug in the r128 driver. IIRC, all > of the space in the global texture heaps is freed when the context is > destroyed. This is the way that the Radeon and MGA drivers work. The r128 > driver follows the same model so it is /intended/ to work the same. It looks to me like this is a bug in all the drivers. Try grepping for 'in_use' in the Mesa drivers. I don't see anywhere in _any_ of the drivers where a context sets in_use to zero in a texture region in the global heap. The local heap is destroyed when the context is destroyed, but it doesn't touch the global heap. There are only two places where the global LRU is modified. 
One is in UpdateTexLRU where the region is marked as in_use, aged, and moved to the head of the list. The second is in ResetGlobalLRU, where the list is (re)built and ages are reset. The Reset function is only called from AgeTextures when the heap is full (which it can detect because the Reset function leaves out one region when rebuilding the list), which also forces TexturesGone to swap out everything in the heap, including all of the placeholders. To see this happening, you can enable the Print[Global,Local]LRU functions in UpdateTexLRU, restart X, then repeatedly start, stop and restart the texenv Mesa demo. Each time you restart, you'll see a new placeholder in the local LRU if the problem is there. If I'm right, this should be happening in all the drivers. Can someone test this on the Radeon driver? > > One possible solution I'm playing with would be to use a context > > identifier on texture regions in the global LRU rather than a boolean > > "in_use" (similar to the ctxOwner identifier used for marking the last > > owner of the sarea's state information). Then when a context swaps out or > > destroys textures, it can free regions that it owns from the global LRU > > and age them so that other contexts will swap out their corresponding > > placeholders. The downside is an increased penalty for swapping textures. > > Another problem is how to reclaim "leaked" regions when an app doesn't > > exit normally? > > This is not needed. When texture space is freed in the global heap by one > context, the other contexts will see that the state of those blocks in the > global heap has changed from owned to free. This is assuming that the space is actually freed. Maybe it wouldn't be necessary if you only swap or destroy textures when holding the lock. Is DestroyContext called while holding the lock? > > I've also found what looks to me like a bug in the Rage128 driver in > > UploadTexImages. 
The beginning of the function does this: > > > > /* Choose the heap appropriately */ > > heap = t->heap = R128_CARD_HEAP; > > if ( !rmesa->r128Screen->IsPCI && > > t->totalSize > rmesa->r128Screen->texSize[heap] ) { > > heap = t->heap = R128_AGP_HEAP; > > } > > > > /* Do we need to eject LRU texture objects? */ > > if ( !t->memBlock ) { > > > > Find a memBlock, swapping and/or changing heaps if necessary... > > > > } > > > > Update LRU and upload dirty images... > > > > The problem I see here is that setting t->heap before checking for an > > existing memBlock could potentially lead to a situation where t->heap != > > t->memBlock->heap. So in my code I've deferred changing t->heap = heap to > > inside the 'if' block where we know there is no memBlock. Again this > > situation can only occur if there is an AGP heap. Is there a reason for > > this behavior in the Rage128 code that I'm missing, or is this a bug? > > The MGA driver does something similar. The idea is sound, but there may be > implementation bugs. Basically, the driver will start with heap 0 (i.e., the > on-card memory) and all the remaining heaps. The if-block in the above > code, basically, tells the drive to not bother checking on-card memory and > start with the AGP heap. > > Here it looks like it's telling the driver to not try to allocate from > on-card memory if the texture is larger than will fit on the card if all of > the memory is available. This is a good heuristic. Without it, trying to > allocate a very large texture would cause ALL of the other textures to get > kicked out of on-card memory AND still have the allocation FAIL. :) I understand the algorithm, that's not the problem -- it's the _second_ if block. My point is that if, at the beginning of the function, the texture is already allocated on the AGP heap (t->heap == t->memBlock->heap == AGP) and is small enough to fit card memory, t->heap is changed to the CARD heap unconditionally by the first line. 
However, since the texture already has a memBlock in AGP, the second if block is not entered and memBlock remains on the AGP heap, so t->heap == CARD, but t->memBlock->heap == AGP. I'm saying that t->heap should only be changed to the CARD heap if it doesn't already have a memBlock (i.e. it's new or has been swapped out). -- Leif Delgass http://www.retinalburn.net |
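Leif's proposed fix could be sketched as follows: defer the heap reassignment until we know there is no memBlock, so a resident texture's heap field always matches its block. The names mirror the quoted r128 code but this is an illustration, not the actual driver source.

```c
#include <assert.h>
#include <stddef.h>

enum { CARD_HEAP = 0, AGP_HEAP = 1 };

typedef struct { int heap; } mem_block_t;
typedef struct {
    int heap;
    long total_size;
    mem_block_t *mem_block;  /* NULL if the texture is not resident */
} texture_t;

/* Only reassign t->heap when the texture has no block yet; an already
 * resident texture keeps heap == mem_block->heap, avoiding the
 * mismatch Leif describes. */
void select_heap(texture_t *t, int has_agp, long card_heap_size)
{
    if (t->mem_block == NULL) {
        int heap = CARD_HEAP;
        if (has_agp && t->total_size > card_heap_size)
            heap = AGP_HEAP;
        t->heap = heap;
        /* ...then find a memBlock, swapping and/or changing heaps
         * if necessary, as in the original function... */
    }
}
```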
From: Keith W. <ke...@tu...> - 2002-05-13 20:19:03
|
Leif Delgass wrote: > > On Mon, 13 May 2002, Ian Romanick wrote: > > > On Sun, May 12, 2002 at 06:18:58PM -0400, Leif Delgass wrote: > > > > > In working on AGP texturing for mach64, I'm starting from the Rage128 > > > code, which seems to have some problems (though the texture aging problem > > > could affect other drivers). My understanding is that textures in the > > > global LRU are marked as "used" and aged so that placeholders can be > > > inserted in a context's local LRU when another context steals its texture > > > memory. The problem is that nowhere are these texture regions released by > > > the context using them. The global LRU is only reset when the heap is > > > full. So the heap has to fill up before placeholders begin to get swapped > > > out. I've seen this when running multiple contexts at once, or repeatedly > > > starting, stopping, and restarting a single app. This isn't a huge > > > problem with a single heap, but with an AGP heap it means that card memory > > > is effectively leaked. Once the card memory global LRU is nearly filled > > > in the sarea with regions marked as "used", newly started apps will start > > > out only using AGP mem (with the r128 algorithm). Only if the app uses > > > enough mem. to fill AGP will it start to swap out the placeholders from > > > the local LRU and use card memory. > > > > If this is true, then I believe it is a bug in the r128 driver. IIRC, all > > of the space in the global texture heaps is freed when the context is > > destroyed. This is the way that the Radeon and MGA drivers work. The r128 > > driver follows the same model so it is /intended/ to work the same. > > It looks to me like this is a bug in all the drivers. Try grepping for > 'in_use' in the Mesa drivers. I don't see anywhere in _any_ of the > drivers where a context sets in_use to zero in a texture region in the > global heap. Sorry to come to this late. The 'in_use' flag isn't used, you are right. But it's not really needed, either. 
Basically the key is that a driver must always be able to cope with having all of its textures ripped out from under it (in the current scheme). When it grabs the lock and finds it has been contended, it looks at the global LRU to see if any or all of its textures have been clobbered. This is basically a refinement of the tdfx situation, where one flag indicates that all of the textures are gone. Keith |
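The pattern Keith describes might be sketched like this (all names hypothetical): after taking the lock and finding it was contended, a context walks its textures and invalidates any whose global-LRU region has aged past the value it last saw.

```c
#include <assert.h>

/* Per-context view of a texture: the age of its region the last time
 * this context looked, and whether its on-card copy is still valid. */
typedef struct { int last_seen_age; int valid; } ctx_texture_t;

/* Invalidate every texture whose region age advanced behind our back,
 * i.e. whose memory was clobbered by another client while we did not
 * hold the lock. Returns the number of clobbered textures. */
int revalidate_textures(ctx_texture_t *textures, const int *region_age,
                        int n)
{
    int clobbered = 0;
    for (int i = 0; i < n; i++) {
        if (textures[i].valid &&
            region_age[i] > textures[i].last_seen_age) {
            textures[i].valid = 0;  /* must re-upload before next use */
            clobbered++;
        }
    }
    return clobbered;
}
```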
From: Keith W. <ke...@tu...> - 2002-05-13 20:19:55
|
Leif Delgass wrote: > > On Mon, 13 May 2002, Ian Romanick wrote: > > > On Sun, May 12, 2002 at 06:18:58PM -0400, Leif Delgass wrote: > > > > > In working on AGP texturing for mach64, I'm starting from the Rage128 > > > code, which seems to have some problems (though the texture aging problem > > > could affect other drivers). My understanding is that textures in the > > > global LRU are marked as "used" and aged so that placeholders can be > > > inserted in a context's local LRU when another context steals its texture > > > memory. The problem is that nowhere are these texture regions released by > > > the context using them. The global LRU is only reset when the heap is > > > full. So the heap has to fill up before placeholders begin to get swapped > > > out. I've seen this when running multiple contexts at once, or repeatedly > > > starting, stopping, and restarting a single app. This isn't a huge > > > problem with a single heap, but with an AGP heap it means that card memory > > > is effectively leaked. Once the card memory global LRU is nearly filled > > > in the sarea with regions marked as "used", newly started apps will start > > > out only using AGP mem (with the r128 algorithm). Only if the app uses > > > enough mem. to fill AGP will it start to swap out the placeholders from > > > the local LRU and use card memory. > > > > If this is true, then I believe it is a bug in the r128 driver. IIRC, all > > of the space in the global texture heaps is freed when the context is > > destroyed. This is the way that the Radeon and MGA drivers work. The r128 > > driver follows the same model so it is /intended/ to work the same. > > It looks to me like this is a bug in all the drivers. Try grepping for > 'in_use' in the Mesa drivers. I don't see anywhere in _any_ of the > drivers where a context sets in_use to zero in a texture region in the > global heap. The local heap is destroyed when the context is destroyed, > but it doesn't touch the global heap. 
There are only two places where the > global LRU is modified. One is in UpdateTexLRU where the region is marked > as in_use, aged, and moved to the head of the list. The second is in > ResetGlobalLRU, where the list is (re)built and ages are reset. The Reset > function is only called from AgeTextures when the heap is full (which it > can detect because the Reset function leaves out one region when > rebuilding the list), which also forces TexturesGone to swap out > everything in the heap, including all of the placeholders. To see this > happening, you can enable the Print[Global,Local]LRU functions in > UpdateTexLRU, restart X, then repeatedly start, stop and restart the > texenv Mesa demo. Each time you restart, you'll see a new placeholder in > the local LRU if the problem is there. If I'm right, this should be > happening in all the drivers. Can someone test this on the Radeon driver? > > > > One possible solution I'm playing with would be to use a context > > > identifier on texture regions in the global LRU rather than a boolean > > > "in_use" (similar to the ctxOwner identifier used for marking the last > > > owner of the sarea's state information). Then when a context swaps out or > > > destroys textures, it can free regions that it owns from the global LRU > > > and age them so that other contexts will swap out their corresponding > > > placeholders. The downside is an increased penalty for swapping textures. > > > Another problem is how to reclaim "leaked" regions when an app doesn't > > > exit normally? > > > > This is not needed. When texture space is freed in the global heap by one > > context, the other contexts will see that the state of those blocks in the > > global heap has changed from owned to free. > > This is assuming that the space is actually freed. Maybe it wouldn't be > necessary if you only swap or destroy textures when holding the lock. Is > DestroyContext called while holding the lock? 
Also note that the space can be taken even if it isn't freed, for the reasons I described in the previous email. Keith |
From: Leif D. <lde...@re...> - 2002-05-13 20:57:54
|
On Mon, 13 May 2002, Keith Whitwell wrote: > Leif Delgass wrote: > > > > On Mon, 13 May 2002, Ian Romanick wrote: > > > > > On Sun, May 12, 2002 at 06:18:58PM -0400, Leif Delgass wrote: > > > > > > > In working on AGP texturing for mach64, I'm starting from the Rage128 > > > > code, which seems to have some problems (though the texture aging problem > > > > could affect other drivers). My understanding is that textures in the > > > > global LRU are marked as "used" and aged so that placeholders can be > > > > inserted in a context's local LRU when another context steals its texture > > > > memory. The problem is that nowhere are these texture regions released by > > > > the context using them. The global LRU is only reset when the heap is > > > > full. So the heap has to fill up before placeholders begin to get swapped > > > > out. I've seen this when running multiple contexts at once, or repeatedly > > > > starting, stopping, and restarting a single app. This isn't a huge > > > > problem with a single heap, but with an AGP heap it means that card memory > > > > is effectively leaked. Once the card memory global LRU is nearly filled > > > > in the sarea with regions marked as "used", newly started apps will start > > > > out only using AGP mem (with the r128 algorithm). Only if the app uses > > > > enough mem. to fill AGP will it start to swap out the placeholders from > > > > the local LRU and use card memory. > > > > > > If this is true, then I believe it is a bug in the r128 driver. IIRC, all > > > of the space in the global texture heaps is freed when the context is > > > destroyed. This is the way that the Radeon and MGA drivers work. The r128 > > > driver follows the same model so it is /intended/ to work the same. > > > > It looks to me like this is a bug in all the drivers. Try grepping for > > 'in_use' in the Mesa drivers. I don't see anywhere in _any_ of the > > drivers where a context sets in_use to zero in a texture region in the > > global heap. 
The local heap is destroyed when the context is destroyed, > > but it doesn't touch the global heap. There are only two places where the > > global LRU is modified. One is in UpdateTexLRU where the region is marked > > as in_use, aged, and moved to the head of the list. The second is in > > ResetGlobalLRU, where the list is (re)built and ages are reset. The Reset > > function is only called from AgeTextures when the heap is full (which it > > can detect because the Reset function leaves out one region when > > rebuilding the list), which also forces TexturesGone to swap out > > everything in the heap, including all of the placeholders. To see this > > happening, you can enable the Print[Global,Local]LRU functions in > > UpdateTexLRU, restart X, then repeatedly start, stop and restart the > > texenv Mesa demo. Each time you restart, you'll see a new placeholder in > > the local LRU if the problem is there. If I'm right, this should be > > happening in all the drivers. Can someone test this on the Radeon driver? > > > > > > One possible solution I'm playing with would be to use a context > > > > identifier on texture regions in the global LRU rather than a boolean > > > > "in_use" (similar to the ctxOwner identifier used for marking the last > > > > owner of the sarea's state information). Then when a context swaps out or > > > > destroys textures, it can free regions that it owns from the global LRU > > > > and age them so that other contexts will swap out their corresponding > > > > placeholders. The downside is an increased penalty for swapping textures. > > > > Another problem is how to reclaim "leaked" regions when an app doesn't > > > > exit normally? > > > > > > This is not needed. When texture space is freed in the global heap by one > > > context, the other contexts will see that the state of those blocks in the > > > global heap has changed from owned to free. > > > > This is assuming that the space is actually freed. 
Maybe it wouldn't be > > necessary if you only swap or destroy textures when holding the lock. Is > > DestroyContext called while holding the lock? > > Also note that the space can be taken even if it isn't freed, for the reasons > I described in the previous email. > I understand that, but my issue was that it's only freed when the heap fills up. If the global LRU card heap is full except for the last region, but the texture we're uploading is larger than that, we could end up switching to AGP memory, even if there was space in card memory that was no longer in use by another context. Depending on the size of the AGP aperture and the texture requirements of the app, you might end up never freeing the card memory until an app runs that has a texture that can go into the last region. If you know that the space is free as soon as the other context stops using it, there would be less potential for wasted card memory. -- Leif Delgass http://www.retinalburn.net |
From: Keith W. <ke...@tu...> - 2002-05-13 21:03:22
|
> > I understand that, but my issue was that it's only freed when the heap > fills up. If the global LRU card heap is full except for the last region, > but the texture we're uploading is larger than that, we could end up > switching to AGP memory, even if there was space no longer in use by > another context in card memory. Depending on the size of the AGP aperture > and the texture requirements of the app, you might end up never freeing > the card memory until an app runs that has a texture that can go into the > last region. If you know that the space is free as soon as the other > context stops using it, there would be less potential for wasted card > memory. Right - that would probably require some changes then. You also have to cope with the case where clients are killed rather than exit cleanly, although I guess it doesn't matter too much as the behaviour in that case will just be suboptimal. Keith |
From: Jens O. <je...@tu...> - 2002-05-14 13:03:57
|
Keith Whitwell wrote: > You also have to cope with the case where clients are killed rather than exit > cleanly, although I guess it doesn't matter too much as the behaviour in that > case will just be suboptimal. Is this something the Server should clean up? We do that for 3D contexts, drawables and full screen functionality... -- /\ Jens Owen / \/\ _ je...@tu... / \ \ \ Steamboat Springs, Colorado |
From: Keith W. <ke...@tu...> - 2002-05-14 14:06:41
|
Jens Owen wrote: > > Keith Whitwell wrote: > > > You also have to cope with the case where clients are killed rather than exit > > cleanly, although I guess it doesn't matter too much as the behaviour in that > > case will just be suboptimal. > > Is this something the Server should clean up? We do that for 3D > contexts, drawables and full screen functionality... I think a closer analogy would be dma buffers, which the kernel cleans up. Keith |
From: Ian R. <id...@us...> - 2002-05-13 23:53:57
|
On Mon, May 13, 2002 at 04:06:53PM -0400, Leif Delgass wrote: > On Mon, 13 May 2002, Ian Romanick wrote: > > > On Sun, May 12, 2002 at 06:18:58PM -0400, Leif Delgass wrote: > > > > > In working on AGP texturing for mach64, I'm starting from the Rage128 > > > code, which seems to have some problems (though the texture aging problem > > > could affect other drivers). My understanding is that textures in the > > > global LRU are marked as "used" and aged so that placeholders can be > > > inserted in a context's local LRU when another context steals its texture > > > memory. The problem is that nowhere are these texture regions released by > > > the context using them. The global LRU is only reset when the heap is > > > full. So the heap has to fill up before placeholders begin to get swapped > > > out. I've seen this when running multiple contexts at once, or repeatedly > > > starting, stopping, and restarting a single app. This isn't a huge > > > problem with a single heap, but with an AGP heap it means that card memory > > > is effectively leaked. Once the card memory global LRU is nearly filled > > > in the sarea with regions marked as "used", newly started apps will start > > > out only using AGP mem (with the r128 algorithm). Only if the app uses > > > enough mem. to fill AGP will it start to swap out the placeholders from > > > the local LRU and use card memory. > > > > If this is true, then I believe it is a bug in the r128 driver. IIRC, all > > of the space in the global texture heaps is freed when the context is > > destroyed. This is the way that the Radeon and MGA drivers work. The r128 > > driver follows the same model so it is /intended/ to work the same. > > It looks to me like this is a bug in all the drivers. Try grepping for > 'in_use' in the Mesa drivers. I don't see anywhere in _any_ of the > drivers where a context sets in_use to zero in a texture region in the > global heap. Ah yes, in_use. That field is defunct. 
Texture regions are stolen on a pure priority basis. No distinction is made between regions owned by the current context or other contexts. > The local heap is destroyed when the context is destroyed, > but it doesn't touch the global heap. There are only two places where the > global LRU is modified. One is in UpdateTexLRU where the region is marked > as in_use, aged, and moved to the head of the list. The second is in > ResetGlobalLRU, where the list is (re)built and ages are reset. The Reset > function is only called from AgeTextures when the heap is full (which it > can detect because the Reset function leaves out one region when > rebuilding the list), which also forces TexturesGone to swap out > everything in the heap, including all of the placeholders. Hrm...ok. I'll look into this a bit tomorrow and get back to you. Your reasoning seems good, but for some reason it doesn't grok with what I remember. It's been a while since I've looked at this code, so my memory might not be so good. :) > > > One possible solution I'm playing with would be to use a context > > > identifier on texture regions in the global LRU rather than a boolean > > > "in_use" (similar to the ctxOwner identifier used for marking the last > > > owner of the sarea's state information). Then when a context swaps out or > > > destroys textures, it can free regions that it owns from the global LRU > > > and age them so that other contexts will swap out their corresponding > > > placeholders. The downside is an increased penalty for swapping textures. > > > Another problem is how to reclaim "leaked" regions when an app doesn't > > > exit normally? > > > > This is not needed. When texture space is freed in the global heap by one > > context, the other contexts will see that the state of those blocks in the > > global heap has changed from owned to free. > > This is assuming that the space is actually freed. 
Maybe it wouldn't be > necessary if you only swap or destroy textures when holding the lock. Is > DestroyContext called while holding the lock? The DestroyContext function in the driver is. It mucks around with the SAREA, so it had better be! :) [snip] > I understand the algorithm, that's not the problem -- it's the _second_ if > block. My point is that if, at the beginning of the function, the texture > is already allocated on the AGP heap (t->heap == t->memBlock->heap == AGP) > and is small enough to fit card memory, t->heap is changed to the CARD > heap unconditionally by the first line. However, since the texture > already has a memBlock in AGP, the second if block is not entered and > memBlock remains on the AGP heap, so t->heap == CARD, but > t->memBlock->heap == AGP. I'm saying that t->heap should only be changed > to the CARD heap if it doesn't already have a memBlock (i.e. it's new or > has been swapped out). Right. Currently, there is no promotion of textures from AGP to on-card memory. This is one piece of work that, after getting my initial patches in (sigh...), I had intended to do. -- Tell that to the Marines! |