From: Leif D. <lde...@re...> - 2003-01-31 21:29:37
|
I've been looking at the texmem branch and have found a few problems. I wanted to explain the problems I found and get some feedback before commiting any changes or putting a patch together. The first item is the cause of the texture corruption with multiple contexts that I have seen with both Radeon and Rage 128. In the DRI_AGE_TEXTURES macro in texmem.h, the age test is reversed: #define DRI_AGE_TEXTURES( heap ) \ do { \ if ( ((heap) != NULL) \ && ((heap)->local_age > (heap)->global_age[0]) ) \ driAgeTextures( heap ); \ } while( 0 ) Here global_age[0] would be _larger_ than local_age after a context's textures have been kicked out (global_age is incremented at each upload). Changing the test to != or < fixes the problem. However, there is another problem here. In the old driver-specific code, the local age was a signed int, which was initialized to -1, and the global age was/is initialized to 0 (by memset-ing the SAREA). This caused the first context in a new server generation to initialize the global LRU since the local and global ages differed. With both now being initially 0, the global LRU is not initialized before it is updated when the first texture upload happens. One workaround would be to use local != global in the age test and initialize the local age to anything > 0 if the global age is 0 when the context is created. Also, I think it would cause less confusion if the drivers' SAREAs used unsigned ints where the pointers in the heap/region structs in texmem.h do. It should be safe to assume sizeof(int) == sizeof(unsigned int), right? Actually, even better would be if all the drivers used the drmTextureRegion struct (mga does) defined in xf86drm.h, since the code to manipulate the struct is now being made common to all drivers. The difference between that struct and the current driver private versions is that drmTextureRegion uses an explicit padding byte between the first three bytes (unsigned chars) and the age (unsigned int). Would there be compatibility problems in moving from a struct without the explicit padding to one with it? : typedef struct _drmTextureRegion { unsigned char next; unsigned char prev; unsigned char in_use; unsigned char padding; /* Explicitly pad this out */ unsigned int age; } drmTextureRegion, *drmTextureRegionPtr; vs. typedef struct { unsigned char next, prev; /* indices to form a circular LRU */ unsigned char in_use; /* owned by a client, or free? */ int age; /* tracked by clients to update local LRU's */ } radeon_tex_region_t; There are also a few failing assertions related to placeholder texture objects. In driTexturesGone, we need to set t->heap = heap or else the assertion that t->heap != NULL fails in driDestroyTextureObject when destroying a placeholder. The other assertions that I don't think are valid are in the drivers' DestroyContext, where it's assumed that the swapped list and texture object lists are empty. Even if Mesa calls the driver's DeleteTexture for all the real texture objects at context teardown (does this happen?), there may still be placeholder objects on the swapped or texture_objects lists. I think it is safer to have driDestroyTextureHeap iterate both lists and destroy any remaining texture objects and remove the assertions. The last problem I found is the drivers' call to driCreateTextureHeap in CreateContext. Passing pointers to the texList and texAge arrays without an index results in texList[0] and texAge[0] being passed for both texture heaps (if the driver supports the AGP heap), so those should be indexed as texList[i] and texAge[i] where i is the heapId. Also, I think there's a small optimization we could make, but I need to make sure this is valid. I observed two contexts (texobj and tunnel demos) repeatedly deleting and creating the same placeholder in driTexturesGone when switching contexts. I think it would be possible to keep an existing placeholder if the offset and size of the placeholder in the texture_objects list matches the global region which has been updated. For debugging the LRUs, I added the printLocal/GlobalLRU functions from the old driver code to texmem.c in my tree. I think it's useful to have these at least as static functions to use for debugging the common texmem code. At any rate, I can put a patch together for review, but I wanted to see if there's anything I'm missing here. -- Leif Delgass http://www.retinalburn.net |
From: Ian R. <id...@us...> - 2003-01-31 23:40:18
|
Leif Delgass wrote: > I've been looking at the texmem branch and have found a few problems. I > wanted to explain the problems I found and get some feedback before > commiting any changes or putting a patch together. Thanks for looking over the code. Everyone's been so busy lately that I don't think many people have had much of a chance to delve into it much. > The first item is the cause of the texture corruption with multiple > contexts that I have seen with both Radeon and Rage 128. In the > DRI_AGE_TEXTURES macro in texmem.h, the age test is reversed: > > #define DRI_AGE_TEXTURES( heap ) \ > do { \ > if ( ((heap) != NULL) \ > && ((heap)->local_age > (heap)->global_age[0]) ) \ > driAgeTextures( heap ); \ > } while( 0 ) > > Here global_age[0] would be _larger_ than local_age after a context's > textures have been kicked out (global_age is incremented at each upload). > Changing the test to != or < fixes the problem. However, there is another > problem here. In the old driver-specific code, the local age was a signed > int, which was initialized to -1, and the global age was/is initialized to > 0 (by memset-ing the SAREA). This caused the first context in a new > server generation to initialize the global LRU since the local and global > ages differed. With both now being initially 0, the global LRU is not > initialized before it is updated when the first texture upload happens. > One workaround would be to use local != global in the age test and > initialize the local age to anything > 0 if the global age is 0 when the > context is created. I believe your analysis is essentially correct. Interestingly, the test in driAgeTextures IS correct. As far as initializing the global LRU goes, when there are no textures allocated, there is nothing in the LRU. > Also, I think it would cause less confusion if the drivers' SAREAs used > unsigned ints where the pointers in the heap/region structs in texmem.h > do. It should be safe to assume sizeof(int) == sizeof(unsigned int), > right? Actually, even better would be if all the drivers used the > drmTextureRegion struct (mga does) defined in xf86drm.h, since the code to > manipulate the struct is now being made common to all drivers. The > difference between that struct and the current driver private versions is > that drmTextureRegion uses an explicit padding byte between the first > three bytes (unsigned chars) and the age (unsigned int). Would there be > compatibility problems in moving from a struct without the explicit > padding to one with it? : > > typedef struct _drmTextureRegion { > unsigned char next; > unsigned char prev; > unsigned char in_use; > unsigned char padding; /* Explicitly pad this out */ > unsigned int age; > } drmTextureRegion, *drmTextureRegionPtr; > > vs. > > typedef struct { > unsigned char next, prev; /* indices to form a circular LRU */ > unsigned char in_use; /* owned by a client, or free? */ > int age; /* tracked by clients to update local LRU's */ > } radeon_tex_region_t; I cannot think of a case on any of the supported architectures where sizeof( unsigned int ) != sizeof( int ). I'm not sure that ANSI C99 permits such an implementation. As far as replacing the device specific *_tex_region_t structures with drmTextureRegion, that would probably be a good idea. In fact, dri_texture_region (in texmem.h) should be replaced as well. I didn't notice drmTextureRegion when I was doing that part of the work (about 10 months ago). > There are also a few failing assertions related to placeholder texture > objects. In driTexturesGone, we need to set t->heap = heap or else the > assertion that t->heap != NULL fails in driDestroyTextureObject when > destroying a placeholder. The other assertions that I don't think are How could a texture get into a heap's list that was not allocated from that heap? If such a case exists, it is a bug. > valid are in the drivers' DestroyContext, where it's assumed that the > swapped list and texture object lists are empty. Even if Mesa calls the > driver's DeleteTexture for all the real texture objects at context > teardown (does this happen?), there may still be placeholder objects on > the swapped or texture_objects lists. I think it is safer to have > driDestroyTextureHeap iterate both lists and destroy any remaining texture > objects and remove the assertions. Mesa is supposed to destroy all the real textures before it gets to DestroyContext. The textures on the swapped list and on each heap's texture_objects list are textures that had actual texture data / memory associated with them. The place holder textures should NEVER be on any of these lists. > The last problem I found is the drivers' call to driCreateTextureHeap in > CreateContext. Passing pointers to the texList and texAge arrays without > an index results in texList[0] and texAge[0] being passed for both texture > heaps (if the driver supports the AGP heap), so those should be indexed as > texList[i] and texAge[i] where i is the heapId. I believe you are correct again. In fact, if all the various *_tex_region_t structures are replaced with drmTextureRegion and this problem is fixed, we can get rid of the ugly type-cast in this call. > Also, I think there's a small optimization we could make, but I need to > make sure this is valid. I observed two contexts (texobj and tunnel > demos) repeatedly deleting and creating the same placeholder in > driTexturesGone when switching contexts. I think it would be possible to > keep an existing placeholder if the offset and size of the placeholder in > the texture_objects list matches the global region which has been updated. > For debugging the LRUs, I added the printLocal/GlobalLRU functions from > the old driver code to texmem.c in my tree. I think it's useful to have > these at least as static functions to use for debugging the common texmem > code. That optimization sounds fine. Adding the debug functions back in texmem.c is not a bad idea. Just make sure that we don't get any compiler warnings when they aren't called (i.e., static function not used, etc.). :) > At any rate, I can put a patch together for review, but I wanted to see if > there's anything I'm missing here. I'll look at it when you send it. It sounds like you've done some good work. :) How is the r128 driver in texmem-0-0-1 working these days? |
From: Ian R. <id...@us...> - 2003-02-01 00:05:50
|
Leif Delgass wrote: > There are also a few failing assertions related to placeholder texture > objects. In driTexturesGone, we need to set t->heap = heap or else the > assertion that t->heap != NULL fails in driDestroyTextureObject when > destroying a placeholder. I see the problem you're talking about here. You are correct. I misunderstood where you meant to set t->heap to heap. I now see that you meant to set it after the CALLOC in the 'if ( in_use )' block. Duh. :) |
From: Leif D. <lde...@re...> - 2003-02-01 00:32:00
|
On Fri, 31 Jan 2003, Ian Romanick wrote: > Leif Delgass wrote: > > > There are also a few failing assertions related to placeholder texture > > objects. In driTexturesGone, we need to set t->heap = heap or else the > > assertion that t->heap != NULL fails in driDestroyTextureObject when > > destroying a placeholder. > > I see the problem you're talking about here. You are correct. I > misunderstood where you meant to set t->heap to heap. I now see that > you meant to set it after the CALLOC in the 'if ( in_use )' block. Duh. :) Right. I should have been more specific there (the actual patch would have been more clear ;) ). Do you see what I mean now about the assertions on tearing down the context? A placeholder _does_ have a memBlock, which matches the corresponding global region marked as in use. It's in the same block you refer to here that it's added to the texture_objects list. Also, a placeholder can be moved to the swapped_objects list in driAllocateTexture (another place an assertion can fail in driSwapOutTextureObject if the placeholder's t->heap == NULL). The driver's DeleteTexture callbacks on teardown won't remove any placeholders from these lists, just the "real" textures that have corresponding gl_texture_object structs (t->tObj). -- Leif Delgass http://www.retinalburn.net |
From: Ian R. <id...@us...> - 2003-02-01 01:58:59
|
Leif Delgass wrote: > On Fri, 31 Jan 2003, Ian Romanick wrote: > >>Leif Delgass wrote: >> >>>There are also a few failing assertions related to placeholder texture >>>objects. In driTexturesGone, we need to set t->heap = heap or else the >>>assertion that t->heap != NULL fails in driDestroyTextureObject when >>>destroying a placeholder. >> >>I see the problem you're talking about here. You are correct. I >>misunderstood where you meant to set t->heap to heap. I now see that >>you meant to set it after the CALLOC in the 'if ( in_use )' block. Duh. :) > > Right. I should have been more specific there (the actual patch would > have been more clear ;) ). Do you see what I mean now about the Yeah. You know what they say...a patch is worth 2^10 words. :) > assertions on tearing down the context? A placeholder _does_ have a > memBlock, which matches the corresponding global region marked as in use. > It's in the same block you refer to here that it's added to the > texture_objects list. Also, a placeholder can be moved to the > swapped_objects list in driAllocateTexture (another place an assertion can > fail in driSwapOutTextureObject if the placeholder's t->heap == NULL). > The driver's DeleteTexture callbacks on teardown won't remove any > placeholders from these lists, just the "real" textures that have > corresponding gl_texture_object structs (t->tObj). Part of the confusion was over "placeholder." There is a dummy object bound to each texture target (see driInitTextureObjects in texmem.c, line ~908). When you said placeholder, this is the the object that my brain thought of. You are correct. The objects that are actually called placeholders can be on the the texture_objects list, but I don't think it should ever be on the swapped_objects list. It seems illogical that one process would track the textures in another process that were kicked out. That's just silly. That said, it does seem that this will happen. It seems like this could be a memory leak. The placeholder textures on the swapped_objects list will just sit there until the context is destroyed. I think we need to add a flag to dri_texture_object that says "this is a placeholder." If an object is a placeholder, it does not get put on the swapped_objects lists. Instead, it gets destroyed. That flag could also be used to help implement the optimization you were talking about in an earlier message. After looking at all the code, I couldn't understand why none of those assertions were never triggered. Here's the coup de grace: radeonDestroyContext never gets called. I tried a bunch of different things, but if I set a breakpoint after hitting the _mesa_test_os_sse_exception_support signal, my breakpoint never gets hit. I'm going to have to investigate this some more... |
From: Leif D. <lde...@re...> - 2003-02-01 22:07:45
Attachments:
texmem.diff
|
Attached is a patch to the texmem branch with the changes we've been discussing. I'd still like to take a closer look at the global heap, but I've included the code to force initializing the global heap on the first lock contention for now. I've changed the drm/ddx/texmem code to use the drmTextureRegion/drm_tex_region_t struct for the SAREA and heap structures, but only for the drivers converted to the texmem interface. I also added a test to driAllocateTexture that deletes a placeholder (using t->tObj == NULL to detect a placeholder) instead of swapping it out. An explicit flag for detecting a placeholder might make the code more clear, but using t->tObj should work too. Since placeholders should never be on the swapped list now, I left in the assert for the swapped list being empty in the drivers' DestroyContext, though I added code to delete anything on the swapped list in driDestroyTextureHeap just in case that assertion would need to be removed for some reason. I've also included the optimization in driTexturesGone to keep an existing placeholder if the global region is in use and matches the size and offset of the existing placeholder. It this looks OK, I will apply it to the branch. I need to do some more testing on the Rage 128 driver, but so far things seem to be working except for the problem in the fire Mesa demo, which is still there. It kind of looks like a texture coordinate problem, but I'm not sure yet. -- Leif Delgass http://www.retinalburn.net |
From: Ian R. <id...@us...> - 2003-02-03 19:36:56
|
Leif Delgass wrote: > Attached is a patch to the texmem branch with the changes we've been > discussing. I'd still like to take a closer look at the global heap, but > I've included the code to force initializing the global heap on the first > lock contention for now. I've changed the drm/ddx/texmem code to use the > drmTextureRegion/drm_tex_region_t struct for the SAREA and heap > structures, but only for the drivers converted to the texmem interface. I > also added a test to driAllocateTexture that deletes a placeholder (using > t->tObj == NULL to detect a placeholder) instead of swapping it out. An > explicit flag for detecting a placeholder might make the code more clear, > but using t->tObj should work too. Since placeholders should never be on > the swapped list now, I left in the assert for the swapped list being > empty in the drivers' DestroyContext, though I added code to delete > anything on the swapped list in driDestroyTextureHeap just in case that > assertion would need to be removed for some reason. > > I've also included the optimization in driTexturesGone to keep an existing > placeholder if the global region is in use and matches the size and offset > of the existing placeholder. > > It this looks OK, I will apply it to the branch. I gave the patch a quick once-over, and it looks good so far. I'm going to study it a bit more carefully before the #dri-devel meeting. I do have a couple comments so far, but their pretty minor. > I need to do some more testing on the Rage 128 driver, but so far things > seem to be working except for the problem in the fire Mesa demo, which is > still there. It kind of looks like a texture coordinate problem, but I'm > not sure yet. Does r128 exhibit this problem on the trunk? |
From: Ian R. <id...@us...> - 2003-02-03 20:25:01
|
Leif Delgass wrote: > It this looks OK, I will apply it to the branch. I've poked around with the patch, and it looks good. There are only two things that I would change. There should probably be a comment in texmem.h that explains the double-duty that tObj does. I think the meaning of t->tObj == NULL is important to document. I would also probably change printLocalLRU and printGlobalLRU to take a pointer to __FUNCTION__ and log that in their printfs. I remember making that change to the version in the r100 driver when I was first mucking with the texmem stuff. That proved very useful when I was trying to figure out how it all worked. :) Nice work! |
From: Leif D. <lde...@re...> - 2003-02-04 16:57:15
Attachments:
texmem-20030204.diff
|
Attached is the patch with the modifications you suggested. --Leif On Mon, 3 Feb 2003, Ian Romanick wrote: > Leif Delgass wrote: > > > It this looks OK, I will apply it to the branch. > > I've poked around with the patch, and it looks good. There are only two > things that I would change. There should probably be a comment in > texmem.h that explains the double-duty that tObj does. I think the > meaning of t->tObj == NULL is important to document. > > I would also probably change printLocalLRU and printGlobalLRU to take a > pointer to __FUNCTION__ and log that in their printfs. I remember > making that change to the version in the r100 driver when I was first > mucking with the texmem stuff. That proved very useful when I was > trying to figure out how it all worked. :) > > Nice work! > > > > ------------------------------------------------------- > This SF.NET email is sponsored by: > SourceForge Enterprise Edition + IBM + LinuxWorld = Something 2 See! > http://www.vasoftware.com > _______________________________________________ > Dri-devel mailing list > Dri...@li... > https://lists.sourceforge.net/lists/listinfo/dri-devel > -- Leif Delgass http://www.retinalburn.net |