From: Gareth H. <ga...@va...> - 2000-10-16 23:18:13
Josh Vanderhoof wrote:
>
> It looks like gcc is trying to keep the values as ubytes for too long.
> If you change r,g,b,a from GLubyte to GLuint, it seems to do a better
> job.

Yes, I had noticed a significant speedup by doing this. The final version of the templated code wasn't going to be this exact code.

However, I was more interested in the fact that gcc output vastly different assembly depending on where the C code was located, with such a huge difference in performance. I thought I might share this with the list.

-- Gareth
From: Josh V. <ho...@na...> - 2000-10-16 18:57:49
Gareth Hughes <ga...@va...> writes:

> The movzx instructions are very fast (they've been specially optimized
> for doing mixed 8/16 and 32 bit operations on PPro/PII/PIII processors),
> while the second listing has lots of partial register stalls which seem
> to be killing the performance.
>
> I'm stunned and amazed.

It looks like gcc is trying to keep the values as ubytes for too long. If you change r,g,b,a from GLubyte to GLuint, it seems to do a better job.

With an old GCC 2.96 snapshot (20000529) and using uints instead of ubytes, you get this:

        .file   "mesaprs.c"
        .version        "01.01"
    gcc2_compiled.:
    .text
        .align 4
    .globl _mesa_convert_teximage_argb_4444
        .type    _mesa_convert_teximage_argb_4444,@function
    _mesa_convert_teximage_argb_4444:
        pushl   %ebp
        pushl   %edi
        pushl   %esi
        pushl   %ebx
        subl    $12, %esp
        movl    52(%esp), %eax
        sall    $1, %eax
        movl    $0, (%esp)
        movl    40(%esp), %edx
        movl    %eax, 8(%esp)
        cmpl    %edx, (%esp)
        movl    44(%esp), %eax
        movl    68(%esp), %esi
        movl    %eax, 4(%esp)
        jae     .L13
        .p2align 2
    .L6:
        xorl    %ebp, %ebp
        xorl    %edi, %edi
        cmpl    36(%esp), %ebp
        jae     .L14
        .p2align 2
    .L10:
        movzbl  (%edi,%esi), %edx
        movzbl  3(%edi,%esi), %eax
        andl    $240, %eax
        andl    $240, %edx
        movzbl  1(%edi,%esi), %ecx
        sall    $4, %edx
        sall    $8, %eax
        movzbl  2(%edi,%esi), %ebx
        orl     %edx, %eax
        andl    $240, %ecx
        orl     %ecx, %eax
        shrl    $4, %ebx
        orl     %ebx, %eax
        movl    4(%esp), %edx
        movw    %ax, (%edx,%ebp,2)
        incl    %ebp
        addl    $4, %edi
        cmpl    36(%esp), %ebp
        jb      .L10
    .L14:
        movl    48(%esp), %eax
        incl    (%esp)
        movl    40(%esp), %edx
        addl    8(%esp), %esi
        addl    %eax, 4(%esp)
        cmpl    %edx, (%esp)
        jb      .L6
    .L13:
        addl    $12, %esp
        popl    %ebx
        popl    %esi
        movl    $1, %eax
        popl    %edi
        popl    %ebp
        ret
    .Lfe1:
        .size    _mesa_convert_teximage_argb_4444,.Lfe1-_mesa_convert_teximage_argb_4444
        .ident  "GCC: (GNU) 2.96 20000529 (experimental)"
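A minimal C sketch of Josh's suggestion: the inner ARGB4444 packing loop from Gareth's code, but with each channel widened to GLuint as soon as it is loaded. The typedefs are local stand-ins rather than Mesa's real GL headers, and the function name is made up for illustration. Whether a given gcc turns this into movzbl loads and full 32-bit ALU ops, as in the 2.96 listing above, depends on the compiler version and flags.

```c
/* Local stand-ins for the GL types (assumption: not Mesa's headers). */
typedef unsigned char  GLubyte;
typedef unsigned short GLushort;
typedef unsigned int   GLuint;
typedef int            GLint;

/* Hypothetical helper: pack one row of RGBA8888 texels into ARGB4444. */
static void convert_row_argb4444(const GLubyte *src, GLushort *dst, GLint width)
{
    GLint col, col4;
    for (col = col4 = 0; col < width; col++, col4 += 4) {
        /* Widen each channel to 32 bits up front, so the compiler is not
           tempted to keep intermediate results in 8-bit partial registers. */
        GLuint r = src[col4 + 0];
        GLuint g = src[col4 + 1];
        GLuint b = src[col4 + 2];
        GLuint a = src[col4 + 3];
        dst[col] = (GLushort) (((a & 0xf0) << 8) |
                               ((r & 0xf0) << 4) |
                               ((g & 0xf0)     ) |
                               ((b & 0xf0) >> 4));
    }
}
```

The only change from the original loop is the type of r, g, b and a; the masks and shifts are identical, so the output is bit-for-bit the same.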
From: <ad...@hu...> - 2000-10-16 17:22:23
OK, this didn't get through, so here it is (sorry for the confusion).

Hi Gareth,

Would you mind taking a look at the attached file for a couple more ways of doing the texture depth conversions? I separated the main for loops and stuck them in a file for easier testing of the code. I hope that's not a problem.

I am not sure which one is the fastest, as I don't have a decent profiling setup in Linux. (Any help on this would be appreciated: the last time I did serious instruction-level profiling was back in high school with Abrash's self-adjusting code. I really like all the benchmarks that you give us on this list and the DRI list showing improvements, or not, in various code paths.)

Also, I am not sure if the code is endian-safe. I only thought of the Intel/little-endian case.

Also, can you remind me how to direct gcc to not use the stack but rather use the registers some more?

thanks,
tony
From: <ad...@hu...> - 2000-10-16 16:43:40
I will just include the code here so that no conversions are necessary...

    #include <stdio.h>

    #define GLubyte unsigned char   /* GL's GLubyte and GLushort are unsigned */
    #define GLushort unsigned short
    #define GLint int

    int main(void){
      char whatever1[]={"sakjdhsskhjasjashaskdsakjasjas"};
      char whatever2[]={"lhlkgjlkajdsflkfjlkjklfdaskfaf"};
      //const GLubyte *src=_mesa_image_address(packing, srcImage, srcWidth,
      //                                       srcHeight,srcFormat,srcType);
      // GLubyte *src=(GLubyte *)malloc(1024);
      GLubyte *src=(GLubyte *)whatever1;
      // GLushort *dst=(GLushort *)dstImage;
      // GLushort *dst=(GLushort *)malloc(1024);
      GLushort *dst=(GLushort *)whatever2;
      int f=0;  // only used as a marker to segment the assembler output
      GLushort val;
      GLint val32;
      GLint row;

      // for(row=0; row < dstHeight; row++){
      //   GLint col, col4;
      GLint col=4, col4=4;  //rig some values so we don't over/underflow
      //   for(col=col4=0; col < dstWidth; col++, col4 +=4){

      //original
      ///*
      GLubyte r = src[col4 + 0];
      GLubyte g = src[col4 + 1];
      GLubyte b = src[col4 + 2];
      GLubyte a = src[col4 + 3];
      //printf("Datazzzz: (lsb)RGBA(msb)= 0x%x%x%x%x \n",a,b,g,r);
      dst[col] = ((a & 0xf0) << 8) | ((r & 0xf0) << 4) |
                 ((g & 0xf0)     ) | ((b & 0xf0) >> 4);
      //printf("Method 0: 0x%x \n",dst[col]);
      //*/

      f++; f++; f++; f++; f++; f++; f++; f++; f++; // segment the assembler code for easier reading :))

      //new1
      ///*
      *(dst+col)= \
        (((*(src+col4+3))&0xf0)<<8) | \
        (((*(src+col4+2))&0xf0)>>4) | \
        (((*(src+col4+1))&0xf0)   ) | \
        (((*(src+col4+0))&0xf0)<<4);
      //printf("Method 1: 0x%x \n",dst[col]);
      //*/

      f++; f++; f++; f++; f++; f++; f++; f++; f++; // segment the assembler code for easier reading :))

      //new2
      ///*
      src+=col4;
      val =((*src)&0xf0)<<4;  src++;
      val|=((*src)&0xf0);     src++;
      val|=((*src)&0xf0)>>4;  src++;
      val|=((*src)&0xf0)<<8;
      *(dst+col)=val;  //save converted thing
      //printf("Method 2: 0x%x \n",dst[col]);
      src-=col4+3;  //re-adjust src, this can be skipped outside this forloop by adjusting srcStride...
      //*/

      f++; f++; f++; f++; f++; f++; f++; f++; f++; // segment the assembler code for easier reading :))

      //new3
      ///*
      val32=*( (GLint*)(src+col4));  // col4, not col: each texel is 4 bytes
      *(dst+col)= \
        (( val32 & 0xf0000000)>>16) | \
        (( val32 & 0x00f00000)>>20) | \
        (( val32 & 0x0000f000)>>8 ) | \
        (( val32 & 0x000000f0)<<4);
      //printf("Method 3: 0x%x \n",dst[col]);
      //*/

      //   }//inner
      //   src +=srcStride;
      //   dst =(GLushort *) ((GLubyte*) dst+dstRowStride);
      // }//outter
      return 0;
    } //end main
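On the endianness question: Methods 0, 1 and 2 index individual bytes, so they do not depend on host byte order, but Method 3 loads a whole texel as one 32-bit word, and its masks only line up with R, G, B, A when the host is little-endian. A hedged sketch of one way to keep the single-load trick where it is valid: it uses memcpy instead of the unaligned `(GLint *)` cast and falls back to byte-wise packing otherwise. All function names here are made up for illustration.

```c
#include <stdint.h>
#include <string.h>

/* Method-3 style: one 32-bit load per texel. The masks assume the bytes
   R,G,B,A land in bits 0-7, 8-15, 16-23, 24-31 (little-endian host). */
static uint16_t pack_word_le(const unsigned char *texel)
{
    uint32_t v;
    memcpy(&v, texel, sizeof v);            /* no alignment/aliasing issues */
    return (uint16_t) (((v & 0xf0000000u) >> 16) |   /* A -> bits 12-15 */
                       ((v & 0x000000f0u) << 4)  |   /* R -> bits 8-11  */
                       ((v & 0x0000f000u) >> 8)  |   /* G -> bits 4-7   */
                       ((v & 0x00f00000u) >> 20));   /* B -> bits 0-3   */
}

/* Byte-wise version (like Methods 0-2): independent of host byte order. */
static uint16_t pack_bytes(const unsigned char *texel)
{
    return (uint16_t) (((texel[3] & 0xf0u) << 8) |
                       ((texel[0] & 0xf0u) << 4) |
                       ((texel[1] & 0xf0u)     ) |
                       ((texel[2] & 0xf0u) >> 4));
}

/* Runtime check: is the lowest-addressed byte of 1 equal to 1? */
static int host_is_little_endian(void)
{
    const uint32_t one = 1;
    unsigned char first;
    memcpy(&first, &one, 1);
    return first == 1;
}

static uint16_t pack_texel(const unsigned char *texel)
{
    return host_is_little_endian() ? pack_word_le(texel) : pack_bytes(texel);
}
```

In a real build the byte-order test would more likely be a compile-time check, so the branch disappears from the inner loop; the runtime version just keeps the sketch self-contained.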
From: Gareth H. <ga...@va...> - 2000-10-16 11:53:13
I guess the C code might help, although not much...

Original:

    GLboolean
    _mesa_convert_teximage(MesaIntTexFormat dstFormat,
                           GLint dstWidth, GLint dstHeight,
                           GLvoid *dstImage, GLint dstRowStride,
                           GLint srcWidth, GLint srcHeight,
                           GLenum srcFormat, GLenum srcType,
                           const GLvoid *srcImage,
                           const struct gl_pixelstore_attrib *packing)
    {
       const GLint wScale = dstWidth / srcWidth;   /* must be power of two */
       const GLint hScale = dstHeight / srcHeight; /* must be power of two */
       ASSERT(dstWidth >= srcWidth);
       ASSERT(dstHeight >= srcHeight);
       ASSERT(dstImage);
       ASSERT(srcImage);
       ASSERT(packing);

       switch (dstFormat) {
          ...
          case MESA_A4_R4_G4_B4:
             /* store as 16-bit texels (GR_TEXFMT_ARGB_4444) */
             if (srcFormat == GL_BGRA && srcType == GL_UNSIGNED_SHORT_4_4_4_4_REV){
                ...
             }
             else if (srcFormat == GL_RGBA && srcType == GL_UNSIGNED_BYTE) {
                /* general case */
                if (wScale == 1 && hScale == 1) {
                   const GLubyte *src = _mesa_image_address(packing, srcImage,
                                 srcWidth, srcHeight, srcFormat, srcType, 0, 0, 0);
                   const GLint srcStride = _mesa_image_row_stride(packing,
                                 srcWidth, srcFormat, srcType);
                   GLushort *dst = (GLushort *) dstImage;
                   GLint row;
                   for (row = 0; row < dstHeight; row++) {
                      GLint col, col4;
                      for (col = col4 = 0; col < dstWidth; col++, col4 += 4) {
                         GLubyte r = src[col4 + 0];
                         GLubyte g = src[col4 + 1];
                         GLubyte b = src[col4 + 2];
                         GLubyte a = src[col4 + 3];
                         dst[col] = ((a & 0xf0) << 8)
                                  | ((r & 0xf0) << 4)
                                  | ((g & 0xf0)     )
                                  | ((b & 0xf0) >> 4);
                      }
                      src += srcStride;
                      dst = (GLushort *) ((GLubyte *) dst + dstRowStride);
                   }
                }
                else {
                   ...
                }
             }
             else {
                ...
             }
             break;
          ...
       }
       return GL_TRUE;
    }

Special case:

    GLboolean
    _mesa_convert_teximage_argb_4444(MesaIntTexFormat dstFormat,
                                     GLint dstWidth, GLint dstHeight,
                                     GLvoid *dstImage, GLint dstRowStride,
                                     GLint srcWidth, GLint srcHeight,
                                     GLenum srcFormat, GLenum srcType,
                                     const GLvoid *srcImage,
                                     const struct gl_pixelstore_attrib *packing)
    {
       const GLubyte *src = srcImage;
       const GLint srcStride = srcWidth * 2;
       GLushort *dst = (GLushort *) dstImage;
       GLint row;

       for (row = 0; row < dstHeight; row++) {
          GLint col, col4;
          for (col = col4 = 0; col < dstWidth; col++, col4 += 4) {
             GLubyte r = src[col4 + 0];
             GLubyte g = src[col4 + 1];
             GLubyte b = src[col4 + 2];
             GLubyte a = src[col4 + 3];
             dst[col] = ((a & 0xf0) << 8)
                      | ((r & 0xf0) << 4)
                      | ((g & 0xf0)     )
                      | ((b & 0xf0) >> 4);
          }
          src += srcStride;
          dst = (GLushort *) ((GLubyte *) dst + dstRowStride);
       }
       return GL_TRUE;
    }

Figure that one out...
From: Gareth H. <ga...@va...> - 2000-10-16 07:46:28
I've got a bunch of nice MMX code that I wanted to plug into the texutil framework, so all the hardware drivers could benefit from fast texture image conversions. I set about creating a function table that had specialized versions of the basic texture image conversion code, and at the basic level this table would be filled with functions made from splitting up _mesa_convert_teximage() into a whole bunch of smaller routines. This all applies to the subimage conversion code as well, but we'll use the main one as an example.

This seemed like a great idea: do some basic tests, build up a function table index, and then call the specialized conversion routine using this index. We could then make sure the Q3/UT/<insert favourite app> texture conversions were super-fast.

Anyway, I ripped the _mesa_convert_teximage() code apart and built up a whole bunch of separate functions, plugging these into the function table. I then ran a basic benchmark: converting a 32bpp RGBA texture to ARGB4444 format, which is used often in 16bpp rendering. I used the original _mesa_convert_teximage() code, as well as mgaConvertTexture(), as I'd noticed parts of this code were significantly faster than the original Mesa code. For this particular test, the MGA and Mesa code were about even, with the MGA ahead by perhaps a percent or two.

To my great surprise, the ripped-apart function was 25% slower. I removed the function table and called the _mesa_convert_teximage_argb_4444 function directly, but it was still 25% slower. I copied the *exact* code that was being executed from the original Mesa code and plugged it into the separate function again. Still no good. I removed the _mesa_image_address and _mesa_image_row_stride calls (which is kinda the whole point of breaking it apart like this) and made the routine inline, and the best I could get it was around 10-15% slower.

At this point, I was confused. Time to look at the compiler output. Sure enough, the exact same block of C code was producing vastly different assembly code. Here's the code from the original Mesa routine, which you'd expect to be the slower one:

    1008:  8b 55 b4              mov    %edx,DWORD PTR [%ebp-76]
    100b:  8b 7d a4              mov    %edi,DWORD PTR [%ebp-92]
    100e:  8a 04 17              mov    %al,BYTE PTR [%edi+%edx]
    1011:  24 f8                 and    %al,0xf8
    1013:  88 85 f8 fe ff ff     mov    BYTE PTR [%ebp-264],%al
    1019:  66 c1 e0 08           shl    %ax,0x8
    101d:  66 89 85 f4 fe ff ff  mov    DWORD PTR [%ebp-268],%ax
    1024:  8a 16                 mov    %dl,BYTE PTR [%esi]
    1026:  80 e2 fc              and    %dl,0xfc
    1029:  66 0f b6 c2           movzx  %ax,%dl
    102d:  66 c1 e0 03           shl    %ax,0x3
    1031:  66 09 85 f4 fe ff ff  or     DWORD PTR [%ebp-268],%ax
    1038:  8b bd f0 fe ff ff     mov    %edi,DWORD PTR [%ebp-272]
    103e:  8a 17                 mov    %dl,BYTE PTR [%edi]
    1040:  c0 ea 03              shr    %dl,0x3
    1043:  66 0f b6 c2           movzx  %ax,%dl
    1047:  8b 95 f4 fe ff ff     mov    %edx,DWORD PTR [%ebp-268]
    104d:  09 c2                 or     %edx,%eax
    104f:  66 89 11              mov    DWORD PTR [%ecx],%dx
    1052:  83 c1 02              add    %ecx,2
    1055:  43                    inc    %ebx
    1056:  83 c7 04              add    %edi,4
    1059:  89 bd f0 fe ff ff     mov    DWORD PTR [%ebp-272],%edi
    105f:  83 c6 04              add    %esi,4
    1062:  83 45 a4 04           add    DWORD PTR [%ebp-92],4
    1066:  3b 5d 0c              cmp    %ebx,DWORD PTR [%ebp+12]
    1069:  7c 9d                 jl     1008

And here's the code from the separate routine, which you'd expect to be the faster one:

    2392:  8b 75 e8              mov    %esi,DWORD PTR [%ebp-24]
    2395:  8a 06                 mov    %al,BYTE PTR [%esi]
    2397:  24 f0                 and    %al,0xf0
    2399:  89 c2                 mov    %edx,%eax
    239b:  66 c1 e2 08           shl    %dx,0x8
    239f:  8b 4d fc              mov    %ecx,DWORD PTR [%ebp-4]
    23a2:  8b 75 ec              mov    %esi,DWORD PTR [%ebp-20]
    23a5:  8a 04 0e              mov    %al,BYTE PTR [%esi+%ecx]
    23a8:  24 f0                 and    %al,0xf0
    23aa:  25 ff 00 00 00        and    %eax,0xff
    23af:  66 c1 e0 04           shl    %ax,0x4
    23b3:  09 c2                 or     %edx,%eax
    23b5:  8b 4d e0              mov    %ecx,DWORD PTR [%ebp-32]
    23b8:  8a 01                 mov    %al,BYTE PTR [%ecx]
    23ba:  24 f0                 and    %al,0xf0
    23bc:  25 ff 00 00 00        and    %eax,0xff
    23c1:  09 c2                 or     %edx,%eax
    23c3:  8a 07                 mov    %al,BYTE PTR [%edi]
    23c5:  c0 e8 04              shr    %al,0x4
    23c8:  25 ff 00 00 00        and    %eax,0xff
    23cd:  09 c2                 or     %edx,%eax
    23cf:  8b 75 e4              mov    %esi,DWORD PTR [%ebp-28]
    23d2:  66 89 16              mov    DWORD PTR [%esi],%dx
    23d5:  83 c6 02              add    %esi,2
    23d8:  89 75 e4              mov    DWORD PTR [%ebp-28],%esi
    23db:  43                    inc    %ebx
    23dc:  83 45 e8 04           add    DWORD PTR [%ebp-24],4
    23e0:  83 c7 04              add    %edi,4
    23e3:  83 c1 04              add    %ecx,4
    23e6:  89 4d e0              mov    DWORD PTR [%ebp-32],%ecx
    23e9:  83 45 ec 04           add    DWORD PTR [%ebp-20],4
    23ed:  3b 5d 0c              cmp    %ebx,DWORD PTR [%ebp+12]
    23f0:  7c a0                 jl     2392

The movzx instructions are very fast (they've been specially optimized for doing mixed 8/16 and 32 bit operations on PPro/PII/PIII processors), while the second listing has lots of partial register stalls which seem to be killing the performance.

I'm stunned and amazed.

-- Gareth
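Gareth's plan, a table of specialized converters selected by a few basic tests on the formats, can be sketched roughly as below. Everything here is hypothetical: the enums stand in for MesaIntTexFormat and the srcFormat/srcType pair, strides are in bytes, and only one entry (RGBA/GL_UNSIGNED_BYTE to ARGB4444) is filled in. A NULL slot tells the caller to take the general-case path instead.

```c
#include <stddef.h>

typedef unsigned char  GLubyte;   /* local stand-ins, not Mesa's headers */
typedef unsigned short GLushort;

/* Hypothetical indices standing in for dstFormat and (srcFormat, srcType). */
enum dst_fmt { DST_ARGB4444, DST_RGB565, NUM_DST_FMTS };
enum src_fmt { SRC_RGBA_UBYTE, SRC_BGRA_4444_REV, NUM_SRC_FMTS };

/* One shared signature for every specialized routine; returns 1 if handled. */
typedef int (*convert_fn)(int width, int height,
                          const void *srcImage, int srcStride,
                          void *dstImage, int dstStride);

static int rgba_ubyte_to_argb4444(int width, int height,
                                  const void *srcImage, int srcStride,
                                  void *dstImage, int dstStride)
{
    const GLubyte *src = srcImage;
    GLubyte *dstRow = dstImage;
    for (int row = 0; row < height; row++) {
        GLushort *dst = (GLushort *) dstRow;
        for (int col = 0; col < width; col++) {
            const GLubyte *t = src + 4 * col;          /* 4 bytes per texel */
            unsigned r = t[0], g = t[1], b = t[2], a = t[3];
            dst[col] = (GLushort) (((a & 0xf0) << 8) | ((r & 0xf0) << 4) |
                                   (g & 0xf0) | ((b & 0xf0) >> 4));
        }
        src += srcStride;
        dstRow += dstStride;
    }
    return 1;
}

/* Filled once at init; NULL means "no specialized routine for this pair". */
static convert_fn convert_table[NUM_DST_FMTS][NUM_SRC_FMTS] = {
    [DST_ARGB4444][SRC_RGBA_UBYTE] = rgba_ubyte_to_argb4444,
};

static int convert_teximage(enum dst_fmt d, enum src_fmt s,
                            int width, int height,
                            const void *src, int srcStride,
                            void *dst, int dstStride)
{
    convert_fn fn = convert_table[d][s];
    if (fn == NULL)
        return 0;   /* caller falls back to the general-case code */
    return fn(width, height, src, srcStride, dst, dstStride);
}
```

The appeal of this layout is that an MMX routine could simply overwrite its table slot at init time without any caller changing; as the listings above show, though, the C fallbacks placed in separate functions may still compile very differently from the original inline code.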
From: Nathan H. <na...@ma...> - 2000-10-10 02:08:52
On Mon, Oct 09, 2000 at 01:55:53PM -0600, Keith Whitwell wrote:
>
> One potential problem with this is that in fallback cases we will hold the hardware lock for the time it takes to render an entire vertex buffer of triangles, one spanline at a time. I propose to get around this by 'flashing' the lock in the spanline and pixel functions, e.g.:
>
>     UNLOCK_HARDWARE(fxMesa);
>     LOCK_HARDWARE(fxMesa);
>     ...
>
> to allow a (tiny) window for the X server or other clients to grab the lock.

Instead of putting the UNLOCK/LOCK pairs inside the span functions and having LOCK/UNLOCK around the slow fallbacks, how about having the slow fallback default to no lock, but with LOCK/UNLOCK around each call to the span functions? The cliprects would be reworked (if necessary) around each span, inside the LOCK/UNLOCK pair and just before drawing the span.

The other option is to extend the X server with two kinds of lock: one for changing the cliprects and one for everything else. Then you can hold the cliprect lock around the whole fallback and use the "flashing" concept for the other lock.
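Nathan's first alternative (no lock across the slow fallback, but a lock around each span write, with the cliprects re-checked inside it) might look roughly like this. The flag-and-counter "lock" and the cliprect stamp are hypothetical stand-ins for the DRI hardware lock and for the X server changing the drawable's cliprects; the sketch only models where the lock is taken and when the cliprects are refreshed.

```c
#include <assert.h>

/* Hypothetical model: 'locked'/'lock_count' stand in for the hardware lock;
   'cliprect_stamp' is bumped whenever the cliprects change. */
struct hw_context {
    int locked;
    int lock_count;
    unsigned cliprect_stamp;   /* current cliprect generation */
    unsigned cached_stamp;     /* generation our cliprect copy matches */
    int cliprect_refreshes;
    int spans_drawn;
};

#define LOCK_HW(hw)   do { assert(!(hw)->locked); (hw)->locked = 1; (hw)->lock_count++; } while (0)
#define UNLOCK_HW(hw) do { assert((hw)->locked); (hw)->locked = 0; } while (0)

static void refresh_cliprects(struct hw_context *hw)
{
    /* a real driver would re-fetch the drawable's cliprects here */
    hw->cached_stamp = hw->cliprect_stamp;
    hw->cliprect_refreshes++;
}

/* Each span is clipped and drawn while the lock is held, so the cliprects
   cannot change underneath it. */
static void write_span_locked(struct hw_context *hw, int y)
{
    LOCK_HW(hw);
    if (hw->cached_stamp != hw->cliprect_stamp)
        refresh_cliprects(hw);
    (void) y;   /* ...clip span y against the cliprects and write it... */
    hw->spans_drawn++;
    UNLOCK_HW(hw);
}

/* The slow fallback itself runs without holding the lock. */
static void fallback_triangle(struct hw_context *hw, int y0, int y1)
{
    for (int y = y0; y < y1; y++)
        write_span_locked(hw, y);
}
```

The cost is one lock round trip per span instead of one per vertex buffer, which only matters on the fallback path; the fast triangle path can still lock once per batch.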
From: Nathan H. <na...@ma...> - 2000-10-10 01:48:03
On Mon, Oct 09, 2000 at 01:55:53PM -0600, Keith Whitwell wrote:
>
> OK, so in examining the newest tdfx driver, I got to wondering about the calls to LOCK_HARDWARE inside the triangle functions. In particular, after noticing the slowdown after Brian added cliprect handling to those calls, I wondered what would happen if I moved locking out of the triangle function.
>
> The answer was surprising:
>
>                 old      lock in   cliprect     lock outside
>                 trunk    trifunc   in trifunc   trifunc
>     gears       448      560       550          650            fps
>     isosurf     56       60        60           85             fps
>     trispd-50   520k     572k      567k         921k           tris/sec
>
> on a Celeron 400 with a V3-3000. We are getting close to a 50% overall speedup on this branch (and better for certain apps)...
>
> So... What's the catch?
>
> Basically, to lock outside the trifuncs, I need somewhere to lock. The obvious place is in the RenderStart/RenderFinish driver callbacks. The only trouble with this is the span fallbacks: we lock in these on a per-spanline basis. We can remove locking from the span callbacks, and be fine on triangle rendering. However, the span fallbacks are also called from DrawPixels, etc.
>
> DrawPixels, etc. don't currently call RenderStart/RenderFinish, so where should the locking occur there?
>
> To my mind, the obvious thing to do is:
>
> - Add RenderStart/RenderFinish calls around all possible calls to the span/pixel functions
> - Do locking in RenderStart/RenderFinish in the tdfx driver
> - Remove locking from triangle and spanline functions in the tdfx driver

That would be great. It'd make all the drivers simpler, because they can rely on the cliprects not "changing from under them" while rendering.

> One potential problem with this is that in fallback cases we will hold the hardware lock for the time it takes to render an entire vertex buffer of triangles, one spanline at a time. I propose to get around this by 'flashing' the lock in the spanline and pixel functions, e.g.:
>
>     UNLOCK_HARDWARE(fxMesa);
>     LOCK_HARDWARE(fxMesa);
>     ...
>
> to allow a (tiny) window for the X server or other clients to grab the lock.

Hrm. We can't do that. Imagine the case where the span functions are writing some form of large slow blit (say a DrawPixels into a large region with many cliprects). If the X server gets any time, the user could move a window over the region and the cliprects could change. The result is that the span functions are working on outdated cliprects and you get nasty screen corruption.

This is what we had before, and it is exactly why BEGIN/END_CLIP_LOOP were added: we were getting screen corruption in the span functions.

Now, I agree the clip loops need to go elsewhere. They are a performance disaster. In the pathological case (5-6 cliprects) you will be rendering the same triangle 5-6 times, degrading performance to 15%. But the real problem is that clip loops make the normal case (1 cliprect) slower.

I think your proposal is good, but we can't do this "unlock/lock" thing or the problems all come back (though in a smaller time window).
From: Gareth H. <ga...@va...> - 2000-10-09 23:07:32
Keith Whitwell wrote:
>
> OK, so in examining the newest tdfx driver, I got to wondering about the calls to LOCK_HARDWARE inside the triangle functions. In particular, after noticing the slowdown after Brian added cliprect handling to those calls, I wondered what would happen if I moved locking out of the triangle function.

Just before I left, I had begun implementing exactly the same optimization. Thanks for taking care of this!

> So... What's the catch?
>
> Basically, to lock outside the trifuncs, I need somewhere to lock. The obvious place is in the RenderStart/RenderFinish driver callbacks. The only trouble with this is the span fallbacks: we lock in these on a per-spanline basis. We can remove locking from the span callbacks, and be fine on triangle rendering. However, the span fallbacks are also called from DrawPixels, etc.
>
> DrawPixels, etc. don't currently call RenderStart/RenderFinish, so where should the locking occur there?
>
> To my mind, the obvious thing to do is:
>
> - Add RenderStart/RenderFinish calls around all possible calls to the span/pixel functions
> - Do locking in RenderStart/RenderFinish in the tdfx driver
> - Remove locking from triangle and spanline functions in the tdfx driver

I think this is the nicest way to handle driver-side immediate mode ("direct") rendering. If the tdfx driver buffered vertices and then submitted them as required (like the other MGA-style drivers do), this wouldn't be a problem.

Aside: I've been looking at taking advantage of the COMMAND_TRANSPORT extension in Glide, which will basically allow us to bypass the Glide triangle functions and write directly to the FIFO. This would involve buffering vertices and submitting them in a batch. Only this submission would require the hardware lock, just like in the other drivers. If only I had a damned machine...

> One potential problem with this is that in fallback cases we will hold the hardware lock for the time it takes to render an entire vertex buffer of triangles, one spanline at a time. I propose to get around this by 'flashing' the lock in the spanline and pixel functions, e.g.:
>
>     UNLOCK_HARDWARE(fxMesa);
>     LOCK_HARDWARE(fxMesa);
>     ...
>
> to allow a (tiny) window for the X server or other clients to grab the lock.

This looks like a nice way to go.

-- Gareth
From: Gareth H. <ga...@va...> - 2000-10-09 22:34:48
Keith Whitwell wrote:
>
> Brian Paul wrote:
> >
> > One more thing to consider: moving the locking to a higher level may make debugging harder. When the driver has the lock, the whole display is locked, so you'd have to debug from a different X display. It would be nice if we could choose between the two locking levels at compile time. That might be a bit ugly but could make life easier when debugging the driver.
>
> This is feasible at compile-time, though I think that debugging remotely is the better approach in any case.

I agree. I speak from personal experience: debugging on the same machine you're running the DRI on is typically very difficult. I think the code will be significantly cleaner if we don't allow this option at all.

-- Gareth
From: Keith W. <ke...@va...> - 2000-10-09 21:05:57
Brian Paul wrote:
>
> Keith Whitwell wrote:
> >
> > OK, so in examining the newest tdfx driver, I got to wondering about the calls to LOCK_HARDWARE inside the triangle functions. In particular, after noticing the slowdown after Brian added cliprect handling to those calls, I wondered what would happen if I moved locking out of the triangle function.
> >
> > The answer was surprising:
> >
> >                 old      lock in   cliprect     lock outside
> >                 trunk    trifunc   in trifunc   trifunc
> >     gears       448      560       550          650            fps
> >     isosurf     56       60        60           85             fps
> >     trispd-50   520k     572k      567k         921k           tris/sec
> >
> > on a Celeron 400 with a V3-3000. We are getting close to a 50% overall speedup on this branch (and better for certain apps)...
> >
> > So... What's the catch?
> >
> > Basically, to lock outside the trifuncs, I need somewhere to lock. The obvious place is in the RenderStart/RenderFinish driver callbacks. The only trouble with this is the span fallbacks: we lock in these on a per-spanline basis. We can remove locking from the span callbacks, and be fine on triangle rendering. However, the span fallbacks are also called from DrawPixels, etc.
> >
> > DrawPixels, etc. don't currently call RenderStart/RenderFinish, so where should the locking occur there?
> >
> > To my mind, the obvious thing to do is:
> >
> > - Add RenderStart/RenderFinish calls around all possible calls to the span/pixel functions
> > - Do locking in RenderStart/RenderFinish in the tdfx driver
> > - Remove locking from triangle and spanline functions in the tdfx driver
>
> That's what I would do.
>
> I can add the RenderStart/Finish calls to Mesa (if you haven't already). glClear also uses the span functions, BTW.

That would be helpful. I'm looking at other consequences of this (relating to the transition between single and multiple cliprects) at the moment.

Keith
From: Keith W. <ke...@va...> - 2000-10-09 21:05:04
Brian Paul wrote:
>
> One more thing to consider: moving the locking to a higher level may make debugging harder. When the driver has the lock, the whole display is locked, so you'd have to debug from a different X display. It would be nice if we could choose between the two locking levels at compile time. That might be a bit ugly but could make life easier when debugging the driver.

This is feasible at compile-time, though I think that debugging remotely is the better approach in any case.

Keith
From: Brian P. <br...@va...> - 2000-10-09 20:43:38
Keith Whitwell wrote:
>
> OK, so in examining the newest tdfx driver, I got to wondering about the calls to LOCK_HARDWARE inside the triangle functions. In particular, after noticing the slowdown after Brian added cliprect handling to those calls, I wondered what would happen if I moved locking out of the triangle function.
>
> The answer was surprising:
>
>                 old      lock in   cliprect     lock outside
>                 trunk    trifunc   in trifunc   trifunc
>     gears       448      560       550          650            fps
>     isosurf     56       60        60           85             fps
>     trispd-50   520k     572k      567k         921k           tris/sec
>
> on a Celeron 400 with a V3-3000. We are getting close to a 50% overall speedup on this branch (and better for certain apps)...
>
> So... What's the catch?
>
> Basically, to lock outside the trifuncs, I need somewhere to lock. The obvious place is in the RenderStart/RenderFinish driver callbacks. The only trouble with this is the span fallbacks: we lock in these on a per-spanline basis. We can remove locking from the span callbacks, and be fine on triangle rendering. However, the span fallbacks are also called from DrawPixels, etc.
>
> DrawPixels, etc. don't currently call RenderStart/RenderFinish, so where should the locking occur there?
>
> To my mind, the obvious thing to do is:
>
> - Add RenderStart/RenderFinish calls around all possible calls to the span/pixel functions
> - Do locking in RenderStart/RenderFinish in the tdfx driver
> - Remove locking from triangle and spanline functions in the tdfx driver

That's what I would do.

I can add the RenderStart/Finish calls to Mesa (if you haven't already). glClear also uses the span functions, BTW.

> One potential problem with this is that in fallback cases we will hold the hardware lock for the time it takes to render an entire vertex buffer of triangles, one spanline at a time. I propose to get around this by 'flashing' the lock in the spanline and pixel functions, e.g.:
>
>     UNLOCK_HARDWARE(fxMesa);
>     LOCK_HARDWARE(fxMesa);
>     ...
>
> to allow a (tiny) window for the X server or other clients to grab the lock.

Good idea.

One more thing to consider: moving the locking to a higher level may make debugging harder. When the driver has the lock, the whole display is locked, so you'd have to debug from a different X display. It would be nice if we could choose between the two locking levels at compile time. That might be a bit ugly but could make life easier when debugging the driver.

-Brian
From: Keith W. <ke...@va...> - 2000-10-09 19:54:11
OK, so in examining the newest tdfx driver, I got to wondering about the calls to LOCK_HARDWARE inside the triangle functions. In particular, after noticing the slowdown after Brian added cliprect handling to those calls, I wondered what would happen if I moved locking out of the triangle function.

The answer was surprising:

                old      lock in   cliprect     lock outside
                trunk    trifunc   in trifunc   trifunc
    gears       448      560       550          650            fps
    isosurf     56       60        60           85             fps
    trispd-50   520k     572k      567k         921k           tris/sec

on a Celeron 400 with a V3-3000. We are getting close to a 50% overall speedup on this branch (and better for certain apps)...

So... What's the catch?

Basically, to lock outside the trifuncs, I need somewhere to lock. The obvious place is in the RenderStart/RenderFinish driver callbacks. The only trouble with this is the span fallbacks: we lock in these on a per-spanline basis. We can remove locking from the span callbacks, and be fine on triangle rendering. However, the span fallbacks are also called from DrawPixels, etc.

DrawPixels, etc. don't currently call RenderStart/RenderFinish, so where should the locking occur there?

To my mind, the obvious thing to do is:

- Add RenderStart/RenderFinish calls around all possible calls to the span/pixel functions
- Do locking in RenderStart/RenderFinish in the tdfx driver
- Remove locking from triangle and spanline functions in the tdfx driver

One potential problem with this is that in fallback cases we will hold the hardware lock for the time it takes to render an entire vertex buffer of triangles, one spanline at a time. I propose to get around this by 'flashing' the lock in the spanline and pixel functions, e.g.:

    UNLOCK_HARDWARE(fxMesa);
    LOCK_HARDWARE(fxMesa);
    ...

to allow a (tiny) window for the X server or other clients to grab the lock.

Keith
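A rough model of the proposal, with hypothetical names throughout: the lock is taken once in RenderStart and released in RenderFinish, the triangle functions assume it is already held, and the span fallback "flashes" it. The flag-and-counter lock only records who holds it; in the driver this would be LOCK_HARDWARE/UNLOCK_HARDWARE. As Nathan's reply points out, the flash also gives the X server a chance to change the cliprects, so a real span function would have to re-validate them after re-locking.

```c
#include <assert.h>

/* Hypothetical model: 'locked'/'lock_count' stand in for the DRI hardware
   lock taken by LOCK_HARDWARE/UNLOCK_HARDWARE in the real driver. */
struct hw_context {
    int locked;
    int lock_count;
    int spans_drawn;
};

#define LOCK_HW(hw)   do { assert(!(hw)->locked); (hw)->locked = 1; (hw)->lock_count++; } while (0)
#define UNLOCK_HW(hw) do { assert((hw)->locked); (hw)->locked = 0; } while (0)

/* Driver callbacks bracketing each batch of rendering. */
static void render_start(struct hw_context *hw)  { LOCK_HW(hw); }
static void render_finish(struct hw_context *hw) { UNLOCK_HW(hw); }

/* Triangle functions no longer lock; they rely on render_start(). */
static void draw_triangle(struct hw_context *hw)
{
    assert(hw->locked);
    /* ...send the triangle to the hardware... */
}

/* Span fallback: briefly release and retake the lock ("flashing") so the
   X server or another client can get in between spans. */
static void write_span_fallback(struct hw_context *hw)
{
    assert(hw->locked);
    UNLOCK_HW(hw);
    LOCK_HW(hw);
    /* the cliprects may have changed during the flash; a real driver
       would need to re-validate them here before writing the span */
    hw->spans_drawn++;
}
```

With a batch of 100 triangles the lock is taken once rather than 100 times, which is the change measured in the "lock outside trifunc" column; each fallback span adds one extra lock round trip.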
From: Allen A. <ak...@po...> - 2000-10-06 20:36:09
Hi, Dave!

On Fri, Oct 06, 2000 at 11:29:30AM -0700, Dave Shreiner wrote:
|
| However, it's not as if people have been unsuccessful using glXChooseVisual(). ...

It seems clear to me that people aren't making a lot of use of the alternatives; e.g. visinfo has been around since 1994 and most OpenGL people I know have never heard of it. But screwups with glXChooseVisual() are really common. My experience is that most people just experiment until they find some combination of input parameters that works, and then stick with that until the app is run on some other OpenGL implementation and breaks there.

| Personally, I would vote for something in the spirit of "isfast".

That's the way visinfo was distributed originally, but apparently it never caught on.

| On the (larger, IMO) con side, generic use of an alternative visual selection algorithm (which probably isn't needed in many cases anyhow) means that as new features become "default" (like multisampling) for visuals, these changes should be transparently folded into something like glXChooseVisual() without requiring modification of the developer's code.

The same thing can be done with visinfo as long as it's in a dynamic library. (BTW, visinfo already has multisampling support, provided in an upward-compatible way like glXChooseVisual().)

| I don't think glXChooseVisual() sucks so bad that an alternative needs to be standardized. Not to slight the work that Allen's done, but a faster and easier solution would be some better documentation of how to choose a visual.

Could be. I haven't pushed the issue in years because it was just too low on the priority list. But in the long run, I feel pretty confident that people who give up on glXChooseVisual() and switch to calling glXGetConfig() to do their own visual selection will eventually just duplicate the effort that went into visinfo. Might as well save them some work...

Allen
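Roughly what "doing your own visual selection" involves, sketched with a hypothetical struct that an application might fill in by calling glXGetConfig() with GLX_RGBA, GLX_DOUBLEBUFFER, GLX_DEPTH_SIZE and GLX_STENCIL_SIZE for each visual. The scoring rule (fewest surplus depth/stencil bits) is just one possible policy chosen for illustration; it is not visinfo's algorithm or glXChooseVisual()'s.

```c
#include <stddef.h>

/* Hypothetical per-visual attributes, e.g. gathered with glXGetConfig().
   Illustrative only; not visinfo's data structure. */
struct visual_attrs {
    int id;
    int rgba;          /* GLX_RGBA */
    int doublebuffer;  /* GLX_DOUBLEBUFFER */
    int depth_size;    /* GLX_DEPTH_SIZE */
    int stencil_size;  /* GLX_STENCIL_SIZE */
};

/* Return the RGBA double-buffered visual that meets the minimum depth and
   stencil sizes with the fewest surplus bits, or NULL if none qualifies. */
static const struct visual_attrs *
choose_visual(const struct visual_attrs *v, size_t n,
              int min_depth, int min_stencil)
{
    const struct visual_attrs *best = NULL;
    int best_surplus = 0;

    for (size_t i = 0; i < n; i++) {
        if (!v[i].rgba || !v[i].doublebuffer ||
            v[i].depth_size < min_depth || v[i].stencil_size < min_stencil)
            continue;
        int surplus = (v[i].depth_size - min_depth) +
                      (v[i].stencil_size - min_stencil);
        if (best == NULL || surplus < best_surplus) {
            best = &v[i];
            best_surplus = surplus;
        }
    }
    return best;
}
```

Every application that hand-rolls something like this has to make the same policy decisions (prefer more bits or fewer? how to weigh depth against stencil or multisampling?), which is the duplicated effort described above.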
From: Allen A. <ak...@po...> - 2000-10-06 20:25:07
On Fri, Oct 06, 2000 at 12:31:38PM -0500, Stephen J Baker wrote:
| On Fri, 6 Oct 2000, Brian Paul wrote:
| >
| > In this case I don't feel an extension is necessary. Allen's visual selection utility is completely layered on top of the GLX API; it needs no hooks inside GLX.
| ...
| So what we *really* need is a new 'GLXU' library that contains utility functions that apply to GLX.

I don't have any fundamental disagreement with that idea, but I wanted to point out that the visual-selection API doesn't have to be GLX-specific (the version in glean isn't, for example), so it doesn't really belong in a GLXU library.

Allen
From: Dave S. <shr...@sg...> - 2000-10-06 18:29:36
|
Daryll Strauss writes: > > In this case I don't feel an extension is necessary. Allen's visual > > selection utility is completely layered on top of the GLX API; it needs > > no hooks inside GLX. I think people who need the code could simply add > > the .c and .h files to their project and compile it in. > > > > That way, there's no hassles with versioning or compile-time or run- > > time extension testing, etc. And we all know how bad that can be! > > I agree technically that works, but that doesn't get it widely > distributed and if people don't know it exists they won't use it. However, its not as if people have been unsuccessful using glXChooseVisual(). I tend to believe that very few applications go to the trouble of very specific visual selection above glXChooseVisual(). The most notable execpetions are things like sensor simulation, and that's mostly because they exporting digital data, and not a visual image. > I'm trying to get it included in some standard library (maybe as > extension) so that people have it and use it instead of inverting > their own. If you really dislike attaching it to GLX maybe we need a > our own libMesaU (or equivalent) that can include stuff like this. Personally, I would vote for something in the spirit of "isfast". On the pro side of attaching it to GLX, as GLX evolves, then perhaps so will it. On the (larger, IMO) con side, generic use of an alternative visual selection algorithm (which probably isn't needed in many cases anyhow), means that as new features become "default" (like multisampling) for visuals, these changes should be transparently folded into something like glXChooseVisual() without requiring modification of the developer's code. I don't think glXChooseVisual() sucks so bad that an alternative needs to be standardized. Not to slight the work that Allan's done, but a faster, and easier, solution would be some better documentation of how to choose visual. Thanks. 
--
Thanx, Dave
---------------------------------------------------------------------
Dave Shreiner <shr...@sg...>
Silicon Graphics, Inc. (650) 933-4899
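Dave's call for better documentation is easiest to see with the boolean attributes, which are a common stumbling block: in glXChooseVisual()'s attribute list, GLX_RGBA, GLX_DOUBLEBUFFER and GLX_STEREO take no value, and leaving one out does not mean "don't care" but restricts the search to visuals without that property (for example, single-buffered visuals only). The following is a minimal Python model of that filtering step only, written under stated assumptions: visuals are plain dicts with hypothetical field names standing in for values a C program would read with glXGetConfig(), attribute names are strings rather than the real integer tokens, and the preference rules glXChooseVisual() applies among the surviving visuals (such as favouring the largest red buffer) are deliberately left out.

```python
# Minimal model of glXChooseVisual()'s *filtering* rules (illustrative sketch).
# Assumptions: attribute names are strings, not the real GLX integer tokens;
# visuals are dicts whose (hypothetical) fields a C program would fill in
# with glXGetConfig(); the real call's preference/ranking rules are omitted.

# Boolean attributes: present means "must have it", absent means "must not".
BOOLEAN_ATTRS = {
    "GLX_RGBA": "rgba",
    "GLX_DOUBLEBUFFER": "doublebuffer",
    "GLX_STEREO": "stereo",
}

# Size attributes are followed by a minimum value.
SIZE_ATTRS = {
    "GLX_RED_SIZE": "red",
    "GLX_GREEN_SIZE": "green",
    "GLX_BLUE_SIZE": "blue",
    "GLX_DEPTH_SIZE": "depth",
    "GLX_STENCIL_SIZE": "stencil",
}


def parse_attrib_list(attribs):
    """Split a None-terminated attribute list into flags and minimum sizes."""
    flags, minimums = set(), {}
    i = 0
    while attribs[i] is not None:
        name = attribs[i]
        if name in BOOLEAN_ATTRS:
            flags.add(name)
            i += 1
        elif name in SIZE_ATTRS:
            minimums[name] = attribs[i + 1]
            i += 2
        else:
            raise ValueError(f"attribute not modelled in this sketch: {name}")
    return flags, minimums


def acceptable(visual, flags, minimums):
    """True if the visual passes every constraint in the attribute list."""
    for attr, field in BOOLEAN_ATTRS.items():
        if visual.get(field, False) != (attr in flags):
            return False
    return all(visual.get(SIZE_ATTRS[attr], 0) >= minimum
               for attr, minimum in minimums.items())


def candidate_visuals(visuals, attribs):
    """Return the visuals a glXChooseVisual()-style request could pick from."""
    flags, minimums = parse_attrib_list(attribs)
    return [v for v in visuals if acceptable(v, flags, minimums)]
```

For instance, given two RGBA visuals that differ only in double buffering, the request `["GLX_RGBA", "GLX_DEPTH_SIZE", 16, None]` keeps only the single-buffered one, which is exactly the kind of surprise that better documentation would head off.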
From: Stephen J B. <sj...@li...> - 2000-10-06 17:37:54
On Fri, 6 Oct 2000, Daryll Strauss wrote:

> > That way, there's no hassles with versioning or compile-time or run-
> > time extension testing, etc. And we all know how bad that can be!
>
> I agree technically that works, but that doesn't get it widely
> distributed and if people don't know it exists they won't use it. I'm
> trying to get it included in some standard library (maybe as an extension)
> so that people have it and use it instead of inventing their own. If you
> really dislike attaching it to GLX maybe we need our own libMesaU (or
> equivalent) that can include stuff like this.

With the widespread use of nVidia's OpenGL for Linux, there is no point
in putting it into a libMesaU - that would just make matters worse.

----
Steve Baker (817)619-2657 (Vox/Vox-Mail)
L3Com/Link Simulation & Training (817)619-2466 (Fax)
Work: sj...@li... http://www.link.com
Home: sjb...@ai... http://web2.airmail.net/sjbaker1
From: Stephen J B. <sj...@li...> - 2000-10-06 17:31:46
On Fri, 6 Oct 2000, Brian Paul wrote:

> > That leaves GLUT (which already has it - essentially) or building
> > a new command into GLX where everyone can get at it.
> >
> > I vote for a GLX extension.
>
> In this case I don't feel an extension is necessary. Allen's visual
> selection utility is completely layered on top of the GLX API; it needs
> no hooks inside GLX. I think people who need the code could simply add
> the .c and .h files to their project and compile it in.

Hmm - agreed.

So what we *really* need is a new 'GLXU' library that contains utility
functions that apply to GLX.

   GLXU is to GLX as GLU is to GL

...but then you descend a slippery slope that leads you back to GLUT.

> That way, there's no hassles with versioning or compile-time or run-
> time extension testing, etc. And we all know how bad that can be!

Please - don't remind me.

----
Steve Baker (817)619-2657 (Vox/Vox-Mail)
L3Com/Link Simulation & Training (817)619-2466 (Fax)
Work: sj...@li... http://www.link.com
Home: sjb...@ai... http://web2.airmail.net/sjbaker1
From: Brian P. <br...@va...> - 2000-10-06 16:13:10
Daryll Strauss wrote:
>
> Stephen J Baker wrote:
> > I don't think GLU is the right place for it because GLU is generally
> > independent of the windowing system - just as GL is...and rightly so.
>
> I actually meant GLX. Oops.
>
> On Fri, Oct 06, 2000 at 08:42:24AM -0600, Brian Paul wrote:
> > In this case I don't feel an extension is necessary. Allen's visual
> > selection utility is completely layered on top of the GLX API; it needs
> > no hooks inside GLX. I think people who need the code could simply add
> > the .c and .h files to their project and compile it in.
> >
> > That way, there's no hassles with versioning or compile-time or run-
> > time extension testing, etc. And we all know how bad that can be!
>
> I agree technically that works, but that doesn't get it widely
> distributed and if people don't know it exists they won't use it.

That's a good point.

> I'm trying to get it included in some standard library (maybe as an
> extension) so that people have it and use it instead of inventing their
> own. If you really dislike attaching it to GLX maybe we need our own
> libMesaU (or equivalent) that can include stuff like this.

In the Mesa distro I have a util/ directory with some useful odds and
ends, and on my personal systems I have years' worth of OpenGL tests,
demos and hacks. It's occurred to me that having an online repository
for all this stuff might be nice. Open-source, of course.

I wonder if there'd be much interest in setting up a new SourceForge
project, perhaps "OpenGL Toolbox", to collect all these odds and ends?
Would that be redundant with www.opengl.org's content?

Another point: the Mesa distro has a separate package of demo programs.
They're mostly portable but the build environment is geared toward Mesa
only. It might be nice to convert those demos into a more flexible
package that anyone (i.e. non-Mesa users) can easily use. That might
also relieve me of some maintenance work.

Unfortunately, I don't have time to set up anything like this. But
there's probably someone out there who could do so, if people think
it's worthwhile.

-Brian
From: Daryll S. <da...@va...> - 2000-10-06 15:43:04
Stephen J Baker wrote:
> I don't think GLU is the right place for it because GLU is generally
> independent of the windowing system - just as GL is...and rightly so.

I actually meant GLX. Oops.

On Fri, Oct 06, 2000 at 08:42:24AM -0600, Brian Paul wrote:
> In this case I don't feel an extension is necessary. Allen's visual
> selection utility is completely layered on top of the GLX API; it needs
> no hooks inside GLX. I think people who need the code could simply add
> the .c and .h files to their project and compile it in.
>
> That way, there's no hassles with versioning or compile-time or run-
> time extension testing, etc. And we all know how bad that can be!

I agree technically that works, but that doesn't get it widely
distributed and if people don't know it exists they won't use it. I'm
trying to get it included in some standard library (maybe as an
extension) so that people have it and use it instead of inventing their
own. If you really dislike attaching it to GLX maybe we need our own
libMesaU (or equivalent) that can include stuff like this.

- |Daryll
From: Brian P. <br...@va...> - 2000-10-06 14:47:02
Michael Vance wrote:
>
> Brian,
>
> As a heads up, the official ARB_texture_compression standard
> (http://oss.sgi.com/projects/ogl-sample/registry/ARB/texture_compression.txt)
> defines the internalFormat parameter of CompressedTexImage* to be
> 'int' and not 'enum'. I would think 'enum' is more appropriate, but
> there it is all the same. A good C++ compiler will complain about
> this.

I've been told that the internalFormat parameter should be GLenum, as
the glext.h file indicates. It's the texture compression spec on the
website that needs to be updated.

-Brian
From: Brian P. <br...@va...> - 2000-10-06 14:43:33
Stephen J Baker wrote:
>
> On Thu, 5 Oct 2000, Daryll Strauss wrote:
>
> > On Thu, Oct 05, 2000 at 03:41:42PM -0700, Allen Akin wrote:
> > > Since 1994 I've been using the visinfo package that I wrote while at
> > > SGI. Among other things, it's used in isfast and glean, and it had
> > > some influence on Mark Kilgard's design for the display initialization
> > > string in GLUT.
> >
> > Maybe this is something we should package up and include in GLU or
> > something. It sounds like it would be generally useful and might
> > discourage the use of glXChooseVisual.
>
> I don't think GLU is the right place for it because GLU is generally
> independent of the windowing system - just as GL is...and rightly so.

Right.

> That leaves GLUT (which already has it - essentially) or building
> a new command into GLX where everyone can get at it.
>
> I vote for a GLX extension.

In this case I don't feel an extension is necessary. Allen's visual
selection utility is completely layered on top of the GLX API; it needs
no hooks inside GLX. I think people who need the code could simply add
the .c and .h files to their project and compile it in.

That way, there's no hassles with versioning or compile-time or run-
time extension testing, etc. And we all know how bad that can be!

-Brian
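Brian's point that a selector can live entirely above GLX can be illustrated with a small Python sketch of the logic. It is hypothetical throughout: the function name, the dict-based visual records and the (field, direction) preference format are invented here and are not the interface of Allen's visinfo package. It filters visuals with a caller-supplied predicate and then ranks the survivors lexicographically by caller-chosen preferences; everything it consumes is per-visual data that an application can already read through the existing GLX API (glXGetConfig()), which is why no hooks inside GLX would be needed.

```python
# Hypothetical layered visual selector (not the visinfo API).
# Each visual is a dict of attributes that a C program could read for every
# X visual with glXGetConfig(); nothing here requires changes inside GLX.


def choose_visual(visuals, require, prefer):
    """Pick the best visual, or return None if none qualifies.

    require -- predicate taking a visual dict and returning True/False
               (the hard constraints).
    prefer  -- list of (field, direction) pairs compared in order;
               direction +1 favours larger values, -1 favours smaller ones.
               Missing fields count as 0.
    """
    eligible = [v for v in visuals if require(v)]
    if not eligible:
        return None

    def rank(visual):
        # Tuples compare lexicographically, so earlier preferences dominate.
        return tuple(direction * visual.get(field, 0) for field, direction in prefer)

    # max() returns the first maximal element, so ties keep the input order.
    return max(eligible, key=rank)
```

For example, requiring an RGBA, double-buffered visual and preferring the deepest depth buffer, then the smallest stencil buffer, could be written as `choose_visual(visuals, lambda v: v["rgba"] and v["doublebuffer"], [("depth", +1), ("stencil", -1)])`. Expressing preferences as data rather than as a fixed rule set is one way such a utility could go beyond glXChooseVisual()'s built-in rules.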
From: Stephen J B. <sj...@li...> - 2000-10-06 13:42:27
On Thu, 5 Oct 2000, Daryll Strauss wrote:

> On Thu, Oct 05, 2000 at 03:41:42PM -0700, Allen Akin wrote:
> > Since 1994 I've been using the visinfo package that I wrote while at
> > SGI. Among other things, it's used in isfast and glean, and it had
> > some influence on Mark Kilgard's design for the display initialization
> > string in GLUT.
>
> Maybe this is something we should package up and include in GLU or
> something. It sounds like it would be generally useful and might
> discourage the use of glXChooseVisual.

I don't think GLU is the right place for it because GLU is generally
independent of the windowing system - just as GL is...and rightly so.

That leaves GLUT (which already has it - essentially) or building
a new command into GLX where everyone can get at it.

I vote for a GLX extension.

----
Steve Baker (817)619-2657 (Vox/Vox-Mail)
L3Com/Link Simulation & Training (817)619-2466 (Fax)
Work: sj...@li... http://www.link.com
Home: sjb...@ai... http://web2.airmail.net/sjbaker1
From: Allen A. <ak...@po...> - 2000-10-06 00:35:55
On Thu, Oct 05, 2000 at 04:36:50PM -0700, Daryll Strauss wrote:
|
| Maybe this is something we should package up and include in GLU or
| something. It sounds like it would be generally useful and might
| discourage the use of glXChooseVisual.

The ARB gets to decide what goes into the official GLU, so we couldn't
just put it in unilaterally. However, it could be treated as a Mesa
(or EXT) extension.

Allen
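If the selector did ship as a Mesa or EXT extension, applications would need the kind of run-time extension test mentioned earlier in the thread: look for the extension's name in the space-separated string returned by glXQueryExtensionsString() before using it. A common mistake is a plain substring search, which also succeeds when only a longer name beginning with the tested one is present. The following is a minimal Python sketch of a whole-token check; it shows the logic only, and a C version would tokenize the same string rather than call strstr() on it.

```python
def has_extension(extensions, name):
    """Return True if `name` appears as a complete token in `extensions`.

    `extensions` is a space-separated list such as the one returned by
    glXQueryExtensionsString() or glGetString(GL_EXTENSIONS). Splitting on
    whitespace avoids the classic substring bug, where testing for one name
    also succeeds if only a longer name with that prefix is present.
    """
    if not name or any(ch.isspace() for ch in name):
        return False  # an empty or multi-word name can never be one token
    return name in extensions.split()
```

For example, `has_extension("GLX_EXT_visual_info2 GLX_EXT_visual_rating", "GLX_EXT_visual_info")` is False, whereas the naive test `"GLX_EXT_visual_info" in extensions` would report True (GLX_EXT_visual_info2 is a made-up name, used only to show the prefix trap).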