From: Josh V. <ho...@na...> - 2001-03-13 23:44:43
"Stephen J Baker" <sj...@li...> writes:

> If it's for something like varying the Z buffer depth - then a large
> overhead might be acceptable (although I still maintain that creating
> a dependence on a C compiler is a disastrously bad decision).

I just want to make sure there isn't a misunderstanding here - the C
compiler would only be used to compile special case functions. There
would always be a generic fallback, so Mesa would work (although more
slowly) even without a compiler installed.

> If it's something like catering for a particular combination of (say)
> blend mode, texture type, dither and fogging (the kind of thing that
> the SGI OpenGL-for-Windoze appears to do) - then taking even a couple
> of milliseconds at runtime when you first happen to see a polygon with
> that combination of states would be completely unacceptable - even
> with caching.

If it was just 20 milliseconds or so, I doubt most users would notice.
The problem is, I can easily see the delay being 500 ms for a big
function. I can think of two workarounds:

1. Make the cache persistent, so the delay would only happen the first
   time you run your application.

2. Run the compiler in the background at low priority (say 10% CPU).
   You would have to make do with the generic fallback until the
   compile completes, but there wouldn't be a 'hitch' when the
   application exposes a different path.

Do you think that would be acceptable?

Josh
From: Gareth H. <ga...@va...> - 2001-03-13 17:32:39
Keith Whitwell wrote:

> But of course, we should never be getting rid of or acquiring new dma
> buffers; we should be just aging regions within a buffer we always
> hold, and perhaps in rare cases growing/shrinking it.
>
> However, mprotect doesn't help you here either, as the regions being
> aged are pretty dynamic.

If only I could sit down and finish my dynamic heap code... :-)

My plan is to basically have a pool of 2-4 MB DMA regions, with each one
assigned to at most one context. The context handles all allocation
within the buffer, and the regions are aged by the kernel when they are
flushed. The flush is the only thing that you need to go into the
kernel for, and I can't really see any way around that.

(Note: current status is working allocation code in a single region; I
need to finish off multiple regions, switching regions between contexts
and the like. The new SAREA code will allow a finer grain of control
over the region, as we're currently limited to a 4K control page.)

If you were lucky enough to fill your region, your next
LOCK_DMA_REGION() would automagically grab you another one. I think
that's easier than trying to resize the current one. Grabbing another
region may result in allocating and binding more AGP memory, but that's
a nice and transparent side effect.

-- Gareth
From: Keith W. <ke...@va...> - 2001-03-13 17:17:04
Josh Vanderhoof wrote:

> Gareth Hughes <ga...@va...> writes:
>
> > Compiling this with the code Josh sent takes around 70 msec on my
> > 700MHz PIII laptop.
> >
> >     void codegen_test( void )
> >     {
> >         printf( "hello, world!\n" );
> >     }
> >
> > That's a long time at 60fps...
>
> I would think you would keep a cache of the stuff that you've compiled
> to avoid re-compiling every time you want to call the routine. So
> you're really looking at a one time start up cost. Is there something
> where the code would change on every single frame?

Yes. I think you'd probably see fewer than 10 compiles at application
startup and then no more. If applications behave in a weird morphing
way, just pull the plug on compilation and let them use the generic
code.

Keith
From: Stephen J B. <sj...@li...> - 2001-03-13 17:14:54
On 12 Mar 2001, Josh Vanderhoof wrote:

> Gareth Hughes <ga...@va...> writes:
>
> > Compiling this with the code Josh sent takes around 70 msec on my
> > 700MHz PIII laptop.
> >
> >     void codegen_test( void )
> >     {
> >         printf( "hello, world!\n" );
> >     }
> >
> > That's a long time at 60fps...
>
> I would think you would keep a cache of the stuff that you've compiled
> to avoid re-compiling every time you want to call the routine. So
> you're really looking at a one time start up cost. Is there something
> where the code would change on every single frame?

It depends what you use this code for.

If it's for something like varying the Z buffer depth - then a large
overhead might be acceptable (although I still maintain that creating a
dependence on a C compiler is a disastrously bad decision).

If it's something like catering for a particular combination of (say)
blend mode, texture type, dither and fogging (the kind of thing that
the SGI OpenGL-for-Windoze appears to do) - then taking even a couple
of milliseconds at runtime when you first happen to see a polygon with
that combination of states would be completely unacceptable - even with
caching.

----
Steve Baker                      (817)619-2657 (Vox/Vox-Mail)
L3Com/Link Simulation & Training (817)619-2466 (Fax)
Work: sj...@li...               http://www.link.com
Home: sjb...@ai...              http://web2.airmail.net/sjbaker1
From: Keith W. <ke...@va...> - 2001-03-13 17:14:33
Gareth Hughes wrote:

> Josh Vanderhoof wrote:
>
> > I'm getting sidetracked here but:
> >
> > 1. Before you start doing crazy optimizations why wouldn't you
> >    rewrite it like this (get rid of the "dma.space" variable):
> >
> >        void foo_Vertex3fv( const GLfloat *v )
> >        {
> >            GET_FOO_CONTEXT(fmesa);
> >            COPY_3V( fmesa->current.o3n3t2.obj, v );
> >            if ( fmesa->dma.head + 8 <= fmesa->dma.end_of_space ) {
> >                COPY_DWORDS( fmesa->dma.head, fmesa->current.o3n3tc2, 8 );
> >                fmesa->dma.head += 8;
> >            } else {
> >                fmesa->get_dma( fmesa, fmesa->current.o3n3tc2, 8 );
> >            }
> >        }
>
> Sure -- it was just a cut and paste of some old code Keith sent me.
> Minor point.

Historically it's because it is fmesa->dma->head, not fmesa->dma.head.
It's an ugliness that was built into the drivers early on.

Keith
From: Keith W. <ke...@va...> - 2001-03-13 17:12:47
Gareth Hughes wrote:

> Josh Vanderhoof wrote:
>
> > I was too vague. You could mprotect() the page after the DMA buffer
> > to PROT_NONE and install a SIGSEGV handler that flushes the buffer
> > if the SIGSEGV was on the end of the DMA buffer. Then you can just
> > let Vertex3f segfault when it uses the buffer up. You would save
> > yourself a test and a predicted jump per call. Not much, but it
> > sounds like you really want to optimize this.
>
> Sorry, I get you. You might even be able to do fancy things and catch
> this in the kernel module -- that would be neat :-) Save us an
> ioctl...

Now that is kind of neat...

Keith
From: Keith W. <ke...@va...> - 2001-03-13 17:11:50
Gareth Hughes wrote:

> Josh Vanderhoof wrote:
>
> > I'm getting sidetracked here but:
> >
> > 1. Before you start doing crazy optimizations why wouldn't you
> >    rewrite it like this (get rid of the "dma.space" variable):
> >
> >        void foo_Vertex3fv( const GLfloat *v )
> >        {
> >            GET_FOO_CONTEXT(fmesa);
> >            COPY_3V( fmesa->current.o3n3t2.obj, v );
> >            if ( fmesa->dma.head + 8 <= fmesa->dma.end_of_space ) {
> >                COPY_DWORDS( fmesa->dma.head, fmesa->current.o3n3tc2, 8 );
> >                fmesa->dma.head += 8;
> >            } else {
> >                fmesa->get_dma( fmesa, fmesa->current.o3n3tc2, 8 );
> >            }
> >        }
>
> Sure -- it was just a cut and paste of some old code Keith sent me.
> Minor point.
>
> > 2. Hard-coding the address would help you, but not by very much. I
> >    would expect it to save you one MOV instruction. On Intel cpu's,
> >    the "reg+offset" addressing mode is "free". (You probably knew
> >    that already.)
>
> Yep :-) GET_FOO_CONTEXT() is the big one, and perhaps I should have
> stressed that a little more. This may involve a function call to
> determine the current context, due to thread-safety issues.
>
> > 3. If you want to go all out on this code, you could probably use
> >    mprotect() to avoid the buffer overflow test entirely. That
> >    would only be a good idea if buffer overflows are rare though.
>
> You need buffer overflows as they end up flushing the DMA buffer. In
> this case, get_dma() would flush the current buffer and acquire a new
> one.

But of course, we should never be getting rid of or acquiring new dma
buffers; we should be just aging regions within a buffer we always
hold, and perhaps in rare cases growing/shrinking it.

However, mprotect doesn't help you here either, as the regions being
aged are pretty dynamic.

Keith
From: Andrew R. <and...@uc...> - 2001-03-13 14:05:57
Dear all,

How fast do Mesa GL functions return? A glVertex* function does not get
processed all the way down the rendering pipeline until the VB is full,
right? What about texture functions and array functions?

Andy

--
Andy Richardson            Dept. of Chemistry
t(w): +44-20-7679 (4718)   University College London
f(w): +44-20-7679 (4560)   Gordon Street
e: and...@uc...            London WC1E 6BT UK
From: Gareth H. <ga...@va...> - 2001-03-13 05:15:37
Josh Vanderhoof wrote:

> I would think you would keep a cache of the stuff that you've compiled
> to avoid re-compiling every time you want to call the routine. So
> you're really looking at a one time start up cost. Is there something
> where the code would change on every single frame?

Absolutely. I'm not sure how much of it needs to be truly dynamic, but
it's useful to know how expensive compilation can be. It might be worth
compiling it as a .o file and using the BFD library to extract the
function, for instance.

-- Gareth
From: Josh V. <ho...@na...> - 2001-03-13 03:32:37
Gareth Hughes <ga...@va...> writes:

> Compiling this with the code Josh sent takes around 70 msec on my
> 700MHz PIII laptop.
>
>     void codegen_test( void )
>     {
>         printf( "hello, world!\n" );
>     }
>
> That's a long time at 60fps...

I would think you would keep a cache of the stuff that you've compiled
to avoid re-compiling every time you want to call the routine. So
you're really looking at a one time start up cost. Is there something
where the code would change on every single frame?

Josh
From: Gareth H. <ga...@va...> - 2001-03-13 02:02:53
Compiling this with the code Josh sent takes around 70 msec on my
700MHz PIII laptop.

    void codegen_test( void )
    {
        printf( "hello, world!\n" );
    }

That's a long time at 60fps...

-- Gareth
From: Gareth H. <ga...@va...> - 2001-03-13 01:55:00
Allen Akin wrote:

> On Mon, Mar 12, 2001 at 08:39:30PM -0500, Josh Vanderhoof wrote:
> | ... You could mprotect() the page after the DMA buffer
> | to PROT_NONE and install a SIGSEGV handler that flushes the buffer
> | if the SIGSEGV was on the end of the DMA buffer. Then you can just
> | let Vertex3f segfault when it uses the buffer up.
>
> I haven't used the POSIX signalling system, so please pardon an
> uninformed question. We couldn't afford to install a short-term
> signal handler each time Vertex3f is called; it would be cheaper just
> to test for overflow. However, any handler installed long-term by the
> driver could interfere with a handler installed by the application.
> Is there a good way to work around that problem?

I agree. And yes, just test for buffer overflow :-) Still, it's a neat
idea.

-- Gareth
From: Allen A. <ak...@po...> - 2001-03-13 01:48:01
On Mon, Mar 12, 2001 at 08:39:30PM -0500, Josh Vanderhoof wrote:
| ... You could mprotect() the page after the DMA buffer
| to PROT_NONE and install a SIGSEGV handler that flushes the buffer if
| the SIGSEGV was on the end of the DMA buffer. Then you can just let
| Vertex3f segfault when it uses the buffer up.

I haven't used the POSIX signalling system, so please pardon an
uninformed question. We couldn't afford to install a short-term signal
handler each time Vertex3f is called; it would be cheaper just to test
for overflow. However, any handler installed long-term by the driver
could interfere with a handler installed by the application. Is there a
good way to work around that problem?

Allen
From: Gareth H. <ga...@va...> - 2001-03-13 01:47:01
Josh Vanderhoof wrote:

> I was too vague. You could mprotect() the page after the DMA buffer
> to PROT_NONE and install a SIGSEGV handler that flushes the buffer if
> the SIGSEGV was on the end of the DMA buffer. Then you can just let
> Vertex3f segfault when it uses the buffer up. You would save yourself
> a test and a predicted jump per call. Not much, but it sounds like
> you really want to optimize this.

Sorry, I get you. You might even be able to do fancy things and catch
this in the kernel module -- that would be neat :-) Save us an ioctl...

However, the way we manage DMA space will change soon and this will no
longer be possible. The drivers will be using a dynamic allocation
scheme to get better utilization of the DMA space. Fixed size buffers
suck.

I'll play around with similar ideas -- mprotect()ing the DMA buffer and
catching the segfault in the kernel sure would be neat... Don't know if
we can do that, but it's worth looking at.

-- Gareth
From: Josh V. <ho...@na...> - 2001-03-13 01:35:53
Gareth Hughes <ga...@va...> writes:

> > 3. If you want to go all out on this code, you could probably use
> >    mprotect() to avoid the buffer overflow test entirely. That
> >    would only be a good idea if buffer overflows are rare though.
>
> You need buffer overflows as they end up flushing the DMA buffer. In
> this case, get_dma() would flush the current buffer and acquire a new
> one.

I was too vague. You could mprotect() the page after the DMA buffer to
PROT_NONE and install a SIGSEGV handler that flushes the buffer if the
SIGSEGV was on the end of the DMA buffer. Then you can just let
Vertex3f segfault when it uses the buffer up. You would save yourself a
test and a predicted jump per call. Not much, but it sounds like you
really want to optimize this.

Josh
From: Gareth H. <ga...@va...> - 2001-03-13 00:33:03
Josh Vanderhoof wrote:

> I'm getting sidetracked here but:
>
> 1. Before you start doing crazy optimizations why wouldn't you
>    rewrite it like this (get rid of the "dma.space" variable):
>
>        void foo_Vertex3fv( const GLfloat *v )
>        {
>            GET_FOO_CONTEXT(fmesa);
>            COPY_3V( fmesa->current.o3n3t2.obj, v );
>            if ( fmesa->dma.head + 8 <= fmesa->dma.end_of_space ) {
>                COPY_DWORDS( fmesa->dma.head, fmesa->current.o3n3tc2, 8 );
>                fmesa->dma.head += 8;
>            } else {
>                fmesa->get_dma( fmesa, fmesa->current.o3n3tc2, 8 );
>            }
>        }

Sure -- it was just a cut and paste of some old code Keith sent me.
Minor point.

> 2. Hard-coding the address would help you, but not by very much. I
>    would expect it to save you one MOV instruction. On Intel cpu's,
>    the "reg+offset" addressing mode is "free". (You probably knew
>    that already.)

Yep :-) GET_FOO_CONTEXT() is the big one, and perhaps I should have
stressed that a little more. This may involve a function call to
determine the current context, due to thread-safety issues.

> 3. If you want to go all out on this code, you could probably use
>    mprotect() to avoid the buffer overflow test entirely. That would
>    only be a good idea if buffer overflows are rare though.

You need buffer overflows as they end up flushing the DMA buffer. In
this case, get_dma() would flush the current buffer and acquire a new
one.

-- Gareth
From: Gareth H. <ga...@va...> - 2001-03-13 00:28:45
Josh Vanderhoof wrote:

> Gareth brought up software rendering. While I think Gareth's dynamic
> software renderer design could benefit from compiling at run time, I
> was thinking more about functions that are currently implemented with
> template header files. They are usually named "*tmp.h" and are used
> in both hardware and software rendering. If Mesa could compile new
> functions at run time, you could cover more special cases without
> bloating the library with functions that are never called.

To tell you the truth, I hadn't really thought about the implementation
details of my software renderer enough. I certainly agree that there
are benefits to using compilation, but this is a case where the cost of
doing so would need to be investigated. One day when I have the time to
actually write the code, we'll see what happens...

-- Gareth
From: Gareth H. <ga...@va...> - 2001-03-13 00:26:10
Keith Whitwell wrote:

> I agree with this, but I'm inclined to pursue gcc-based codegen, at
> least as a prototype for a more hard-wired system to follow it. I
> think we need to make some progress in this area, and gcc looks like
> it's got a real low entry level.

Fair enough. I just hadn't thought it would be worth going to that
amount of trouble, but there are obvious advantages in doing so.

> It might be possible to use a tokenized generation language that can
> either be expanded by the C preprocessor, or understood explicitly by
> a follow-on bespoke codegen module.
>
> Some of the optimizations for the tnl functions like you've got above,
> such as hardwiring addresses, using the right (ie non -fPIC) compiler
> options, can be achieved using gcc.
>
> So in short, I don't know whether the overhead of gcc will be a
> problem at runtime, but the low overhead for us right now makes it
> look like a real attractive way to get started. If it works out ok at
> runtime, we've finished unexpectedly early.

I'm putting the finishing touches on the driver tnl module code that
goes along with the core Mesa stuff I committed yesterday, so once
that's done I might play with this a little (at least get some basic
generation happening).

-- Gareth
From: Josh V. <ho...@na...> - 2001-03-12 23:30:41
Gareth Hughes <ga...@va...> writes:

>     struct foo_vertex_o3n3tc2 {
>         GLfloat obj[3];
>         GLfloat normal[3];
>         GLfloat tc[2];
>     }
>
>     void foo_Normal3fv( const GLfloat *v )
>     {
>         GET_FOO_CONTEXT(fmesa);
>         COPY_3V( fmesa->current.o3n3t2.normal, v );
>     }
>
>     void foo_Vertex3fv( const GLfloat *v )
>     {
>         GET_FOO_CONTEXT(fmesa);
>         COPY_3V( fmesa->current.o3n3t2.obj, v );
>         if ( fmesa->dma.space >= 8 ) {
>             COPY_DWORDS( fmesa->dma.head, fmesa->current.o3n3tc2, 8 );
>             fmesa->dma.head += 8;
>             fmesa->dma.space -= 8;
>         } else {
>             fmesa->get_dma( fmesa, fmesa->current.o3n3tc2, 8 );
>         }
>     }
>
> (The above is based on code by Keith Whitwell)
>
> You can, however, substitute most of that with hard-coded addresses
> for the current context and make it as streamlined as possible. If
> you want to call these functions 10, 30, 100 million times a second,
> you want them to be *fast*...

I'm getting sidetracked here but:

1. Before you start doing crazy optimizations, why wouldn't you rewrite
   it like this (get rid of the "dma.space" variable):

       void foo_Vertex3fv( const GLfloat *v )
       {
           GET_FOO_CONTEXT(fmesa);
           COPY_3V( fmesa->current.o3n3t2.obj, v );
           if ( fmesa->dma.head + 8 <= fmesa->dma.end_of_space ) {
               COPY_DWORDS( fmesa->dma.head, fmesa->current.o3n3tc2, 8 );
               fmesa->dma.head += 8;
           } else {
               fmesa->get_dma( fmesa, fmesa->current.o3n3tc2, 8 );
           }
       }

2. Hard-coding the address would help you, but not by very much. I
   would expect it to save you one MOV instruction. On Intel CPUs, the
   "reg+offset" addressing mode is "free". (You probably knew that
   already.)

3. If you want to go all out on this code, you could probably use
   mprotect() to avoid the buffer overflow test entirely. That would
   only be a good idea if buffer overflows are rare though.

Josh
From: Josh V. <ho...@na...> - 2001-03-12 23:30:30
"Stephen J Baker" <sj...@li...> writes:

> The "average" system with a 700MHz CPU also has a kick-ass graphics
> card that makes this discussion largely irrelevant. If software-only
> rendering has any kind of a future at all, it's in PDA's, phones and
> internet-capable toasters...where the overhead of having a compiler on
> board at all (let alone actually running it on anything) tends to be
> unacceptable.

Gareth brought up software rendering. While I think Gareth's dynamic
software renderer design could benefit from compiling at run time, I
was thinking more about functions that are currently implemented with
template header files. They are usually named "*tmp.h" and are used in
both hardware and software rendering. If Mesa could compile new
functions at run time, you could cover more special cases without
bloating the library with functions that are never called.

As for the toaster running Mesa, you might find out that the
Mesa+compiler package is SMALLER than normal Mesa. Removing all the
special case template functions could reduce the size of Mesa enough to
offset the size of a compiler.

Josh
From: Keith W. <ke...@va...> - 2001-03-12 21:47:11
Gareth Hughes wrote:

> Stephen J Baker wrote:
>
> > On 9 Mar 2001, Josh Vanderhoof wrote:
> >
> > > Of course, you do have to take a start up penalty with run-time
> > > compiled code. Considering that the average system is around
> > > 700MHz (just a guess) and getting faster every day, I think people
> > > may be overestimating how expensive using a real compiler would
> > > be.
> >
> > The "average" system with a 700MHz CPU also has a kick-ass graphics
> > card that makes this discussion largely irrelevant. If
> > software-only rendering has any kind of a future at all, it's in
> > PDA's, phones and internet-capable toasters...where the overhead of
> > having a compiler on board at all (let alone actually running it on
> > anything) tends to be unacceptable.
>
> Which is why I've been focusing on code generation for hardware
> drivers, particularly the begin/end functions used in immediate mode
> rendering. With hardware T&L, you basically want the non-glVertex*
> functions to write directly to the "current" hardware-format vertex,
> with glVertex flushing this to a DMA buffer.
>
> There isn't really much a compiler can do with this:
>
>     struct foo_vertex_o3n3tc2 {
>         GLfloat obj[3];
>         GLfloat normal[3];
>         GLfloat tc[2];
>     }
>
>     void foo_Normal3fv( const GLfloat *v )
>     {
>         GET_FOO_CONTEXT(fmesa);
>         COPY_3V( fmesa->current.o3n3t2.normal, v );
>     }
>
>     void foo_Vertex3fv( const GLfloat *v )
>     {
>         GET_FOO_CONTEXT(fmesa);
>         COPY_3V( fmesa->current.o3n3t2.obj, v );
>         if ( fmesa->dma.space >= 8 ) {
>             COPY_DWORDS( fmesa->dma.head, fmesa->current.o3n3tc2, 8 );
>             fmesa->dma.head += 8;
>             fmesa->dma.space -= 8;
>         } else {
>             fmesa->get_dma( fmesa, fmesa->current.o3n3tc2, 8 );
>         }
>     }
>
> (The above is based on code by Keith Whitwell)
>
> You can, however, substitute most of that with hard-coded addresses
> for the current context and make it as streamlined as possible. If
> you want to call these functions 10, 30, 100 million times a second,
> you want them to be *fast*...

I agree with this, but I'm inclined to pursue gcc-based codegen, at
least as a prototype for a more hard-wired system to follow it. I
think we need to make some progress in this area, and gcc looks like
it's got a real low entry level.

It might be possible to use a tokenized generation language that can
either be expanded by the C preprocessor, or understood explicitly by a
follow-on bespoke codegen module.

Some of the optimizations for the tnl functions like you've got above,
such as hardwiring addresses, using the right (ie non -fPIC) compiler
options, can be achieved using gcc.

So in short, I don't know whether the overhead of gcc will be a problem
at runtime, but the low overhead for us right now makes it look like a
real attractive way to get started. If it works out ok at runtime,
we've finished unexpectedly early.

Keith
From: Gareth H. <ga...@va...> - 2001-03-12 15:07:47
Stephen J Baker wrote:

> On 9 Mar 2001, Josh Vanderhoof wrote:
>
> > Of course, you do have to take a start up penalty with run-time
> > compiled code. Considering that the average system is around 700MHz
> > (just a guess) and getting faster every day, I think people may be
> > overestimating how expensive using a real compiler would be.
>
> The "average" system with a 700MHz CPU also has a kick-ass graphics
> card that makes this discussion largely irrelevant. If software-only
> rendering has any kind of a future at all, it's in PDA's, phones and
> internet-capable toasters...where the overhead of having a compiler on
> board at all (let alone actually running it on anything) tends to be
> unacceptable.

Which is why I've been focusing on code generation for hardware
drivers, particularly the begin/end functions used in immediate mode
rendering. With hardware T&L, you basically want the non-glVertex*
functions to write directly to the "current" hardware-format vertex,
with glVertex flushing this to a DMA buffer.

There isn't really much a compiler can do with this:

    struct foo_vertex_o3n3tc2 {
        GLfloat obj[3];
        GLfloat normal[3];
        GLfloat tc[2];
    }

    void foo_Normal3fv( const GLfloat *v )
    {
        GET_FOO_CONTEXT(fmesa);
        COPY_3V( fmesa->current.o3n3t2.normal, v );
    }

    void foo_Vertex3fv( const GLfloat *v )
    {
        GET_FOO_CONTEXT(fmesa);
        COPY_3V( fmesa->current.o3n3t2.obj, v );
        if ( fmesa->dma.space >= 8 ) {
            COPY_DWORDS( fmesa->dma.head, fmesa->current.o3n3tc2, 8 );
            fmesa->dma.head += 8;
            fmesa->dma.space -= 8;
        } else {
            fmesa->get_dma( fmesa, fmesa->current.o3n3tc2, 8 );
        }
    }

(The above is based on code by Keith Whitwell)

You can, however, substitute most of that with hard-coded addresses for
the current context and make it as streamlined as possible. If you
want to call these functions 10, 30, 100 million times a second, you
want them to be *fast*...

-- Gareth
From: Stephen J B. <sj...@li...> - 2001-03-12 14:45:30
On 9 Mar 2001, Josh Vanderhoof wrote:

> Of course, you do have to take a start up penalty with run-time
> compiled code. Considering that the average system is around 700MHz
> (just a guess) and getting faster every day, I think people may be
> overestimating how expensive using a real compiler would be.

The "average" system with a 700MHz CPU also has a kick-ass graphics
card that makes this discussion largely irrelevant. If software-only
rendering has any kind of a future at all, it's in PDA's, phones and
internet-capable toasters...where the overhead of having a compiler on
board at all (let alone actually running it on anything) tends to be
unacceptable.

----
Steve Baker                      (817)619-2657 (Vox/Vox-Mail)
L3Com/Link Simulation & Training (817)619-2466 (Fax)
Work: sj...@li...               http://www.link.com
Home: sjb...@ai...              http://web2.airmail.net/sjbaker1
From: Josh V. <ho...@na...> - 2001-03-09 20:41:14
Gareth Hughes <ga...@va...> writes:

> My approach for this is perhaps slightly different to yours. I was
> thinking more along the lines of having the compiled functions stored
> as strings, which can be copied and edited by the context as needed.
> This allows the context to insert hard-coded memory references and so
> on. Similarly, I've been kicking around a design of a dynamic
> software renderer, which is built from chunks of compiled code that
> can be tweaked and chained together depending on the current GL state
> etc. I don't think actually "compiling" code is the answer -- it's
> more a customization of pre-compiled code to suit the current context.

In that situation, compiling the code would help greatly. Here is what
you give up by pre-compiling the code:

1. Global optimizations. In the pre-compiled version, there will be an
   artificial boundary at each chunk. Where the dynamic-compiled
   version would be free to optimize across chunks, you would be stuck
   forcing the cpu into a known state at each boundary.

2. Easy processor-specific optimizations. If you're compiling at run
   time, you can get processor-specific optimizations by just changing
   the compiler flags.

3. Portability. The dynamic-compiled version would splice the chunks
   together automatically. I can't think of a portable way to
   concatenate pre-compiled code correctly. (Does gcc have an
   attribute for it?)

4. Flexibility. The pre-compiled code would have to follow a rigid
   template. If you compile at run time, you have the flexibility to
   change variable types and structure layouts at run time.

Of course, you do have to take a start up penalty with run-time
compiled code. Considering that the average system is around 700MHz
(just a guess) and getting faster every day, I think people may be
overestimating how expensive using a real compiler would be.

Josh
From: Jeff E. <je...@in...> - 2001-03-09 03:56:54
[I sent this earlier, but as far as I can tell it bounced rather than
being delivered]

On Thu, Mar 08, 2001 at 10:39:11AM -0700, Keith Whitwell wrote:

> You may want to avoid sharing the nv-specific stuff, but any progress
> on otf codegen has lots of application beyond that extension -- I can
> think of a dozen uses for something like this.

Like Josh Vanderhoof's code, my code writes a file which is intended to
be fed to the compiler. This is good enough for a proof-of-concept, but
not good enough for the real world, as observed in a later post by
Steve Baker, at least without some way to explicitly ask for the code
you need before you start doing the actual rendering.

A real, no-compiler, no-assembler target for texturing on MMX would be
a fun week-or-two project[*], but portability would be very nearly
zero, and it's not clear how much people care about software texturing
these days anyhow. A project might use the GNU Lightning "portable
assembler" JIT, but it would probably not be much faster than the
current code, since it makes such minimal assumptions about machine
architecture (6 32-bit integer registers, for instance) so it can be
portable, rather than generating blindingly-fast code.

Jeff

* And another year making it "right" in all the corner cases