From: Stephen J B. <sj...@li...> - 2001-03-15 15:22:37
On 13 Mar 2001, Josh Vanderhoof wrote:

> "Stephen J Baker" <sj...@li...> writes:
>
> > If it's for something like varying the Z buffer depth - then a large
> > overhead might be acceptable (although I still maintain that creating
> > a dependence on a C compiler is a disastrously bad decision).
>
> I just want to make sure there isn't a misunderstanding here - the C
> compiler would only be used to compile special case functions.  There
> would always be a generic fallback, so Mesa would work (although more
> slowly) even without a compiler installed.

Yes - but (as an application author) I have to tell you that well over
half the support problems I have for my code are Mesa-related, and many
of them are of the form "The program runs really slowly on this machine
and much faster on that one...why?" - and this kind of nonsense makes
that kind of thing MUCH worse.

You are going to run into all kinds of practical problems - where the
compiler isn't set up right - or it comes up with unexpected messages
to stdout that mess up people who pipe the output of their programs off
somewhere else.  Some people have C compilers - but not gcc.  There are
just too many variables to make this work in a stable manner.

> > If it's something like catering for a particular combination of (say)
> > blend mode, texture type, dither and fogging (the kind of thing that
> > the SGI OpenGL-for-Windoze appears to do) - then taking even a couple
> > of milliseconds at runtime when you first happen to see a polygon with
> > that combination of states would be completely unacceptable - even
> > with caching.
>
> If it was just 20 milliseconds or so, I doubt most users would notice.
> The problem is, I can easily see the delay being 500 ms for a big
> function.
>
> I can think of two workarounds:
>
> 1. Make the cache persistent; then the delay would only happen the
>    first time you run your application.
>
> 2. Run the compiler in the background at low priority (say 10% CPU).
>    You would have to make do with the generic fallback until the
>    compile completes, but there wouldn't be a 'hitch' when the
>    application exposes a different path.
>
> Do you think that would be acceptable?

Well, I have to say that I don't (yet) make much use of software
rendering - but in my business (flight simulation), the worst-case
frame time is all that matters.  Occasional glitches in frame rate are
just as bad (from our customers' point of view) as persistently bad
frame times.  However, we only use hardware-accelerated OpenGL - so
it's not an issue in this *specific* case.

SGI's OpenGL-for-Windoze can do this without using a compiler - so
should we.

----
Steve Baker                        (817)619-2657 (Vox/Vox-Mail)
L3Com/Link Simulation & Training   (817)619-2466 (Fax)
Work: sj...@li...                  http://www.link.com
Home: sjb...@ai...                 http://web2.airmail.net/sjbaker1
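For concreteness, workaround 1 above might look roughly like this in C.
This is only a sketch: the cache directory, the state-hash key, and
every name here are hypothetical, not Mesa code.

   /* Persistent codegen cache: one shared object per state
    * combination, kept across runs so the compile delay is only paid
    * the first time an application ever hits a given path. */
   #include <stdio.h>
   #include <dlfcn.h>

   typedef void (*span_func)(void);

   span_func lookup_cached_func(unsigned state_hash)
   {
      char path[256];
      void *handle;

      snprintf(path, sizeof(path), "/var/cache/mesa/%08x.so", state_hash);

      handle = dlopen(path, RTLD_NOW);
      if (!handle)
         return NULL;   /* not cached yet: use the generic fallback and
                         * queue a background compile (workaround 2) */

      return (span_func) dlsym(handle, "generated_span");
   }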
From: Gareth H. <ga...@va...> - 2001-03-15 23:49:32
Stephen J Baker wrote:
>
> SGI's OpenGL-for-Windoze can do this without using a compiler - so
> should we.

Which is why I was going down that road...  And quite frankly, a lot of
the stuff I would want to "generate" would be 3DNow!, SSE, MMX, maybe
even hand-optimized x86 assembly.  Thus, using a compiler seems fairly
pointless.

--
Gareth
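To make the no-compiler alternative concrete: generating 3DNow!/SSE/MMX
paths means emitting raw machine code into an executable buffer at run
time.  A minimal sketch, assuming x86 and Linux mmap(); the emitted
function here just returns 42, where a real emitter would write out an
optimized vertex or span loop.

   #include <string.h>
   #include <sys/mman.h>

   typedef int (*gen_func)(void);

   gen_func emit_return_42(void)
   {
      /* mov eax, 42 ; ret */
      static const unsigned char code[] = {
         0xB8, 0x2A, 0x00, 0x00, 0x00, 0xC3
      };
      void *buf;

      /* Anonymous executable page; copy the opcodes in and call it. */
      buf = mmap(NULL, sizeof(code), PROT_READ | PROT_WRITE | PROT_EXEC,
                 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
      if (buf == MAP_FAILED)
         return NULL;

      memcpy(buf, code, sizeof(code));
      return (gen_func) buf;
   }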
From: Keith W. <ke...@va...> - 2001-03-13 17:17:04
Josh Vanderhoof wrote:
>
> Gareth Hughes <ga...@va...> writes:
>
> > Compiling this with the code Josh sent takes around 70 msec on my
> > 700MHz PIII laptop.
> >
> >    void codegen_test( void )
> >    {
> >       printf( "hello, world!\n" );
> >    }
> >
> > That's a long time at 60fps...
>
> I would think you would keep a cache of the stuff that you've compiled
> to avoid re-compiling every time you want to call the routine.  So
> you're really looking at a one-time startup cost.  Is there something
> where the code would change on every single frame?

Yes.  I think you'd probably see less than 10 compiles at application
startup and then no more.  If applications behave in a weird, morphing
way, just pull the plug on compilation and let them use the generic
code.

Keith
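The "pull the plug" heuristic Keith describes could be as simple as a
compile counter in front of the cache.  A sketch only; the names, the
extern helpers and the limit of 16 are all made up for illustration.

   typedef void (*span_func)(void);

   /* Hypothetical helpers -- not real Mesa interfaces. */
   extern span_func cache_lookup(unsigned state_hash);
   extern void      cache_insert(unsigned state_hash, span_func f);
   extern span_func compile_span_func(unsigned state_hash);
   extern void      generic_span(void);

   #define MAX_RUNTIME_COMPILES 16

   static int num_compiles;

   span_func get_span_func(unsigned state_hash)
   {
      span_func f = cache_lookup(state_hash);

      if (f)
         return f;

      /* A weird, morphing application blew the budget: stop compiling
       * and serve the generic code from now on. */
      if (num_compiles >= MAX_RUNTIME_COMPILES)
         return generic_span;

      num_compiles++;
      f = compile_span_func(state_hash);
      cache_insert(state_hash, f);
      return f;
   }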
From: Keith W. <ke...@va...> - 2001-03-13 17:11:50
Gareth Hughes wrote:
>
> Josh Vanderhoof wrote:
> >
> > I'm getting sidetracked here but:
> >
> > 1. Before you start doing crazy optimizations, why wouldn't you
> >    rewrite it like this (get rid of the "dma.space" variable):
> >
> >    void foo_Vertex3fv( const GLfloat *v )
> >    {
> >       GET_FOO_CONTEXT(fmesa);
> >       COPY_3V( fmesa->current.o3n3t2.obj, v );
> >       if ( fmesa->dma.head + 8 <= fmesa->dma.end_of_space ) {
> >          COPY_DWORDS( fmesa->dma.head, fmesa->current.o3n3tc2, 8 );
> >          fmesa->dma.head += 8;
> >       } else {
> >          fmesa->get_dma( fmesa, fmesa->current.o3n3tc2, 8 );
> >       }
> >    }
>
> Sure -- it was just a cut and paste of some old code Keith sent me.
> Minor point.
>
> > 2. Hard-coding the address would help you, but not by very much.  I
> >    would expect it to save you one MOV instruction.  On Intel CPUs,
> >    the "reg+offset" addressing mode is "free".  (You probably knew
> >    that already.)
>
> Yep :-)  GET_FOO_CONTEXT() is the big one, and perhaps I should have
> stressed that a little more.  This may involve a function call to
> determine the current context, due to thread-safety issues.
>
> > 3. If you want to go all out on this code, you could probably use
> >    mprotect() to avoid the buffer overflow test entirely.  That would
> >    only be a good idea if buffer overflows are rare, though.
>
> You need buffer overflows, as they end up flushing the DMA buffer.  In
> this case, get_dma() would flush the current buffer and acquire a new
> one.

But of course, we should never be getting rid of or acquiring new DMA
buffers; we should just be aging regions within a buffer we always hold,
and perhaps in rare cases growing/shrinking it.

However, mprotect doesn't help you here either, as the regions being
aged are pretty dynamic.

Keith
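For reference, the mprotect() trick from Josh's point 3 would look
something like the sketch below: a PROT_NONE guard page after the DMA
buffer turns an overflowing vertex write into a SIGSEGV, whose handler
would flush the buffer, removing the bounds test from the fast path.
As Keith notes, it doesn't fit regions whose boundaries move around.
Names are illustrative only.

   #include <stddef.h>
   #include <unistd.h>
   #include <sys/mman.h>

   void *alloc_guarded_dma(size_t size)
   {
      size_t pagesz  = (size_t) sysconf(_SC_PAGESIZE);
      size_t rounded = (size + pagesz - 1) & ~(pagesz - 1);
      char *buf;

      /* Buffer plus one trailing guard page. */
      buf = mmap(NULL, rounded + pagesz, PROT_READ | PROT_WRITE,
                 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
      if (buf == MAP_FAILED)
         return NULL;

      /* Any store past the end now faults; the SIGSEGV handler would
       * flush the DMA buffer and restart the faulting store. */
      mprotect(buf + rounded, pagesz, PROT_NONE);
      return buf;
   }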
From: Gareth H. <ga...@va...> - 2001-03-13 17:32:39
Keith Whitwell wrote:
>
> But of course, we should never be getting rid of or acquiring new DMA
> buffers; we should just be aging regions within a buffer we always
> hold, and perhaps in rare cases growing/shrinking it.
>
> However, mprotect doesn't help you here either, as the regions being
> aged are pretty dynamic.

If only I could sit down and finish my dynamic heap code... :-)

My plan is basically to have a pool of 2-4 MB DMA regions, with each
one assigned to at most one context.  The context handles all
allocation within the buffer, and the regions are aged by the kernel
when they are flushed.  The flush is the only thing you need to go into
the kernel for, and I can't really see any way around that.

(Note: the current status is working allocation code in a single
region; I need to finish off multiple regions, switching regions
between contexts and the like.  The new SAREA code will allow a finer
grain of control over the region, as we're currently limited to a 4K
control page.)

If you were lucky enough to fill your region, your next
LOCK_DMA_REGION() would automagically grab you another one.  I think
that's easier than trying to resize the current one.  Grabbing another
region may result in allocating and binding more AGP memory, but that's
a nice and transparent side effect.

--
Gareth
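Per context, the allocator Gareth describes presumably reduces to a
bump pointer inside the region, with the flush as the single kernel
entry.  A rough sketch under those assumptions; the structure layout
and all names here are guesses, not the actual DRI code.

   #include <stddef.h>

   typedef struct dma_region {
      char    *start;
      size_t   size;
      size_t   head;     /* current allocation offset in the region */
      unsigned age;      /* stamped by the kernel at flush time */
   } dma_region;

   extern void flush_region(dma_region *r);   /* hypothetical kernel call */

   void *region_alloc(dma_region *r, size_t bytes)
   {
      void *p;

      if (r->head + bytes > r->size) {
         /* The only kernel entry: flush the region (the kernel ages
          * it) and, per the LOCK_DMA_REGION() behaviour described
          * above, carry on in a fresh one. */
         flush_region(r);
         r->head = 0;
      }
      p = r->start + r->head;
      r->head += bytes;
      return p;
   }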
From: Keith W. <ke...@va...> - 2001-03-13 17:14:33
Gareth Hughes wrote:
>
> Josh Vanderhoof wrote:
> >
> > I'm getting sidetracked here but:
> >
> > 1. Before you start doing crazy optimizations, why wouldn't you
> >    rewrite it like this (get rid of the "dma.space" variable):
> >
> >    void foo_Vertex3fv( const GLfloat *v )
> >    {
> >       GET_FOO_CONTEXT(fmesa);
> >       COPY_3V( fmesa->current.o3n3t2.obj, v );
> >       if ( fmesa->dma.head + 8 <= fmesa->dma.end_of_space ) {
> >          COPY_DWORDS( fmesa->dma.head, fmesa->current.o3n3tc2, 8 );
> >          fmesa->dma.head += 8;
> >       } else {
> >          fmesa->get_dma( fmesa, fmesa->current.o3n3tc2, 8 );
> >       }
> >    }
>
> Sure -- it was just a cut and paste of some old code Keith sent me.
> Minor point.

Historically, it's because it is fmesa->dma->head, not fmesa->dma.head.
It's an ugliness that was built into the drivers early on.

Keith
From: Josh V. <ho...@na...> - 2001-03-12 23:30:30
"Stephen J Baker" <sj...@li...> writes: > The "average" system with a 700MHz CPU also has a kick-ass graphics > card that makes this discussion largely irrelevent. If software-only > rendering has any kind of a future at all, it's in PDA's, phones and > internet-capable toasters...where the overhead of having a compiler > on board at all (let alone actually running it on anything) tends to > be unacceptable. Gareth brought up software rendering. While I think Gareth's dynamic software renderer design could benefit from compiling at run time, I was thinking more about functions that are currently implemented with template header files. They are usually named "*tmp.h" and are used in both hardware and software rendering. If Mesa could compile new functions at run time, you could cover more special cases without bloating the library with functions that are never called. As for the toaster running Mesa, you might find out that the Mesa+compiler package is SMALLER than normal Mesa. Removing all the special case template functions could reduce the size of Mesa enough to offset the size of a compiler. Josh |
From: Gareth H. <ga...@va...> - 2001-03-13 00:28:45
Josh Vanderhoof wrote:
>
> Gareth brought up software rendering.  While I think Gareth's dynamic
> software renderer design could benefit from compiling at run time, I
> was thinking more about functions that are currently implemented with
> template header files.  They are usually named "*tmp.h" and are used
> in both hardware and software rendering.  If Mesa could compile new
> functions at run time, you could cover more special cases without
> bloating the library with functions that are never called.

To tell you the truth, I hadn't really thought about the implementation
details of my software renderer enough.  I certainly agree that there
are benefits to using compilation, but this is a case where the cost of
doing so would need to be investigated.  One day when I have the time
to actually write the code, we'll see what happens...

--
Gareth
From: Jeff E. <je...@in...> - 2001-03-09 03:56:54
[I sent this earlier, but as far as I can tell it bounced rather than
being delivered.]

On Thu, Mar 08, 2001 at 10:39:11AM -0700, Keith Whitwell wrote:
> You may want to avoid sharing the nv-specific stuff, but any progress
> on otf codegen has lots of application beyond that extension -- I can
> think of a dozen uses for something like this.

Like Josh Vanderhoof's code, my code writes a file which is intended to
be fed to the compiler.  This is good enough for a proof of concept,
but not good enough for the real world, as observed in a later post by
Steve Baker, at least without some way to explicitly ask for the code
you need before you start doing the actual rendering.

A real no-compiler, no-assembler target for texturing on MMX would be a
fun week-or-two project [*], but portability would be very nearly zero,
and it's not clear how much people care about software texturing these
days anyhow.

A project might use the GNU Lightning "portable assembler" JIT, but it
would probably not be much faster than the current code, since it makes
such minimal assumptions about machine architecture (6 32-bit integer
registers, for instance) in order to be portable, rather than
generating blindingly fast code.

Jeff

[*] And another year making it "right" in all the corner cases.
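For completeness, the write-a-file-and-compile approach that both
Josh's and Jeff's proof-of-concept code take boils down to something
like this.  A sketch only: the temp paths, compiler flags and names are
illustrative, not anyone's actual code.

   #include <stdio.h>
   #include <stdlib.h>
   #include <dlfcn.h>

   void *compile_and_load(const char *c_source, const char *sym)
   {
      FILE *f = fopen("/tmp/mesa_gen.c", "w");
      void *handle;

      if (!f)
         return NULL;
      fputs(c_source, f);
      fclose(f);

      /* This is the run-time dependence on an installed compiler that
       * Steve Baker objects to elsewhere in the thread. */
      if (system("cc -O2 -shared -fPIC -o /tmp/mesa_gen.so /tmp/mesa_gen.c"))
         return NULL;

      handle = dlopen("/tmp/mesa_gen.so", RTLD_NOW);
      return handle ? dlsym(handle, sym) : NULL;
   }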