From: Keith W. <ke...@va...> - 2001-03-12 21:47:11
Gareth Hughes wrote:
>
> Stephen J Baker wrote:
> >
> > On 9 Mar 2001, Josh Vanderhoof wrote:
> >
> > > Of course, you do have to take a start up penalty with run-time
> > > compiled code. Considering that the average system is around 700MHz
> > > (just a guess) and getting faster every day, I think people may be
> > > overestimating how expensive using a real compiler would be.
> >
> > The "average" system with a 700MHz CPU also has a kick-ass graphics
> > card that makes this discussion largely irrelevent. If software-only
> > rendering has any kind of a future at all, it's in PDA's, phones and
> > internet-capable toasters...where the overhead of having a compiler
> > on board at all (let alone actually running it on anything) tends to
> > be unacceptable.
>
> Which is why I've been focusing on code generation for hardware drivers,
> particularly the begin/end functions used in immediate mode rendering.
> With hardware T&L, you basically want the non-glVertex* functions to
> write directly to the "current" hardware-format vertex, with glVertex
> flushing this to a DMA buffer.
>
> There isn't really much a compiler can do with this:
>
> struct foo_vertex_o3n3tc2 {
>    GLfloat obj[3];
>    GLfloat normal[3];
>    GLfloat tc[2];
> }
>
> void foo_Normal3fv( const GLfloat *v )
> {
>    GET_FOO_CONTEXT(fmesa);
>    COPY_3V( fmesa->current.o3n3t2.normal, v );
> }
>
> void foo_Vertex3fv( const GLfloat *v )
> {
>    GET_FOO_CONTEXT(fmesa);
>    COPY_3V( fmesa->current.o3n3t2.obj, v );
>    if ( fmesa->dma.space >= 8 ) {
>       COPY_DWORDS( fmesa->dma.head, fmesa->current.o3n3tc2, 8 );
>       fmesa->dma.head += 8;
>       fmesa->dma.space -= 8;
>    } else {
>       fmesa->get_dma( fmesa, fmesa->current.o3n3tc2, 8 );
>    }
> }
>
> (The above is based on code by Keith Whitwell)
>
> You can, however, substitute most of that with hard-coded addresses for
> the current context and make it as streamlined as possible. If you want
> to call these functions 10, 30, 100 million times a second, you want
> them to be *fast*...

I agree with this, but I'm inclined to pursue gcc-based codegen, at least
as a prototype for a more hard-wired system to follow it. I think we need
to make some progress in this area, and gcc looks like it has a really low
barrier to entry.

It might be possible to use a tokenized generation language that can
either be expanded by the C preprocessor or understood directly by a
follow-on bespoke codegen module. Some of the optimizations for tnl
functions like the ones you've got above, such as hardwiring addresses and
using the right (i.e. non -fPIC) compiler options, can be achieved using
gcc.

So in short, I don't know whether the overhead of gcc will be a problem at
runtime, but the low overhead for us right now makes it look like a really
attractive way to get started. If it works out OK at runtime, we've
finished unexpectedly early.

Keith
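
A rough sketch of the "token template expanded by the C preprocessor" idea, to make the hardwired-address point concrete. Everything named demo_* here is hypothetical and is not Mesa or driver code; the vertex layout just mirrors the quoted o3n3tc2 struct. It only shows the preprocessor half: one template, expanded once with a context looked up through a pointer and once with the context address fixed at compile time. Invoking gcc at runtime is not shown.

```c
#include <string.h>

typedef float GLfloat;   /* stand-in for the GL typedef, assumed for this sketch */

/* Hypothetical "current vertex" layout, modelled on the quoted struct. */
struct demo_vertex {
    GLfloat obj[3];
    GLfloat normal[3];
    GLfloat tc[2];
};

struct demo_context {
    struct demo_vertex current;
};

/* One context instance, plus a pointer to it for the generic lookup path. */
static struct demo_context demo_ctx;
static struct demo_context *demo_current = &demo_ctx;

/*
 * Token template: NAME is the function to emit, CTX is any expression
 * that yields a struct demo_context *. A bespoke code generator could
 * consume the same tokens instead of the preprocessor.
 */
#define GEN_NORMAL3FV(NAME, CTX)                          \
    void NAME(const GLfloat *v)                           \
    {                                                     \
        struct demo_context *c = (CTX);                   \
        memcpy(c->current.normal, v, 3 * sizeof *v);      \
    }

/* Expansion 1: the context pointer is loaded at call time. */
GEN_NORMAL3FV(demo_Normal3fv_generic, demo_current)

/* Expansion 2: the context address is a constant known to the compiler. */
GEN_NORMAL3FV(demo_Normal3fv_hardwired, &demo_ctx)
```

Both expansions store the same three floats; the difference is only in how the context is reached, which is the part a non -fPIC build with a known address can fold into the store.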