Re: [Mesa3d-dev] dynamic code generation with gcc

Gareth Hughes wrote:
> 
> Stephen J Baker wrote:
> >
> > On 9 Mar 2001, Josh Vanderhoof wrote:
> >
> > > Of course, you do have to take a start up penalty with run-time
> > > compiled code.  Considering that the average system is around 700MHz
> > > (just a guess) and getting faster every day, I think people may be
> > > overestimating how expensive using a real compiler would be.
> >
> > The "average" system with a 700MHz CPU also has a kick-ass graphics
> > card that makes this discussion largely irrelevant.  If software-only
> > rendering has any kind of a future at all, it's in PDAs, phones and
> > internet-capable toasters...where the overhead of having a compiler
> > on board at all (let alone actually running it on anything) tends to
> > be unacceptable.
> 
> Which is why I've been focusing on code generation for hardware drivers,
> particularly the begin/end functions used in immediate mode rendering.
> With hardware T&L, you basically want the non-glVertex* functions to
> write directly to the "current" hardware-format vertex, with glVertex
> flushing this to a DMA buffer.
> 
> There isn't really much a compiler can do with this:
> 
> struct foo_vertex_o3n3tc2 {
>    GLfloat obj[3];
>    GLfloat normal[3];
>    GLfloat tc[2];
> };
> 
> void foo_Normal3fv( const GLfloat *v )
> {
>    GET_FOO_CONTEXT(fmesa);
>    COPY_3V( fmesa->current.o3n3tc2.normal, v );
> }
> 
> void foo_Vertex3fv( const GLfloat *v )
> {
>    GET_FOO_CONTEXT(fmesa);
>    COPY_3V( fmesa->current.o3n3tc2.obj, v );
>    if ( fmesa->dma.space >= 8 ) {
>       COPY_DWORDS( fmesa->dma.head, fmesa->current.o3n3tc2, 8 );
>       fmesa->dma.head += 8;
>       fmesa->dma.space -= 8;
>    } else {
>       fmesa->get_dma( fmesa, fmesa->current.o3n3tc2, 8 );
>    }
> }
> 
> (The above is based on code by Keith Whitwell)
> 
> You can, however, substitute most of that with hard-coded addresses for
> the current context and make it as streamlined as possible.  If you want
> to call these functions 10, 30, 100 million times a second, you want
> them to be *fast*...

I agree with this, but I'm inclined to pursue gcc-based codegen, at least as a
prototype for a more hard-wired system to follow it.  I think we need to make
some progress in this area, and gcc looks like it has a really low barrier to
entry.

It might be possible to use a tokenized generation language that can either be
expanded by the C preprocessor, or understood explicitly by a follow-on
bespoke codegen module.
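
To make the "tokenized" idea concrete, here is a minimal sketch of the
preprocessor half only.  The names (demo_ctx, EMIT_ATTRIB, EMIT_VERTEX) are
made up for illustration and are not Mesa or driver code; the DMA buffer is
just an array, and a full buffer is treated as an error rather than handed to
a get_dma-style fallback.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef float GLfloat;   /* stand-in for the GL typedef */

/* Hypothetical vertex layout: 3 position, 3 normal, 2 texcoord floats. */
struct demo_vertex {
   GLfloat obj[3];
   GLfloat normal[3];
   GLfloat tc[2];
};

/* Hypothetical context: the "current" vertex plus a DMA-like buffer. */
struct demo_context {
   struct demo_vertex current;
   GLfloat dma[64];
   size_t head;          /* number of floats already written to dma[] */
};

struct demo_context demo_ctx;

/* Token: emit a function that copies N floats into one field of the
 * current vertex. */
#define EMIT_ATTRIB(NAME, FIELD, N)                                    \
   void demo_##NAME(const GLfloat *v)                                  \
   {                                                                   \
      memcpy(demo_ctx.current.FIELD, v, (N) * sizeof(GLfloat));        \
   }

/* Token: emit a function that stores the position and then appends the
 * whole current vertex (SIZE floats) to the buffer. */
#define EMIT_VERTEX(NAME, FIELD, N, SIZE)                              \
   void demo_##NAME(const GLfloat *v)                                  \
   {                                                                   \
      memcpy(demo_ctx.current.FIELD, v, (N) * sizeof(GLfloat));        \
      assert(demo_ctx.head + (SIZE) <=                                 \
             sizeof demo_ctx.dma / sizeof demo_ctx.dma[0]);            \
      memcpy(&demo_ctx.dma[demo_ctx.head], &demo_ctx.current,          \
             (SIZE) * sizeof(GLfloat));                                \
      demo_ctx.head += (SIZE);                                         \
   }

/* The "program" in the token language: one line per entry point. */
EMIT_ATTRIB(Normal3fv, normal, 3)
EMIT_ATTRIB(TexCoord2fv, tc, 2)
EMIT_VERTEX(Vertex3fv, obj, 3, 8)
```

The three EMIT_* lines at the bottom are the part a bespoke codegen module
could later read directly instead of going through cpp.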

Some of the optimizations for tnl functions like the ones you've got above,
such as hardwiring addresses and using the right (i.e. non -fPIC) compiler
options, can be achieved using gcc.
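
As a rough sketch of the hardwired-address part, the generator could simply
print C source with the destination address baked in as a literal and hand
that text to gcc.  The helper name emit_normal3fv_source and the generated
gen_Normal3fv are hypothetical; writing the text to a file, running gcc on it
and loading the result are left out here.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: write C source for a Normal3fv-style function whose
 * destination is the literal address normal_addr, so the generated code
 * needs no context lookup.  Returns what snprintf returns: the length the
 * full text needs, which is >= size if buf was too small. */
int emit_normal3fv_source(char *buf, size_t size, const void *normal_addr)
{
   assert(buf != NULL && size > 0);
   return snprintf(buf, size,
      "void gen_Normal3fv(const float *v)\n"
      "{\n"
      "   float *dst = (float *)0x%llxULL;\n"
      "   dst[0] = v[0];\n"
      "   dst[1] = v[1];\n"
      "   dst[2] = v[2];\n"
      "}\n",
      (unsigned long long)(uintptr_t)normal_addr);
}
```

The address is only valid for the process that generated it, so the source
has to be produced and compiled per context at run time, which is where the
start-up cost discussed above comes from.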

So in short, I don't know whether the overhead of gcc will be a problem at
runtime, but the low development overhead for us right now makes it look like
a really attractive way to get started.  If it works out OK at runtime, we've
finished unexpectedly early.

Keith