From: Josh V. <ho...@na...> - 2001-03-12 23:30:41
Gareth Hughes <ga...@va...> writes:

> struct foo_vertex_o3n3tc2 {
>    GLfloat obj[3];
>    GLfloat normal[3];
>    GLfloat tc[2];
> }
>
> void foo_Normal3fv( const GLfloat *v )
> {
>    GET_FOO_CONTEXT(fmesa);
>    COPY_3V( fmesa->current.o3n3t2.normal, v );
> }
>
> void foo_Vertex3fv( const GLfloat *v )
> {
>    GET_FOO_CONTEXT(fmesa);
>    COPY_3V( fmesa->current.o3n3t2.obj, v );
>    if ( fmesa->dma.space >= 8 ) {
>       COPY_DWORDS( fmesa->dma.head, fmesa->current.o3n3tc2, 8 );
>       fmesa->dma.head += 8;
>       fmesa->dma.space -= 8;
>    } else {
>       fmesa->get_dma( fmesa, fmesa->current.o3n3tc2, 8 );
>    }
> }
>
> (The above is based on code by Keith Whitwell)
>
> You can, however, substitute most of that with hard-coded addresses for
> the current context and make it as streamlined as possible.  If you want
> to call these functions 10, 30, 100 million times a second, you want
> them to be *fast*...

I'm getting sidetracked here but:

1. Before you start doing crazy optimizations, why wouldn't you
   rewrite it like this (get rid of the "dma.space" variable):

   void foo_Vertex3fv( const GLfloat *v )
   {
      GET_FOO_CONTEXT(fmesa);
      COPY_3V( fmesa->current.o3n3t2.obj, v );
      if ( fmesa->dma.head + 8 <= fmesa->dma.end_of_space ) {
         COPY_DWORDS( fmesa->dma.head, fmesa->current.o3n3tc2, 8 );
         fmesa->dma.head += 8;
      } else {
         fmesa->get_dma( fmesa, fmesa->current.o3n3tc2, 8 );
      }
   }

2. Hard-coding the address would help you, but not by very much.  I
   would expect it to save you one MOV instruction.  On Intel CPUs,
   the "reg+offset" addressing mode is "free".  (You probably knew
   that already.)

3. If you want to go all out on this code, you could probably use
   mprotect() to avoid the buffer overflow test entirely.  That would
   only be a good idea if buffer overflows are rare, though.

Josh
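
To try point 1 outside a driver, here is a minimal standalone sketch of the
end-pointer check. Everything in it is an assumption made for illustration:
struct dma_buf, dma_reset(), dma_flush_and_emit() and emit_vertex() are
hypothetical stand-ins for fmesa->dma, get_dma() and foo_Vertex3fv(), and the
"flush" just rewinds a small array instead of talking to hardware. The test is
written as end_of_space - head >= 8 rather than head + 8 <= end_of_space,
because strict ISO C does not allow forming a pointer more than one element
past the end of the buffer; on a flat address space the two should cost about
the same.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define VERTEX_DWORDS 8   /* obj[3] + normal[3] + tc[2] */
#define BUF_DWORDS    20  /* deliberately tiny so the slow path is easy to hit */

/* Hypothetical stand-in for the driver's DMA state: no "space" counter,
 * only a write pointer and a fixed end pointer. */
struct dma_buf {
    uint32_t  storage[BUF_DWORDS];
    uint32_t *head;          /* next dword to write */
    uint32_t *end_of_space;  /* one past the last usable dword */
    unsigned  flushes;       /* how many times the slow path ran */
};

/* Rewind the write pointer to the start of the buffer. */
static void dma_reset(struct dma_buf *dma)
{
    dma->head = dma->storage;
    dma->end_of_space = dma->storage + BUF_DWORDS;
}

/* Slow path, standing in for get_dma(): "flush" the buffer (here we
 * just rewind it) and then emit the data at the start. */
static void dma_flush_and_emit(struct dma_buf *dma, const uint32_t *src, size_t n)
{
    dma->flushes++;
    dma_reset(dma);
    memcpy(dma->head, src, n * sizeof *src);
    dma->head += n;
}

/* Fast path: one pointer subtraction and compare, one pointer update. */
static void emit_vertex(struct dma_buf *dma, const uint32_t *vertex)
{
    assert(dma->head <= dma->end_of_space);
    if ((size_t)(dma->end_of_space - dma->head) >= VERTEX_DWORDS) {
        memcpy(dma->head, vertex, VERTEX_DWORDS * sizeof *vertex);
        dma->head += VERTEX_DWORDS;
    } else {
        dma_flush_and_emit(dma, vertex, VERTEX_DWORDS);
    }
}
```

With a 20-dword buffer, two calls to emit_vertex() take the fast path and
leave 4 dwords free, so the third call falls through to the slow path. The
only per-vertex state written on the fast path is head, which is the saving
point 1 is after.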