From: martin k. <blu...@gm...> - 2009-04-28 17:46:58
Thank you, Roland, for the prompt reply.

'R200_DEBUG=fall' did not reveal any software fallbacks, so my initial assumption that a TCL fallback was occurring appears to be wrong. 'R200_DEBUG=all', on the other hand, showed something interesting. Consider a sequence of identical draws sourced from GL_STATIC_DRAW vertex buffer objects, of the form:

    for (unsigned i = 0; i < draw_iterations; i++)
        glDrawElements(GL_TRIANGLES, num_indices, GL_UNSIGNED_SHORT, 0);

where the vertex and index buffers have been created as:

    glGenBuffersARB(2, vboId);

    glBindBufferARB(GL_ARRAY_BUFFER_ARB, vboId[0]);
    glBufferDataARB(GL_ARRAY_BUFFER_ARB, vert_arr_size, vert_arr, GL_STATIC_DRAW_ARB);

    glBindBufferARB(GL_ELEMENT_ARRAY_BUFFER_ARB, vboId[1]);
    glBufferDataARB(GL_ELEMENT_ARRAY_BUFFER_ARB, idx_arr_size, idx_arr, GL_STATIC_DRAW_ARB);

    glVertexPointer(3, GL_FLOAT, sizeof_vertex, offs_pos);
    glNormalPointer(GL_FLOAT, sizeof_vertex, offs_nrm);
    glEnableClientState(GL_VERTEX_ARRAY);
    glEnableClientState(GL_NORMAL_ARRAY);

For this sequence, 'R200_DEBUG=all' reports a number of r200AllocDmaRegion() invocations that grows in proportion to draw_iterations. Given that we are dealing with a single vertex buffer that is supposedly static, I find this per-draw DMA handling quite curious. What am I missing?

Best regards,
martin

ps: Since you mentioned the possibility of unoptimized paths in the DRI edge: the performance issues I have been referring to are strictly TCL-related. In all other respects this DRI edge performs up to the hardware specs.

On Mon, Apr 27, 2009 at 12:58 PM, Roland Scheidegger <sr...@vm...> wrote:
> I'm not aware of any problems with hw TCL or ARB_vp on any r200 chip
> (and it shouldn't matter if pci or agp for this). You can use
> R200_DEBUG=fall env var to see if the code hits any fallbacks.
> Note that there are however cases the code is not optimized for and
> might get excessively slow (like doing CopyTexImage which isn't hardware
> accelerated).
>
> Roland
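The buffer setup in the message above leaves sizeof_vertex, offs_pos and offs_nrm undefined. A minimal sketch follows. It assumes an interleaved layout of three position floats followed by three normal floats per vertex, and the Vertex struct, the VERTEX_* constants and buffer_offset() are hypothetical names, not taken from the original program. It derives the stride and byte offsets with offsetof. While a buffer object is bound to GL_ARRAY_BUFFER_ARB, the pointer argument of glVertexPointer()/glNormalPointer() is interpreted as a byte offset into that buffer rather than as a client-memory address, so the offsets are passed as pointer values.

```c
#include <stddef.h> /* offsetof, size_t */
#include <stdint.h> /* uintptr_t */

/* Hypothetical interleaved vertex: position followed by normal. */
typedef struct {
    float pos[3];
    float nrm[3];
} Vertex;

/* Stride between consecutive vertices, in bytes (plays the role of sizeof_vertex). */
static const size_t VERTEX_STRIDE = sizeof(Vertex);

/* Byte offsets of each attribute within one vertex (offs_pos, offs_nrm). */
static const size_t VERTEX_OFFS_POS = offsetof(Vertex, pos);
static const size_t VERTEX_OFFS_NRM = offsetof(Vertex, nrm);

/* Converts a byte offset into the pointer value GL expects while a VBO is
 * bound, e.g.
 *   glVertexPointer(3, GL_FLOAT, (GLsizei)VERTEX_STRIDE,
 *                   buffer_offset(VERTEX_OFFS_POS));
 * The integer-to-pointer conversion is implementation-defined but is the
 * usual idiom for VBO offsets. */
static const void *buffer_offset(size_t offset)
{
    return (const void *)(uintptr_t)offset;
}
```

Deriving the values with sizeof and offsetof keeps the stride and offsets consistent with the struct if attributes are later added or reordered.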