From: Brian P. <bri...@tu...> - 2006-09-05 18:14:29
James Supancic wrote:
> I have a dual head ATI system (One GPU, two Render SPUs), which
> performs very poorly. After much profiling I have determined that it
> is spending a lot of time (near 99.4%) in fglrx.so. I find that the
> functions in this library are mostly invoked by functions related to
> the processing of unrolled glDrawElements commands (glArrayElement
> calls are causing this CPU usage). I wrote a simple test application.
> My results show poor performance by the ATI GPU when two concurrent
> processes use VBOs and glArrayElement to draw objects.
>
> As I only need to be rendering to one of the ATI heads at a time, I
> think a possible solution is to filter out unneeded glDrawElements
> commands.
>
> This could be done by checking the rendering window rectangle against
> the rectangle of each monitor. If the rectangles intersect, we would
> set the Pack Buffer to thread->buffer[current_server] and then do what
> we normally do to translate and pack the command for that server.
> Repeat for each server. When done, set the Pack buffer back to
> thread->geometry_buffer (This is what it was before, right?). This
> would prevent the glDrawElements command from affecting servers that do
> not have the GL rendering window on them.

The tilesort SPU broadcasts VBO drawing commands to all crservers. The
tilesort SPU's state tracker keeps a client-side copy of the VBO data but
does not analyze VBO drawing commands to compute the bounding box (which
would be used for bucketing). The cost of computing the bounding boxes in
these cases could be more than just broadcasting the command.

If you want to optimize things, you'll have to add new
glDrawArrays/glDrawElements code to the tilesort SPU that computes bounding
boxes. Unfortunately, you can't just look at a VBO to determine bounds since
there's no way to interpret the VBO's data on its own; you need the vertex
array parameters (glVertexPointer, etc.), which can vary from one draw to
the next.
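To illustrate why the vertex array parameters are needed, here is a minimal
sketch of a bounding-box pass over an indexed draw. It is not Chromium code:
VertexArrayState, BBox and bbox_from_indices are hypothetical names, and the
sketch assumes GL_FLOAT positions, unsigned int indices, and that the
client-side copy of the VBO data is directly addressable.

```c
#include <assert.h>
#include <float.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical snapshot of the vertex array state for one draw call
 * (the arguments of glVertexPointer, assuming type GL_FLOAT). */
typedef struct {
    const unsigned char *base; /* client-side copy of the VBO data, plus the pointer offset */
    int size;                  /* components per vertex: 2, 3 or 4 */
    size_t stride;             /* bytes between vertices; 0 means tightly packed */
} VertexArrayState;

typedef struct {
    float min[3];
    float max[3];
} BBox;

/* Object-space bounding box of the vertices referenced by an index list.
 * Returns an "empty" box (min > max) when count is 0. */
BBox bbox_from_indices(const VertexArrayState *va,
                       const unsigned int *indices, size_t count)
{
    BBox box = { { FLT_MAX, FLT_MAX, FLT_MAX }, { -FLT_MAX, -FLT_MAX, -FLT_MAX } };
    size_t stride = va->stride ? va->stride : (size_t)va->size * sizeof(float);
    size_t i;
    int c;

    assert(va->size >= 2 && va->size <= 4);

    for (i = 0; i < count; i++) {
        const unsigned char *p = va->base + (size_t)indices[i] * stride;
        float v[3] = { 0.0f, 0.0f, 0.0f }; /* z defaults to 0 when size is 2 */

        /* Copy out x, y and (if present) z; w is ignored in this sketch. */
        memcpy(v, p, (size_t)(va->size < 3 ? va->size : 3) * sizeof(float));

        for (c = 0; c < 3; c++) {
            if (v[c] < box.min[c]) box.min[c] = v[c];
            if (v[c] > box.max[c]) box.max[c] = v[c];
        }
    }
    return box;
}
```

The same bytes give a different box if size or stride changes, which is why
the VBO contents alone are not enough. The loop also touches every index on
every draw, which is the cost that may exceed simply broadcasting the
command, and the resulting box is in object space, so it would still have to
be transformed before it could be compared against screen rectangles.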
> Is dropping glDrawElements commands for render SPUs whose monitors
> don't intersect the OpenGL output window acceptable practice? Will it
> cause problems for downstream SPUs?
>
> What is the best method to integrate such optimizations into Chromium?
>
> I am only aware of 2 types of pack buffers in the tilesort SPU, the
> geometry_buffer, and the server specific buffers. Are there any others
> I should know about?

Is your application putting its array indices into a
GL_ELEMENT_ARRAY_BUFFER VBO? To get best performance, you want both your
vertex data and indices to be in VBOs.

-Brian