From: James S. <arr...@gm...> - 2006-09-04 22:05:32
I have a dual-head ATI system (one GPU, two Render SPUs) which performs very poorly. After much profiling I have determined that it is spending almost all of its time (near 99.4%) in fglrx.so. The functions in this library are mostly invoked by the code that processes unrolled glDrawElements commands (the glArrayElement calls are causing this CPU usage). I wrote a simple test application; my results show poor performance from the ATI GPU when two concurrent processes use VBOs and glArrayElement to draw objects.

As I only need to render to one of the ATI heads at a time, I think a possible solution is to filter out unneeded glDrawElements commands. This could be done by checking the rendering window rectangle against the rectangle of each monitor. If the rectangles intersect, we would set the pack buffer to thread->buffer[current_server] and then do what we normally do to translate and pack the command for that server, repeating for each server. When done, we would set the pack buffer back to thread->geometry_buffer (that is what it was before, right?). This would prevent the glDrawElements command from affecting servers that do not have the GL rendering window on them.

Is dropping glDrawElements commands for render SPUs whose monitors don't intersect the OpenGL output window acceptable practice? Will it cause problems for downstream SPUs? What is the best method to integrate such optimizations into Chromium?

I am only aware of two types of pack buffers in the tilesort SPU: the geometry_buffer and the server-specific buffers. Are there any others I should know about?

Thank you for your time,
James Steven Supancic III
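A minimal sketch of the window-versus-monitor intersection test described above; the Rect type and its fields are placeholders, not Chromium's actual window or screen-extent structures:

    /* Hypothetical rectangle type; a real patch would use the tilesort
     * SPU's own window and extent structures. */
    typedef struct {
        int x, y, w, h;
    } Rect;

    /* Return non-zero if the two rectangles overlap. */
    static int rects_intersect(const Rect *a, const Rect *b)
    {
        return a->x < b->x + b->w && b->x < a->x + a->w &&
               a->y < b->y + b->h && b->y < a->y + a->h;
    }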
From: Brian P. <bri...@tu...> - 2006-09-05 18:14:29
James Supancic wrote:
> I have a dual-head ATI system (one GPU, two Render SPUs) which
> performs very poorly. After much profiling I have determined that it
> is spending almost all of its time (near 99.4%) in fglrx.so. The
> functions in this library are mostly invoked by the code that
> processes unrolled glDrawElements commands (the glArrayElement calls
> are causing this CPU usage). I wrote a simple test application; my
> results show poor performance from the ATI GPU when two concurrent
> processes use VBOs and glArrayElement to draw objects.
>
> As I only need to render to one of the ATI heads at a time, I think a
> possible solution is to filter out unneeded glDrawElements commands.
>
> This could be done by checking the rendering window rectangle against
> the rectangle of each monitor. If the rectangles intersect, we would
> set the pack buffer to thread->buffer[current_server] and then do
> what we normally do to translate and pack the command for that
> server, repeating for each server. When done, we would set the pack
> buffer back to thread->geometry_buffer (that is what it was before,
> right?). This would prevent the glDrawElements command from affecting
> servers that do not have the GL rendering window on them.

The tilesort SPU broadcasts VBO drawing commands to all crservers. The tilesort SPU's state tracker keeps a client-side copy of the VBO data but does not analyze VBO drawing commands to compute the bounding box (which would be used for bucketing). The cost of computing the bounding boxes in these cases could be more than just broadcasting the command.

If you want to optimize things, you'll have to add new glDrawArrays/glDrawElements code to the tilesort SPU that computes bounding boxes. Unfortunately, you can't just look at a VBO to determine bounds since there's no way to interpret the VBO's data; you need the vertex array parameters, etc., which can vary from one draw to the next.

> Is dropping glDrawElements commands for render SPUs whose monitors
> don't intersect the OpenGL output window acceptable practice? Will it
> cause problems for downstream SPUs?
>
> What is the best method to integrate such optimizations into Chromium?
>
> I am only aware of two types of pack buffers in the tilesort SPU: the
> geometry_buffer and the server-specific buffers. Are there any others
> I should know about?

Is your application putting its array indices into a GL_ELEMENT_ARRAY_BUFFER VBO? To get the best performance, you want both your vertex data and your indices to be in VBOs.

-Brian
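For reference, a minimal sketch of what Brian describes, with both the vertex data and the indices stored in buffer objects (standard GL 1.5 usage; on 2006-era headers the same calls may only be available as the ARB-suffixed entry points via glext.h, and the geometry here is just placeholder data):

    #include <GL/gl.h>

    static const GLfloat verts[] = {
        0.0f, 0.0f, 0.0f,
        1.0f, 0.0f, 0.0f,
        0.0f, 1.0f, 0.0f,
    };
    static const GLuint indices[] = { 0, 1, 2 };

    static void draw_with_vbos(void)
    {
        GLuint vbo, ibo;

        /* Vertex positions in a GL_ARRAY_BUFFER object. */
        glGenBuffers(1, &vbo);
        glBindBuffer(GL_ARRAY_BUFFER, vbo);
        glBufferData(GL_ARRAY_BUFFER, sizeof(verts), verts, GL_STATIC_DRAW);
        glVertexPointer(3, GL_FLOAT, 0, (void *) 0);
        glEnableClientState(GL_VERTEX_ARRAY);

        /* Indices in a GL_ELEMENT_ARRAY_BUFFER object, so glDrawElements
         * reads them from the buffer rather than from client memory. */
        glGenBuffers(1, &ibo);
        glBindBuffer(GL_ELEMENT_ARRAY_BUFFER, ibo);
        glBufferData(GL_ELEMENT_ARRAY_BUFFER, sizeof(indices), indices,
                     GL_STATIC_DRAW);

        /* With an index VBO bound, the last argument is an offset into
         * that buffer, not a client-memory pointer. */
        glDrawElements(GL_TRIANGLES, 3, GL_UNSIGNED_INT, (void *) 0);
    }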
From: James S. <arr...@gm...> - 2006-09-08 05:07:44
> The tilesort SPU broadcasts VBO drawing commands to all crservers.
> The tilesort SPU's state tracker keeps a client-side copy of the VBO
> data but does not analyze VBO drawing commands to compute the
> bounding box (which would be used for bucketing).
>
> The cost of computing the bounding boxes in these cases could be more
> than just broadcasting the command.
>
> If you want to optimize things, you'll have to add new
> glDrawArrays/glDrawElements code to the tilesort SPU that computes
> bounding boxes.
>
> Unfortunately, you can't just look at a VBO to determine bounds since
> there's no way to interpret the VBO's data; you need the vertex array
> parameters, etc., which can vary from one draw to the next.

I am not talking about anything that advanced. Most of the time the rendering window will not be on both of the ATI monitors. A simple way to filter out unneeded glDraw* commands would be to check:

    (thread->currentContext->currentWindow->server + index)->num_extents

If a server does not have any extents, then it isn't really doing much, so we should be able to drop a lot of commands for it.

I tried checking this in a for loop, calling

    crPackSetBuffer(thread->packer, &(thread->buffer[index]));

whenever num_extents was non-zero, and setting the pack buffer back to what it was before after the loop ended. I think I am now putting the unrolled glDrawElements call into the server-specific buffers. Performance has gone up a lot, but there are some strange visual errors; it looks as if some coordinate data is being truncated.

Obviously, putting the data into the server-specific buffer doesn't have the same effect as putting it into the global buffer. What is the correct way to send a command to a single server?

I think the thread->buffer buffers are for the exclusive use of the state tracker? Should I try to find a way to use the state tracker to send the data to the servers as needed? Or maybe add new buffers and add code to the flush mechanism for this purpose?

Thank you for your time,
James Steven Supancic III
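A rough sketch of the loop James describes, using only the fields and the crPackSetBuffer() call he names; num_servers is a placeholder for however the surrounding tilesort code counts its crservers, and crPackGetBuffer() is assumed to be the counterpart call that saves the current pack-buffer state before switching (this is an illustration, not a tested patch):

    int i;

    /* Save the state of the current (geometry) buffer before switching
     * away from it. */
    crPackGetBuffer(thread->packer, &(thread->geometry_buffer));

    for (i = 0; i < num_servers; i++) {
        /* Skip crservers that display no part of the window. */
        if ((thread->currentContext->currentWindow->server + i)->num_extents == 0)
            continue;

        /* Pack the unrolled draw command into this server's own buffer. */
        crPackSetBuffer(thread->packer, &(thread->buffer[i]));
        /* ... translate and pack the glDrawElements data here ... */
        crPackGetBuffer(thread->packer, &(thread->buffer[i]));
    }

    /* Restore the geometry buffer so later commands pack as before. */
    crPackSetBuffer(thread->packer, &(thread->geometry_buffer));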
From: Brian P. <bri...@tu...> - 2006-09-08 20:08:03
James Supancic wrote:
>> The tilesort SPU broadcasts VBO drawing commands to all crservers.
>> The tilesort SPU's state tracker keeps a client-side copy of the VBO
>> data but does not analyze VBO drawing commands to compute the
>> bounding box (which would be used for bucketing).
>>
>> The cost of computing the bounding boxes in these cases could be
>> more than just broadcasting the command.
>>
>> If you want to optimize things, you'll have to add new
>> glDrawArrays/glDrawElements code to the tilesort SPU that computes
>> bounding boxes.
>>
>> Unfortunately, you can't just look at a VBO to determine bounds
>> since there's no way to interpret the VBO's data; you need the
>> vertex array parameters, etc., which can vary from one draw to the
>> next.
>
> I am not talking about anything that advanced. Most of the time the
> rendering window will not be on both of the ATI monitors. A simple
> way to filter out unneeded glDraw* commands would be to check:
>
>     (thread->currentContext->currentWindow->server + index)->num_extents
>
> If a server does not have any extents, then it isn't really doing
> much, so we should be able to drop a lot of commands for it.
>
> I tried checking this in a for loop, calling
>
>     crPackSetBuffer(thread->packer, &(thread->buffer[index]));
>
> whenever num_extents was non-zero, and setting the pack buffer back
> to what it was before after the loop ended. I think I am now putting
> the unrolled glDrawElements call into the server-specific buffers.
> Performance has gone up a lot, but there are some strange visual
> errors; it looks as if some coordinate data is being truncated.
>
> Obviously, putting the data into the server-specific buffer doesn't
> have the same effect as putting it into the global buffer. What is
> the correct way to send a command to a single server?
>
> I think the thread->buffer buffers are for the exclusive use of the
> state tracker? Should I try to find a way to use the state tracker to
> send the data to the servers as needed? Or maybe add new buffers and
> add code to the flush mechanism for this purpose?

I think the best place to plug in this feature is right after the bucketing stage. The bucketing stage looks at bounding boxes to determine which geometry buffers go to each crserver. The tilesortspuBucketGeometry() function produces a bitmask indicating which crservers need the geometry. You'll need to add something like this:

    for (i = 0; i < num servers; i++)
        if (server[i].extents are null)
            bucketInfo->hits[i / 32] &= ~(1 << (i % 32));

-Brian
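Read concretely (num_servers, winInfo, and the per-server num_extents field below are guesses based on the names used earlier in the thread, not verified tilesort identifiers), Brian's pseudocode clears a server's bit in the hits mask right after bucketing, and the later send step packs geometry only for servers whose bit is still set:

    /* After tilesortspuBucketGeometry() has filled in bucketInfo->hits,
     * clear the bit for every crserver that has no screen extents so no
     * geometry is packed or sent for it. */
    for (i = 0; i < num_servers; i++) {
        if (winInfo->server[i].num_extents == 0)
            bucketInfo->hits[i / 32] &= ~(1 << (i % 32));
    }

    /* The sending code then tests the mask per server, roughly: */
    for (i = 0; i < num_servers; i++) {
        if (bucketInfo->hits[i / 32] & (1 << (i % 32))) {
            /* ... pack and send the geometry to crserver i as usual ... */
        }
    }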