From: Roland S. <rsc...@hi...> - 2005-02-09 19:58:15
|
Some more numbers, this time from a 9000Pro (64MB). In contrast to the quite slow 7200sdr with only 2.6GB/s ram, this one has 8.8GB/s bandwidth (128bit/275MHz DDR). Not to mention the chip is certainly faster too. Test system is also faster though, A64 3000+ socket 754, 3.2GB/s system memory bandwidth.

The hacks required to disable the heaps are exactly the same as those used on r100 (except of course the nr_heaps assertion had to go..., yes this hack DOES break the client stuff). Local texture size is 35MB, unless otherwise noted (I just changed the allocation scheme in the ddx driver, so only 1 framebuffer worth of pixmap cache is used instead of 3, btw without really any noticeable impact on 2d performance). GART size is 32MB unless specifically stated.

Desktop resolution 1280x1024, quake3 windowed 1024x768.

AGP 4x, local: 125 fps
AGP 4x, both: 80-123 fps
AGP 4x, gart only: 68 fps
AGP 1x, local: 115 fps
AGP 1x, gart only: 21 fps

Some rtcw (demo checkpoint) results too (fullscreen 1024x768).

AGP 4x, local: 70 fps
AGP 4x, local 45MB: 85 fps
AGP 4x, both: 62-77 fps
AGP 4x, both, gart 64MB: 58-68 fps
AGP 4x, gart only, 64MB: 47 fps
AGP 1x, local: 56 fps
AGP 1x, gart only: 14 fps

texdown AGP 4x gart: 230MB/s
texdown AGP 4x local: 650MB/s!
texdown AGP 1x gart: 117MB/s before q3, 89MB/s after q3 (?)
texdown AGP 1x local: 265MB/s!!!

There seemed to be a problem with gart texturing and AGP modes lower than 4x: agpgart reported "Putting AGP V2 device at 0000:00:00.0 into 0x mode", while glxinfo still reported AGP 1x and 2x, respectively. 1x and 2x results were identical, and to put it simply, the results were downright appalling. This may be a problem with agpgart (using the version from kernel 2.6.10). However, I was amazed at the texdown performance to local graphics memory, as it's VERY close to the theoretical limit. texdown performance with AGP 4x was also quite good. 
The rtcw checkpoint demo exceeds an "in-use" texture size of 35MB, that's why I've put in some results with a larger local texture size (as well as increased gart size). 45MB is enough though; with 35MB you'd get some occasional drops to around 12fps (and 6fps with agp 1x), and these are completely gone with 45MB.

Performance with gart texturing, even in 4x mode, takes a big hit (almost 50%).

I was not really able to get consistent performance results when both texture heaps were active; I guess it's luck of the day which textures got put in the gart heap and which ones in the local heap. But the fact that performance got faster with a smaller gart heap is not a good sign. And even though the maximum obtained in rtcw with a 35MB local heap and 29MB gart heap was higher than the score obtained with the 35MB local heap alone, there were clearly areas which ran faster with only the local heap.

It seems to me that the allocator really should try harder to use the local heap to be useful on r200 cards. Moreover, when you DO have to put textures into the gart heap, it is likely you'd get quite a bit better performance if you revisit that later, when more space becomes available on the local heap, and upload the still-used textures from the gart heap to the local heap (in fact, that should be even faster than those 650MB/s, since no in-kernel copy would be needed; it should be possible to blit it directly).

Some numbers just for fun, since those are the numbers everyone wants to see...

Some other OS, rtcw: 120 fps
Some other OS, q3: 137 fps (this one is a bit cheated. I'm pretty sure non-fullscreen does not use pageflip. Fullscreen score was 174 fps, whereas we only improved from 125 fps to 129 fps...)

This ain't that bad. I'd be happy if we'd do that well in say, ut2k4 or doom3...

Roland |
From: Jon S. <jon...@gm...> - 2005-02-09 20:13:49
|
Is there a tool for dumping stats on which textures are in which heap? -- Jon Smirl jon...@gm... |
From: Felix <fx...@gm...> - 2005-02-09 21:10:40
|
Am Mittwoch, den 09.02.2005, 20:58 +0100 schrieb Roland Scheidegger:
> Some more numbers, this time from a 9000Pro (64MB). [snip]
>
> AGP 4x, local: 125 fps
> AGP 4x, both: 80-123 fps
> AGP 4x, gart only: 68 fps
> AGP 1x, local: 115 fps
> AGP 1x, gart only: 21 fps
>
> Some rtcw (demo checkpoint) results too (fullscreen 1024x768).
> AGP 4x, local: 70 fps
> AGP 4x, local 45MB: 85 fps
> AGP 4x, both: 62-77 fps
> AGP 4x, both, gart 64MB: 58-68 fps
> AGP 4x, gart only, 64MB: 47 fps
> AGP 1x, local: 56 fps
> AGP 1x, gart only: 14 fps

Thanks for these numbers. They show that the current memory management strategies are far from perfect. Read on below for some ideas how to improve it.

> texdown AGP 4x gart: 230MB/s
> texdown AGP 4x local: 650MB/s!
> texdown AGP 1x gart: 117MB/s before q3, 89MB/s after q3 (?)
> texdown AGP 1x local: 265MB/s!!!
>
> [snip] However, I was amazed at the texdown performance to local
> graphics memory, as it's VERY close to the theoretical limit.
> texdown performance with AGP 4x was also quite good.

Keith committed a fastpath for Mesa's texstore functions that reduced the CPU overhead of the rgba 32bit texture uploads significantly.

> [snip]
> It seems to me that the allocator really should try harder to use the
> local heap to be useful on r200 cards, moreover it is likely that you'd
> get quite a bit better performance when you DO have to put textures into
> the gart heap when you revisit that later when more space becomes
> available on the local heap and upload the still-used textures from the
> gart heap to the local heap (in fact, should be even faster than those
> 650MB/s, since no in-kernel-copy would be needed, it should be possible
> to blit it directly).

The big problem with the current texture allocator is that it can't tell which areas are really unused. Texture space is only allocated and never freed. Once the memory is "full" it starts kicking textures to upload new ones. This is the only way of "freeing" memory. Using an LRU strategy it has a good chance of kicking unused textures first, but there's no guarantee. It can't tell if a kicked texture will be needed the next instant. So trying to move textures from GART to local memory would basically mean that you blindly kick the least recently used texture(s) from local memory. If those textures are needed again soon then performance is going to suffer badly.

Therefore I'm proposing a modified allocator that fails when it needs to start kicking too recently used textures (e.g. textures used in the current or previous frame). Failure would not be fatal in this case, you just keep the texture in GART memory and try again later. Actually you could use the same allocator for normal texture uploads. Just specify the current texture heap age as the limit.

If you try to move textures back to local memory each time a texture is used, this would result in some kind of automatic regulation of heap usage. By kicking only textures that are several frames old in this process, you'd avoid thrashing.

Currently the texture heap age is only incremented on lock contention (IIRC). In this scheme you'd also increment it on buffer swaps and remember the texture heap ages of the last two buffer swaps.

[snip]

Regards,
Felix

--
| Felix Kühling <fx...@gm...> http://fxk.de.vu |
| PGP Fingerprint: 6A3C 9566 5B30 DDED 73C3 B152 151C 5CC1 D888 E595 | |
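[Editorial note: the age-limited allocator described above can be sketched as a toy model. This is purely illustrative — the real allocator is C code in texmem.c, and the names (`AgeLimitedHeap`, `min_kick_age`, `swap_buffers`) are invented for the sketch. Only the policy follows the proposal: kick LRU textures to make room, but fail non-fatally instead of kicking anything used since a given heap age.]

```python
from collections import OrderedDict

class AgeLimitedHeap:
    """Toy model of the proposed allocator: an LRU texture heap whose
    allocator refuses to kick textures newer than an age cutoff."""

    def __init__(self, size):
        self.size = size
        self.free = size
        self.textures = OrderedDict()  # name -> (size, last_used_age), LRU first
        self.age = 0                   # incremented on buffer swaps

    def swap_buffers(self):
        self.age += 1

    def use(self, name):
        # driUpdateTexLRU equivalent: move to the MRU end, stamp current age
        size, _ = self.textures.pop(name)
        self.textures[name] = (size, self.age)

    def allocate(self, name, size, min_kick_age):
        """Try to make room by kicking only textures last used before
        min_kick_age. Return False (non-fatal) if that isn't enough;
        the caller keeps the texture in the GART heap and retries later."""
        while self.free < size and self.textures:
            lru_name, (lru_size, lru_age) = next(iter(self.textures.items()))
            if lru_age >= min_kick_age:
                return False  # would have to kick a too-recently-used texture
            del self.textures[lru_name]
            self.free += lru_size
        if self.free < size:
            return False
        self.textures[name] = (size, self.age)
        self.free -= size
        return True
```

With a 100-unit heap, a second 60-unit texture is refused while the first is still current-frame, but succeeds once two buffer swaps have aged the first texture past the cutoff.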
From: Keith W. <ke...@tu...> - 2005-02-09 21:59:19
|
Felix Kühling wrote:
> Am Mittwoch, den 09.02.2005, 20:58 +0100 schrieb Roland Scheidegger:
>> [snip benchmark numbers]
>>
>> However, I was amazed at the texdown performance to local graphics
>> memory, as it's VERY close to the theoretical limit. texdown
>> performance with AGP 4x was also quite good.
>
> Keith committed a fastpath for Mesa's texstore functions that reduced
> the CPU-overhead of the rgba 32bit texture uploads significantly.

I think the radeon actually uses a rgba8888 internal format, unlike the argb8888 everything else does (which was the subject of the commit). That means mesa will upload GL_RGBA textures with a straight memcpy, though it still hits a very slow path with "near miss" texture formats.

Keith |
From: Felix <fx...@gm...> - 2005-02-10 17:55:11
Attachments:
staletex.diff
|
Am Mittwoch, den 09.02.2005, 22:12 +0100 schrieb Felix Kühling:
> Am Mittwoch, den 09.02.2005, 20:58 +0100 schrieb Roland Scheidegger:
[snip]
> > Performance with gart texturing, even in 4x mode, takes a big hit
> > (almost 50%). [snip]
> > It seems to me that the allocator really should try harder to use the
> > local heap to be useful on r200 cards [snip]
>
> The big problem with the current texture allocator is that it can't tell
> which areas are really unused. [snip]
>
> Therefore I'm proposing a modified allocator that fails when it needs to
> start kicking too recently used textures (e.g. textures used in the
> current or previous frame). [snip]

I simplified this idea a little further and attached a patch against texmem.[ch]. It frees stale textures (and also place holders for other clients' textures) that haven't been used in 1 second when it runs out of space on a texture heap. This way it will try a bit harder to put textures into the first heap before using the second heap, without much risk (I hope) of performance regressions.

I tested this on a ProSavageDDR where rendering speed appears to be the same with local and GART textures. There was no measurable performance regression in Quake3 and I noticed no subjective performance regression in Torcs or Quake1 either.

Now the only thing missing in texmem.c for migrating textures from GART to local memory would be a flag to driAllocateTexture to stop trying if kicking stale textures didn't free up enough space (on the first texture heap).

Anyway, I think the attached patch should already make a difference as it is. I'd be interested how much it improves your performance numbers with Quake3 and rtcw on r200 when both texture heaps are enabled.

[snip]

Regards,
Felix

--
| Felix Kühling <fx...@gm...> http://fxk.de.vu |
| PGP Fingerprint: 6A3C 9566 5B30 DDED 73C3 B152 151C 5CC1 D888 E595 | |
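[Editorial note: the stale-texture freeing described in this mail can be sketched as follows. This is an illustrative toy model, not the actual staletex.diff (which patches the C code in texmem.[ch]); the class and method names are invented, and only the policy matches the description — when an allocation fails, free textures unused for 1 second before falling back to the next heap.]

```python
import time

class StaleTexHeap:
    STALE_SECONDS = 1.0  # the 1-second threshold from the patch description

    def __init__(self, size):
        self.size = size
        self.free = size
        self.textures = {}  # name -> (size, last-used wall-clock time)

    def use(self, name):
        size, _ = self.textures[name]
        self.textures[name] = (size, time.monotonic())

    def free_stale(self, now=None):
        """Free textures that haven't been used for STALE_SECONDS."""
        now = time.monotonic() if now is None else now
        for name, (sz, last) in list(self.textures.items()):
            if now - last > self.STALE_SECONDS:
                del self.textures[name]
                self.free += sz

    def allocate(self, name, size):
        if self.free < size:
            self.free_stale()          # second chance: drop stale textures
        if self.free < size:
            return False               # caller falls back to the next heap
        self.textures[name] = (size, time.monotonic())
        self.free -= size
        return True
```

An allocation that fails while everything is fresh succeeds once the old texture has been idle past the threshold, so the first heap is reused harder before the second heap is touched.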
From: Jon S. <jon...@gm...> - 2005-02-10 20:33:31
|
I haven't looked at the texture heap management code, but one simple idea for heap management would be to cascade the on-board heap to the AGP one. How does the current algorithm work? Does an algorithm like the one below have merit? It should sort the hot textures on-board, and single use textures should fall out of the cache. 1) load all textures initially in the on-board heap. Since if you are loading them you're probably going to use them. 2) Do LRU with the on-board heap. 3) When you run out of space on-board, demote the end of the LRU list to the top of the AGP heap and copy the texture between heaps. 4) Run LRU on the AGP heap. 5) When it runs out of space lose the item. 6) an added twist would be if the top of the AGP heap gets hit too often knock it out of cache so that it will get reloaded on-board. Jon Smirl jon...@gm... |
From: Felix <fx...@gm...> - 2005-02-10 22:11:22
|
Am Donnerstag, den 10.02.2005, 15:31 -0500 schrieb Jon Smirl:
> I haven't looked at the texture heap management code, but one simple
> idea for heap management would be to cascade the on-board heap to the
> AGP one. How does the current algorithm work? Does an algorithm like
> the one below have merit? It should sort the hot textures on-board,
> and single use textures should fall out of the cache.
>
> 1) load all textures initially in the on-board heap. Since if you are
> loading them you're probably going to use them.

Drivers usually upload textures to the hardware just before binding them to a hardware texture unit. So this assumption is always true.

> 2) Do LRU with the on-board heap.
> 3) When you run out of space on-board, demote the end of the LRU list
> to the top of the AGP heap and copy the texture between heaps.

This means you copy a texture when you don't know if or when you're going to need it again. So the move of the texture may just be a waste of time. It would be better to just kick the texture and upload it again later when it's really needed.

> 4) Run LRU on the AGP heap.
> 5) When it runs out of space lose the item.
> 6) an added twist would be if the top of the AGP heap gets hit too
> often knock it out of cache so that it will get reloaded on-board.

I'd rather reverse your scheme. Upload a texture to the GART heap first, because that's potentially faster (though not with the current implementation in the radeon drivers). When the texture is needed more frequently, try promoting it to the local texture heap.

This scheme would give good results with movie players that need fast texture uploads and typically use each texture exactly once. It would also improve performance with games, simulations, ... that tend to use the same textures many times and benefit from the higher memory bandwidth when accessing local textures.

--
| Felix Kühling <fx...@gm...> http://fxk.de.vu |
| PGP Fingerprint: 6A3C 9566 5B30 DDED 73C3 B152 151C 5CC1 D888 E595 | |
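[Editorial note: Felix's reversed scheme can be sketched the same way. Illustrative only — the class, the `PROMOTE_AFTER` threshold value, and the string return codes are all made up; the policy follows the proposal (and its refinement later in the thread): new textures land in GART, frequently bound textures are promoted to local, and updating a texture resets its count so movie-player-style textures stay in GART.]

```python
from collections import OrderedDict

class PromotingHeaps:
    """Toy model of the reversed scheme: upload to GART first, promote
    a texture to the local heap once it has been bound often enough."""

    PROMOTE_AFTER = 3  # invented threshold; a real driver would tune this

    def __init__(self, local_slots):
        self.gart = {}             # name -> bind count since last update
        self.local = OrderedDict() # LRU order: oldest first
        self.local_slots = local_slots

    def bind(self, tex):
        # The driUpdateTexLRU hook would be the natural place to count uses.
        if tex in self.local:
            self.local.move_to_end(tex)
            return "local"
        count = self.gart.get(tex, 0) + 1
        self.gart[tex] = count
        if count >= self.PROMOTE_AFTER and len(self.local) < self.local_slots:
            del self.gart[tex]
            self.local[tex] = True
            return "promoted"
        return "gart"

    def update(self, tex):
        # Frequently-updated textures (e.g. movie frames via
        # glTexSubImage2D) should stay in GART: reset the usage count.
        if tex in self.gart:
            self.gart[tex] = 0
```

A texture bound three times gets promoted, while one that is rewritten between binds never accumulates enough uses and stays in the fast-upload GART heap.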
From: Jon S. <jon...@gm...> - 2005-02-10 22:36:22
|
On Thu, 10 Feb 2005 23:13:30 +0100, Felix Kühling <fx...@gm...> wrote:
> This means you copy a texture when you don't know if or when you're
> going to need it again. So the move of the texture may just be a waste
> of time. It would be better to just kick the texture and upload it again
> later when it's really needed.

I suspect this extra texture copy wouldn't be noticeable except when you construct a test program which artificially triggers it. Most games will achieve a steady state with their loaded textures after a frame or two and the copies will stop.

> I'd rather reverse your scheme. Upload a texture to the GART heap first,
> because that's potentially faster (though not with the current
> implementation in the radeon drivers). When the texture is needed more
> frequently, try promoting it to the local texture heap.

I thought about this, but there is no automatic way to figure out when to promote from GART to local. Same problem when local overflows: what do you demote to AGP? You still have copies with this scheme too.

Going first to local and then demoting to AGP sorts everything automatically. It may cause a little more churn in the heaps, but the advantage is that the algorithm is very simple and doesn't need much tuning. The only tunable parameter is determining when the top of the AGP heap is "hot" and booting it. You could use something simple like boot after 500 accesses.

--
Jon Smirl
jon...@gm... |
From: Felix <fx...@gm...> - 2005-02-11 00:06:52
|
Am Donnerstag, den 10.02.2005, 17:40 -0500 schrieb Jon Smirl:
> On Thu, 10 Feb 2005 23:13:30 +0100, Felix Kühling <fx...@gm...> wrote:
> > This scheme would give good results with movie players that need fast
> > texture uploads and typically use each texture exactly once. It would
>
> Movie players aren't even close to being texture bandwidth bound.

That's not my experience. Optimizations in the texture upload path, using the AGP heap and partial texture uploads had a big impact on mplayer -vo gl performance on my ProSavageDDR (factor 2-3 all of them taken together).

> The demote from local to AGP scheme would cause two copies on each frame
> but there is plenty of bandwidth. But this assumes that the movie
> player creates a new texture for each frame.
>
> A better scheme for a movie player would be to create a single texture
> and then keep replacing its contents.

You're right, that's what actually happens in mplayer. It uses glTexSubImage2D because it typically changes only a part of a texture with power-of-two dimensions.

> Or use two textures and double
> buffer. But once created these textures would not move in the LRU list
> unless you started something like a game in another window.

Yes, they would move in the LRU list. That's why it's called "least recently used" not "least recently created". ;-) So I would have to modify my scheme to reset the usage count/frequency when a texture image is changed, such that a texture that is updated very frequently would not be promoted to local memory.

Am Donnerstag, den 10.02.2005, 17:34 -0500 schrieb Jon Smirl:
> On Thu, 10 Feb 2005 23:13:30 +0100, Felix Kühling <fx...@gm...> wrote:
> > This means you copy a texture when you don't know if or when you're
> > going to need it again. So the move of the texture may just be a waste
> > of time. It would be better to just kick the texture and upload it again
> > later when it's really needed.
>
> I suspect this extra texture copy wouldn't be noticeable except when
> you construct a test program which artificially triggers it. Most
> games will achieve a steady state with their loaded textures after a
> frame or two and the copies will stop.

Still, this copy is unnecessary at the time. Delaying the re-upload to the time when the texture is needed again has only advantages and is not difficult to implement.

> I thought about this, but there is no automatic way to figure out when
> to promote from GART to local.

Yes there is. In the current scheme, whenever a texture is bound to a hardware tex unit the driver calls driUpdateTexLRU, which moves the texture to the front of the LRU list. In this function you could easily count how often or how frequently a texture has been used. Based on this information and maybe the texture size you could decide which textures to promote and when. You will keep promoting textures until the local heap is full of non-stale textures.

> Same problem when local overflows, what
> do you demote to AGP? You still have copies with this scheme too.

Textures are sorted in LRU-order on the texture heaps. So you always kick least recently used textures first. It has always worked like this even in the current scheme. For promoting textures I would only kick stale textures from the local heap.

> Going first to local and then demoting to AGP sorts everything
> automatically. It may cause a little more churn in the heaps,

In my experience texture uploads are quite expensive. So IMO avoiding unnecessary texture uploads or copies should have a high priority.

> but the
> advantage is that the algorithm is very simple and doesn't need much
> tuning. The only tunable parameter is determining when the top of the
> AGP heap is "hot" and booting it. You could use something simple like
> boot after 500 accesses.

I don't think my algorithm is much more complicated. It can be implemented by gradual improvements of the current algorithm (freeing stale texture memory is one step), which helps avoid unexpected performance regressions. At the moment I'm not planning to rewrite it from scratch, especially because I can't test on any hardware where I can actually measure great performance improvements ATM.

The only tunable parameter in my algorithm is how often/frequently used a texture must be in order to try to promote it to the local texture heap. Maybe there are a few more degrees of freedom, because you can also consider the texture size for promotion. I think the steady state result would be about the same as with your algorithm, but I expect my scheme to work better when textures are used very infrequently or updated very frequently (movie players). In particular this would make the texture_heaps option unnecessary, which is a good thing IMO (good performance without tuning is good for Joe Average User).

Anyway, anyone is free to implement an alternative algorithm for comparison. If it works better, then it will be adopted. However, I'm not convinced your algorithm is going to work better than mine (you asked for my opinion, didn't you), so I'm not going to implement it.

--
| Felix Kühling <fx...@gm...> http://fxk.de.vu |
| PGP Fingerprint: 6A3C 9566 5B30 DDED 73C3 B152 151C 5CC1 D888 E595 | |
From: Roland S. <rsc...@hi...> - 2005-02-11 00:50:24
|
Felix Kühling wrote:
> I don't think my algorithm is much more complicated. It can be
> implemented by gradual improvements of the current algorithm (freeing
> stale texture memory is one step) which helps avoiding unexpected
> performance regressions. At the moment I'm not planning to rewrite it
> from scratch, especially because I can't test on any hardware where I
> can actually measure great performance improvements ATM.

I'm not sure what a really good implementation would look like, but you could try lowering gart speed to 1x with a savage to see a performance difference between local and gart texturing. Though I'm not convinced the savages are actually fast enough to even take a hit with agp 1x...

Roland |
From: Jon S. <jon...@gm...> - 2005-02-10 22:41:41
|
On Thu, 10 Feb 2005 23:13:30 +0100, Felix Kühling <fx...@gm...> wrote:
> This scheme would give good results with movie players that need fast
> texture uploads and typically use each texture exactly once. It would

Movie players aren't even close to being texture bandwidth bound. The demote from local to AGP scheme would cause two copies on each frame but there is plenty of bandwidth. But this assumes that the movie player creates a new texture for each frame.

A better scheme for a movie player would be to create a single texture and then keep replacing its contents. Or use two textures and double buffer. But once created these textures would not move in the LRU list unless you started something like a game in another window.

--
Jon Smirl
jon...@gm... |
From: Dave A. <ai...@li...> - 2005-02-11 00:12:16
|
> A better scheme for a movie player would be to create a single texture
> and then keep replacing its contents. Or use two textures and double
> buffer. But once created these textures would not move in the LRU list
> unless you started something like a game in another window.

It would help if we supported that in any reasonable fashion (at least on radeon/r200). Movie players are very texture upload bound, well at least on my embedded system. I do a lot of animation with movies, and mngs and arrays of pngs, and most of my time is spent in memcpy and texstore_rgba8888. This is a real pain for me, and I'm slowly gathering enough knowledge to do a great big hack for my own internal use.

Dave.

--
David Airlie, Software Engineer
http://www.skynet.ie/~airlied / airlied at skynet.ie
pam_smb / Linux DECstation / Linux VAX / ILUG person |
From: Jon S. <jon...@gm...> - 2005-02-11 00:35:39
|
AGP 8x should just be able to keep up with 1280x1024x24b 60 times/sec.

How does mesa access AGP memory from the CPU side? AGP memory is system memory which the AGP aperture makes visible to the GPU. Are we using the GPU to load textures into AGP memory, or is it being done entirely on the main CPU with a memcpy?

For things like a movie player we should even be able to give it a pointer to the texture in system memory (AGP space) and let it directly manipulate the texture buffer. Doing that would require playing with the page tables to preserve protection.

--
Jon Smirl
jon...@gm... |
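[Editorial note: the bandwidth claim above is easy to sanity-check. The arithmetic below uses the common theoretical AGP peak of ~266 MB/s for 1x (66 MHz x 32 bit), doubling per mode; real-world throughput, as Roland's texdown numbers elsewhere in the thread show, is considerably lower, especially at 1x.]

```python
# Back-of-the-envelope check: bandwidth needed for full-frame
# 24-bit texture uploads at 1280x1024, 60 times per second.
width, height, bytes_per_pixel, fps = 1280, 1024, 3, 60

required = width * height * bytes_per_pixel * fps  # bytes per second
required_mb = required / (1024 * 1024)

# Theoretical AGP peak rates: 1x is ~266 MB/s, each mode doubles it.
agp_peak_mb = {mode: 266 * mode for mode in (1, 2, 4, 8)}

print(f"required: {required_mb:.0f} MB/s")
for mode, peak in agp_peak_mb.items():
    print(f"AGP {mode}x peak: {peak} MB/s")
```

The requirement works out to 225 MB/s, so on paper even AGP 1x is in range and 8x has ample headroom; in practice the measured 650 MB/s at 4x covers it comfortably, while the ~117 MB/s measured at 1x does not.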
From: Roland S. <rsc...@hi...> - 2005-02-11 00:47:53
|
Jon Smirl wrote:
> AGP 8x should just be able to keep up with 1280x1024x24b 60
> times/sec.

AGP 4x should be enough. Remember I got 600MB/s max throughput. Not with 24bit textures though, the Mesa RGBA-BGRA conversion takes WAY too much time to achieve that.

> How does mesa access AGP memory from the CPU side? AGP memory is
> system memory which the AGP makes visible to the GPU. Are we using
> the GPU to load textures into AGP memory or is it being done entirely
> on the main CPU with a memcopy?

Depends on the driver. radeon/r200 use a gpu blit. Might be suboptimal, but at least it handles things like tiling (when the gpu blitter can do it) automatically. I'm not sure, but couldn't the radeon blitter actually do rgba-bgra conversion too, for instance?

> For things like a movie player we should even be able to give it a
> pointer to the texture in system memory (AGP space) and let it
> directly manipulate the texture buffer. Doing that would require
> playing with the page tables to preserve protection.

This seems to be exactly what the client extension of the r200 driver is intended for. But for normal apps it's useless (and for the most part even for apps which could make good use of it, since it's an extension almost no one uses anyway).

Roland |
From: Roland S. <rsc...@hi...> - 2005-02-11 00:18:57
|
Felix Kühling wrote:
> I simplified this idea a little further and attached a patch against
> texmem.[ch]. It frees stale textures (and also place holders for other
> clients' textures) that haven't been used in 1 second when it runs out
> of space on a texture heap. [snip]
>
> Anyway, I think the attached patch should already make a difference as
> it is. I'd be interested how much it improves your performance numbers
> with Quake3 and rtcw on r200 when both texture heaps are enabled.

I've done a couple of benchmarks. All results are "fglrx-boosted", so to speak (too lazy to reboot).

q3, local 45MB or 35MB: 145 fps
rtcw, local 45MB: 95 fps
rtcw, local 35MB: 76 fps

with both heaps, local size 35MB, GART texture size 61MB:

q3, old allocator: 105-125 fps
rtcw, old allocator: 70-84 fps
q3, new allocator: 108-126 fps
rtcw, new allocator: 71-85 fps

This does not seem to really make a difference.

One interesting thing I noticed though is that it is actually not really a "range" of results, but only some distinct values. For rtcw, the scores were always very close to either 70, 77 or 85 fps (within 1 frame); out of 10 runs maybe 6 were around 77, 2 around 70 and 2 around 85. Quake3 mostly ran at around 125 fps but once every while was just below 110.

Roland |
From: Owen T. <ot...@re...> - 2005-02-11 02:56:10
|
Dave Airlie wrote:
>> A better scheme for a movie player would be to create a single texture
>> and then keep replacing its contents. Or use two textures and double
>> buffer. But once created these textures would not move in the LRU list
>> unless you started something like a game in another window.
>
> if we supported that in any reasonable fashion (at least on radeon/r200).
> Movie players are very texture upload bound; well, at least on my embedded
> system, I do a lot of animation with movies, and mngs and arrays of pngs,
> and most of my time is spent in memcpy and texstore_rgba8888. This is a
> real pain for me, and I'm slowly gathering enough knowledge to do a great
> big hack for my own internal use.

Perhaps a wild idea ... does APPLE_client_storage do what you want? If so,
then it might be a lot simpler and more reusable to test/optimize/fix up
that than to start from scratch.

That should allow a straight copy from data you create to memory the card
can texture from, which is about as good as possible.

For subimage modification, the spec seems to permit modifying the data in
place and then calling TexSubImage on the subregion, with a pointer into
the original data, to notify of the change.

Regards,
Owen
|
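[Editor's note] The pattern Owen describes looks roughly like the sketch below. It is illustrative only and not runnable as-is (it assumes a current GL context, a driver exposing GL_APPLE_client_storage, and hypothetical `W`, `H`, `tex`, and `decode_next_frame` names); the thread itself contains no code.

```c
/* Sketch: texture a movie frame straight from app-owned memory. */
GLubyte *frame = malloc(W * H * 4);        /* app-owned frame buffer */

glPixelStorei(GL_UNPACK_CLIENT_STORAGE_APPLE, GL_TRUE);
glBindTexture(GL_TEXTURE_2D, tex);
glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA, W, H, 0,
             GL_RGBA, GL_UNSIGNED_BYTE, frame);   /* GL keeps our pointer,
                                                     no internal copy */

/* Per movie frame: decode in place, then tell GL which region changed.
 * With client storage the driver can texture from (or DMA out of)
 * 'frame' directly instead of going through its own staging copy. */
decode_next_frame(frame);                  /* hypothetical decoder call */
glTexSubImage2D(GL_TEXTURE_2D, 0, 0, 0, W, H,
                GL_RGBA, GL_UNSIGNED_BYTE, frame);
```

This would eliminate exactly the memcpy/texstore_rgba8888 copies Dave complains about, provided the driver honors the extension rather than copying anyway.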
From: Jon S. <jon...@gm...> - 2005-02-11 03:23:39
|
On Thu, 10 Feb 2005 21:59:29 -0500, Owen Taylor <ot...@re...> wrote:
> That should allow a straight copy from data you create to memory the card
> can texture from, which is about as good as possible.

If you have a big AGP aperture to play with, there is a faster way. When
you get the call to copy the texture from user space, don't copy it.
Instead, mark its page table entries as copy-on-write. Get the physical
address of each page and set it into the GART. Now the GPU can get to the
texture with zero copies. When you are done with it, check whether the app
caused a copy-on-write; if so, free the page, else just remove the COW
flag.

-- 
Jon Smirl
jon...@gm...
|
From: Eric A. <et...@lc...> - 2005-02-11 04:15:00
|
On Thu, 2005-02-10 at 22:23 -0500, Jon Smirl wrote:
> On Thu, 10 Feb 2005 21:59:29 -0500, Owen Taylor <ot...@re...> wrote:
> > That should allow a straight copy from data you create to memory the
> > card can texture from, which is about as good as possible.
>
> If you have a big AGP aperture to play with, there is a faster way.
> When you get the call to copy the texture from user space, don't copy
> it. Instead, mark its page table entries as copy-on-write. Get the
> physical address of the page and set it into the GART. Now the GPU can
> get to it with zero copies. When you are done with it, check and see
> if the app caused a copy-on-write; if so, free the page, else just
> remove the COW flag.

Is there evidence that this is/would be in fact faster?

-- 
Eric Anholt                                     et...@lc...
http://people.freebsd.org/~anholt/          anholt@FreeBSD.org
|
From: Dave A. <ai...@li...> - 2005-02-11 04:51:34
|
> > it. Instead, mark its page table entries as copy-on-write. Get the
> > physical address of the page and set it into the GART. Now the GPU can
> > get to it with zero copies. When you are done with it, check and see
> > if the app caused a copy-on-write; if so, free the page, else just
> > remove the COW flag.
>
> Is there evidence that this is/would be in fact faster?

No, but I could practically guarantee anything is faster than the 3-4
copies a radeon texture goes through at the moment.

Dave.

-- 
David Airlie, Software Engineer
http://www.skynet.ie/~airlied / airlied at skynet.ie
pam_smb / Linux DECstation / Linux VAX / ILUG person
|
From: Jon S. <jon...@gm...> - 2005-02-11 05:23:29
|
On Thu, 10 Feb 2005 20:14:00 -0800, Eric Anholt <et...@lc...> wrote:
> Is there evidence that this is/would be in fact faster?

That's how the networking drivers work, and they may be the fastest
drivers in the system. But it has not been coded for AGP, so nobody knows
for sure. It has to be faster, though: having the CPU do the copy will
cause the TLB cache to be flushed as you walk through all of the pages.
Having the GPU do the copy is even worse, since it moves across AGP.

We have bigger problems to chase. Plus, implementing it this way probably
has a bunch of architecture-specific problems I don't know about. But I'm
sure it would work on x86. After we get X on GL up on mesa-solo, I can
look at changing the texture copy code.

-- 
Jon Smirl
jon...@gm...
|