From: Roland S. <rsc...@hi...> - 2005-02-09 19:58:15
Some more numbers, this time from a 9000Pro (64MB). In contrast to the quite slow 7200sdr with only 2.6GB/s ram, this one has 8.8GB/s bandwidth (128bit/275MHz DDR). Not to mention the chip is certainly faster too. The test system is also faster though: A64 3000+ socket 754, 3.2GB/s system memory bandwidth.

The hacks required to disable the heaps are exactly the same as those used on r100 (except of course the nr_heaps assertion had to go..., yes this hack DOES break the client stuff). Local texture size is 35MB unless otherwise noted (I just changed the allocation scheme in the ddx driver so that only 1 framebuffer worth of pixmap cache is used instead of 3, btw without any really noticeable impact on 2d performance). GART size is 32MB unless specifically stated. Desktop resolution 1280x1024, quake3 windowed 1024x768.

quake3:
  AGP 4x, local:      125 fps
  AGP 4x, both:       80-123 fps
  AGP 4x, gart only:  68 fps
  AGP 1x, local:      115 fps
  AGP 1x, gart only:  21 fps

Some rtcw (demo checkpoint) results too (fullscreen 1024x768):
  AGP 4x, local:                70 fps
  AGP 4x, local 45MB:           85 fps
  AGP 4x, both:                 62-77 fps
  AGP 4x, both, gart 64MB:      58-68 fps
  AGP 4x, gart only, 64MB:      47 fps
  AGP 1x, local:                56 fps
  AGP 1x, gart only:            14 fps

texdown:
  AGP 4x, gart:   230MB/s
  AGP 4x, local:  650MB/s!
  AGP 1x, gart:   117MB/s before q3, 89MB/s after q3 (?)
  AGP 1x, local:  265MB/s!!!

There seemed to be a problem with gart texturing and AGP modes lower than 4x: agpgart reported "Putting AGP V2 device at 0000:00:00.0 into 0x mode", while glxinfo still reported AGP 1x and 2x, respectively. The 1x and 2x results were identical and, to put it simply, downright appalling. This may be a problem with agpgart (I'm using the version from kernel 2.6.10). However, I was amazed at the AGP 1x texdown performance to local graphics memory, as it's VERY close to the theoretical limit. texdown performance with AGP 4x was also quite good.
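As a rough cross-check of the bandwidth figures and the "theoretical limit" remark, the arithmetic can be sketched as below. The 128bit/275MHz DDR figures and the texdown rates are from the post; the AGP bus parameters (32-bit bus, nominal 66.67MHz base clock) are the standard AGP values and are an assumption added here, as is reading texdown's "MB/s" as 10^6 bytes per second.

```python
def peak_bandwidth(bus_bits, clock_hz, transfers_per_clock):
    """Peak bytes/s = bus width in bytes * clock * transfers per clock."""
    return bus_bits / 8 * clock_hz * transfers_per_clock

# 9000Pro local memory, figures from the post: 128 bit, 275 MHz, DDR (2 transfers/clock).
local_mem = peak_bandwidth(128, 275e6, 2)

# AGP: assumed standard bus parameters (not stated in the post).
AGP_BUS_BITS = 32
AGP_CLOCK_HZ = 200e6 / 3  # nominal 66.67 MHz
agp_peak = {
    "1x": peak_bandwidth(AGP_BUS_BITS, AGP_CLOCK_HZ, 1),
    "4x": peak_bandwidth(AGP_BUS_BITS, AGP_CLOCK_HZ, 4),
}

# Measured texdown rates from the post, in MB/s (assumed to mean 1e6 bytes/s).
texdown = {
    ("4x", "gart"): 230,
    ("4x", "local"): 650,
    ("1x", "gart"): 117,
    ("1x", "local"): 265,
}

# Fraction of the peak AGP rate that each texdown run achieved.
efficiency = {
    key: rate * 1e6 / agp_peak[key[0]] for key, rate in texdown.items()
}

if __name__ == "__main__":
    print(f"local memory peak: {local_mem / 1e9:.1f} GB/s")
    for mode, peak in agp_peak.items():
        print(f"AGP {mode} peak: {peak / 1e6:.0f} MB/s")
    for (mode, heap), frac in efficiency.items():
        print(f"texdown AGP {mode} {heap}: {frac:.0%} of peak")
```

Under these assumptions the 1x upload to local memory sits within about one percent of the bus peak, while the 4x upload reaches roughly 60% of it.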
The rtcw checkpoint demo exceeds an "in-use" texture size of 35MB, which is why I've put in some results with a larger local texture size (as well as an increased gart size). 45MB is enough though. With 35MB you'd get some occasional drops to around 12fps (and 6fps with agp 1x); these are completely gone with 45MB.

Performance with gart texturing, even in 4x mode, takes a big hit (almost 50%). I was not really able to get consistent performance results when both texture heaps were active. I guess it's luck of the day which textures got put in the gart heap and which ones in the local heap. But the fact that performance actually got faster with a smaller gart heap is not a good sign. And even though the maximum obtained in rtcw with a 35MB local heap and a 29MB gart heap was higher than the score obtained with the 35MB local heap alone, there were clearly areas which ran faster with only the local heap.

It seems to me that the allocator really should try harder to use the local heap to be useful on r200 cards. Moreover, when you DO have to put textures into the gart heap, you'd likely get quite a bit better performance by revisiting that decision later: once more space becomes available on the local heap, upload the still-used textures from the gart heap to the local heap. (In fact, this should be even faster than those 650MB/s, since no in-kernel copy would be needed; it should be possible to blit it directly.)

Some numbers just for fun, since those are the numbers everyone wants to see...

  Some other OS, rtcw: 120 fps
  Some other OS, q3:   137 fps

(The q3 one is a bit cheated. I'm pretty sure non-fullscreen does not use pageflip. Fullscreen score was 174 fps, whereas we only improved from 125 fps to 129 fps...)

This ain't that bad. I'd be happy if we'd do that well in, say, ut2k4 or doom3...

Roland
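The allocation policy suggested above (prefer the local heap, fall back to gart, and move gart textures to local once room frees up) could be sketched roughly as follows. This is a toy model, not the DRI texture memory code: the class and method names are hypothetical, sizes are plain MB counts, and it ignores fragmentation, texture aging, and the actual blit.

```python
class TwoHeapAllocator:
    """Toy model of a texture allocator with a fast local heap and a slower gart heap."""

    def __init__(self, local_size, gart_size):
        self.free = {"local": local_size, "gart": gart_size}
        self.where = {}  # texture name -> (heap, size)

    def alloc(self, name, size):
        """Place a texture, always trying the local heap first. Returns the heap name or None."""
        for heap in ("local", "gart"):
            if self.free[heap] >= size:
                self.free[heap] -= size
                self.where[name] = (heap, size)
                return heap
        return None  # neither heap has room; a real driver would evict something

    def release(self, name):
        """Free a texture and return its space to the heap it lived in."""
        heap, size = self.where.pop(name)
        self.free[heap] += size

    def promote(self):
        """Move gart textures into the local heap while they fit. Returns the moved names."""
        moved = []
        for name, (heap, size) in list(self.where.items()):
            if heap == "gart" and self.free["local"] >= size:
                self.free["gart"] += size
                self.free["local"] -= size
                self.where[name] = ("local", size)
                moved.append(name)
        return moved


if __name__ == "__main__":
    heaps = TwoHeapAllocator(local_size=35, gart_size=32)
    print(heaps.alloc("walls", 30))   # fits in local
    print(heaps.alloc("models", 10))  # local has only 5 left, falls back to gart
    heaps.release("walls")
    print(heaps.promote())            # "models" now fits in local and is moved
```

In a real driver the promote step would presumably run when textures are freed or at frame boundaries, and would only consider textures that are still in use.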