From: Mike H. <mho...@gr...> - 2004-02-21 00:03:30
Alan Matsuoka wrote:

>Lately I've been doing some profiling with oprofile and I've done a
>couple of optimizations to speed up the packer. Believe it or not, the
>major culprit was in crPackNullCurrentPointers where about %90 of the
>time was spent setting individual structure fields to 0. Doing a single
>memset made it insignificant.
>

Is this patch checked in?

>Here's something from running the tunnel2 demo that was in Mesa. I'm
>currently using a Quadro FX with a merely mortal 800Mhz P3.
>
>Cpu type: PIII
>Cpu speed was (MHz estimation) : 797.433
>Counter 0 counted CPU_CLK_UNHALTED events (clocks processor is not
>halted) with a unit mask of 0x00 (Not set) count 600000
>vma      samples  %-age    symbol name
>0006b2c4 201      15.642   crPackVertex3fvBBOX_COUNT
>00085460 166      12.9183  crPackReleaseBuffer
>0005bc04 148      11.5175  crPackTexCoord2fv
>0003e528 134      10.428   crPackColor4fv
>00085294 96       7.47082  crPackSetBuffer
>00085558 87       6.77043  crPackResetPointers
>00085820 53       4.12451  crPackAppendBuffer
>00065134 49       3.81323  crPackWindowPos3fvARB
>00085930 46       3.57977  crPackAppendBoundedBuffer
>00084ef0 42       3.26848  crPackBoundsInfoCR
>0008c00c 39       3.03502  crPackBitmap
>00084fa0 35       2.72374  crPackAlloc
>0008572c 33       2.56809  crPackCanHoldBuffer
>00087f04 31       2.41245  crPackNullCurrentPointers
>000857a8 24       1.8677   crPackCanHoldBoundedBuffer
>
>
>Most programs that I've done profiling have pointed out that
>crPackReleaseBuffer and crPackSetBuffer are the usual hot spots in the
>packer.
>

I'll take a look through them when I can.
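
For reference, the memset change described at the top of the quoted message amounts to something like the sketch below. The struct and function names are hypothetical stand-ins, not the real Chromium definitions, and the memset version assumes that all-bits-zero is a null pointer (true on mainstream platforms, though not guaranteed by ISO C).

```c
#include <string.h>

/* Hypothetical stand-in for the packer's table of "current" pointers.
 * The real structure has different, and many more, fields. */
typedef struct {
    const unsigned char *color;
    const unsigned char *normal;
    const unsigned char *texcoord;
    const unsigned char *edgeflag;
} current_pointers_t;

/* Field-by-field version: one store per member. */
void null_pointers_fieldwise(current_pointers_t *p)
{
    p->color    = NULL;
    p->normal   = NULL;
    p->texcoord = NULL;
    p->edgeflag = NULL;
}

/* Single-call version: clears the whole struct at once.
 * Assumes a null pointer is represented as all-bits-zero. */
void null_pointers_memset(current_pointers_t *p)
{
    memset(p, 0, sizeof *p);
}
```

Both functions leave every member null; the second just lets the C library clear the block in one go instead of issuing a separate store per field.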
>
>The tilesorter profile gives me fairly consistent profiles for most
>programs that I've run:
>
>
>Cpu type: PIII
>Cpu speed was (MHz estimation) : 797.433
>Counter 0 counted CPU_CLK_UNHALTED events (clocks processor is not
>halted) with a unit mask of 0x00 (Not set) count 600000
>vma      samples  %-age     symbol name
>000dde90 4737     43.3394   crStateCurrentRecover
>0008afc0 1346     12.3147   crStateDiffContext
>00080c1c 863      7.8957    crStatePixelStorei
>00089784 705      6.45014   crStateCurrentDiff
>00068220 525      4.80329   tilesortspuPinch
>000542a4 467      4.27264   doFlush
>0007f890 316      2.89113   crStateBufferObjectDiff
>000470c4 295      2.69899   doBucket
>0009f704 232      2.1226    crStateGetIntegerv
>000cb5dc 209      1.91217   crStateTextureDiff
>00048f80 194      1.77493   TransformBBox
>00083a78 190      1.73833   crStateClientDiff
>00055edc 128      1.17109   GetLimit
>0006b25c 106      0.969808  tilesortspu_Bitmap
>000ab5b0 75       0.686185  crStateBitmap
>

We make LOTS of calls to crStateCurrentRecover and that function is a
HUGE stack of ifs. They look like they should become else ifs to me...
But maybe not. Each of the ifs resets the value of "convert" and convert
is not called until after a long line of ifs.

>This is after building Cr with the gcc flags that you suggested (along
>with -g for symbols).
>
>That means that the BIT ops are inlined as well as another patch that I
>haven't released yet that inlines crPackCanHoldOpcode.
>
>Programs that make use of display lists give out an entirely different
>story. 90% of the time is spent doing stuff like crSPUChangeInterface.
>

We'll have to take a look at that function as well...

>No matter what is going on with display lists, state tracking or no,
>the majority of the time is spent switching around function pointers.
>The only way that is going to go faster is to change the use of
>SPUDispatchTable definitions inside the spus to pointers so that the
>dispatch tables can be switched by pointer assignments.
>

We should probably do this after we make sure the current tree is stable
and major bugs get fixed first.

>If anybody has any better ideas I'd like to know about them.
>
>I'd also would appreciate any pointers to any other
>programs/applications that could help me out in finding any more hot
>spots.
>

How about everyone's favorite tiled display demo, Quake3? ;-)

-Mike
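
For illustration, the dispatch-table idea from the quoted text (holding a pointer to a table instead of an embedded copy) might look roughly like the sketch below. Everything here is a hypothetical miniature: the real SPUDispatchTable has far more entries, and none of these type or function names come from the Chromium tree.

```c
/* Hypothetical two-entry stand-in for SPUDispatchTable. */
typedef struct {
    int (*Begin)(int mode);
    int (*End)(void);
} dispatch_table_t;

/* Two example implementations, e.g. a "pack" path and a "state" path. */
int pack_begin(int mode)  { return 100 + mode; }
int pack_end(void)        { return 1; }
int state_begin(int mode) { return 200 + mode; }
int state_end(void)       { return 2; }

const dispatch_table_t pack_table  = { pack_begin,  pack_end  };
const dispatch_table_t state_table = { state_begin, state_end };

typedef struct {
    dispatch_table_t        embedded; /* switching copies every entry */
    const dispatch_table_t *active;   /* switching is one assignment  */
} spu_t;

/* Embedded table: cost grows with the number of entries. */
void switch_by_copy(spu_t *spu, const dispatch_table_t *t)
{
    spu->embedded = *t;
}

/* Pointer to table: constant cost regardless of table size. */
void switch_by_pointer(spu_t *spu, const dispatch_table_t *t)
{
    spu->active = t;
}
```

Callers would then go through `spu->active->Begin(mode)` instead of `spu->embedded.Begin(mode)`. The trade-off is one extra pointer dereference per call in exchange for a constant-time switch.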