From: Keith W. <kei...@ya...> - 2004-07-27 12:56:12
Roland Scheidegger wrote:
> Roland Scheidegger wrote:
>
>> Hello
>>
>> I've just noticed that ut2k3 is two times (!) as fast when using the
>> SuSE 9.1 supplied XFree 4.3.902 than when using latest Mesa cvs
>> (Radeon 9000Pro, Athlon XP 1600 with 1GB sdram). Some numbers
>> (antalus-flyby benchmark with stock settings):
>>                  slowest    avg    fastest
>> XFree 4.3.902     15.6  /  36.3  /  211.1
>> Mesa CVS           7.3  /  21.8  /  147.5
>>
>> After some profiling (results attached) it looked obvious the changed
>> _tnl_DrawRangeElements function is responsible for the lower
>> performance.
>>
>> After some analysis (printf is always a helpful debug tool...) the
>> reason that fallback_drawelements is called way more often than
>> before seems to be that in _tnl_DrawRangeElements the function was
>> changed so now the optimized path only works if the range starts at 0
>> (I didn't look at the locked path, ut2k3 always uses non-locked
>> arrays), specifically this code
>>    else if (end + 1 - start < ctx->Const.MaxArrayLockSize) {
>>       /* The arrays aren't locked but we can still fit them inside a
>>        * single vertexbuffer.
>>        */
>>       _tnl_draw_range_elements( ctx, mode, start, end + 1, count, ui_indices );
>>
>> was changed to
>>    else if (start == 0 && end < ctx->Const.MaxArrayLockSize) {
>>       /* The arrays aren't locked but we can still fit them inside a
>>        * single vertexbuffer.
>>        */
>>       _tnl_draw_range_elements( ctx, mode, end + 1, count, type, indices );
>>
>>
>> In fact, the new _tnl_draw_range_elements function has a comment:
>>    /* Note this function no longer takes a 'start' value, the range is
>>     * assumed to start at zero. The old trick of subtracting 'start'
>>     * from each index won't work if the indices are not in writeable
>>     * memory.
>>     */
>> Is this related to ARB_vbo, or how can the indices not be
>> in writeable memory?
>
> Ian pointed this out: this happens when the user submits unsigned
> integer elements which will not be converted (well, conversion is a bad
> idea anyway but it works around the problem).
> However, some hacking around shows that the initial assumption that
> it's this change which causes the huge speed loss seems to be flawed.
> I've hacked in more or less the previous behaviour (by re-introducing
> the start value and extending the render template stuff to do the
> index hack itself) and it does not help performance.
> I'm not sure, but could this be due to the removal of some (not always
> correct) optimization in the tnl path when the vtx-0-2 branch was
> merged? I'm talking about the _tnl_translate_array_elts stuff in
> functions like _tnl_Begin, which had comments like this:
>    /* Not quite right. Need to use the fallback '_aa_ArrayElement'
>     * when not known to be inside begin/end and arrays are
>     * unlocked.
>     */
> This code seems to have completely vanished in current Mesa, or maybe it
> got replaced with something else, but I can't see it.
> Keith, could you comment on that? I think you probably can make much
> more sense out of the profiling data I posted than I can...
> Normally I wouldn't have made such a fuss just because of some
> performance problem, but I think spending 6 times as much cpu time in
> the tnl path in ut2k3 (which in turn results on my pc in an overall
> performance degradation of 40%) is a major problem (it easily gets
> ut2k3 performance from "playable, but just barely" to "completely
> unplayable" for me - of course, this game really wants something like
> arb_vbo).

The profiling data shows that DrawElements is proceeding via the
fallback path (i.e. _ae_loopback_array_elements()) - Ian's explanation
for that was pretty reasonable.

The comment above is, iirc, to do with situations where the application
emits multiple 'ArrayElement()' calls rather than calling
DrawElements(). It may be that previously the fallback path managed to
jump onto this not-quite-correct but not-so-bad path, but the real
solution is to avoid translating the DrawElements() calls into
individual ArrayElement() calls at all.

If Ian's explanation is wrong, you need to find a better one, but the
problem is that _ae_loopback_array_elements() is ever hit at all.

Keith
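
For readers following the thread, the two range tests quoted above and the "subtract 'start' from each index" trick can be sketched in isolation. This is not Mesa code: the function names and the MAX_ARRAY_LOCK_SIZE value are hypothetical stand-ins chosen for illustration, and the rebasing variant shown here copies into a scratch buffer, which is one possible way around indices that live in non-writeable memory, not necessarily what Mesa does.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical placeholder for ctx->Const.MaxArrayLockSize. */
#define MAX_ARRAY_LOCK_SIZE 3000u

/* Old test from the quoted code: any range narrow enough to fit in a
 * single vertex buffer qualifies, wherever it starts. */
int old_fast_path_ok(unsigned start, unsigned end)
{
    return end + 1 - start < MAX_ARRAY_LOCK_SIZE;
}

/* New test from the quoted code: the range must also begin at zero. */
int new_fast_path_ok(unsigned start, unsigned end)
{
    return start == 0 && end < MAX_ARRAY_LOCK_SIZE;
}

/* Rebase indices so that the range [start, end] maps onto [0, end - start].
 * The source array is only read, so it may live in read-only memory;
 * the rebased values are written to a caller-supplied scratch array. */
void rebase_indices(const unsigned *src, unsigned *dst,
                    size_t count, unsigned start)
{
    for (size_t i = 0; i < count; i++) {
        assert(src[i] >= start);   /* every index must lie in the range */
        dst[i] = src[i] - start;
    }
}
```

With a draw call whose indices span 100..103, old_fast_path_ok(100, 103) holds while new_fast_path_ok(100, 103) does not, which is the situation described in the first message: such a call would now drop to the fallback path. The scratch copy costs one extra pass over the indices per call, so whether it pays off is something only profiling can tell.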