From: Brian P. <br...@vm...> - 2009-12-03 20:47:47
|
Roland Scheidegger wrote: > On 03.12.2009 19:46, Matt Turner wrote: >> Most of the functions in imports.c are very small, to the function >> call overhead is large relative to their size. Can't we do something >> like in the attached patch and move them to imports.h and mark them >> static inline? Things like memcpy and memset are often optimized out >> by the compiler, and by hiding them in these wrapper functions, we're >> probably losing any benefits we'd otherwise see. > ++ from me, at least for the very simple wrappers. _mesa_memcpy > especially I think can be very nicely used for array assignments and the > like, and in case of (very) small amounts of data to copy call overhead > might be significant. > >> Similarly, if we're >> going to use a magic sqrtf algorithm (apparently for speed) then >> shouldn't we also let the compiler properly inline the function? > Not sure here, the function is still quite complex, I don't think call > overhead will make any difference. I've looked at the code though when > it wasn't using the fast path (with -O3 but DEBUG - why is this different?) > This version though adds a lot of overhead: > - call overhead for _mesa_sqrtf > - overhead converting to double > - overhead converting back > In the generated code the actual sqrtf code was a single assembly > instruction (sqrtsd %xmm0, %xmm0) - granted that's SSE2 only, and it > requires quite a few cycles. Still, I guess the overhead is significant, > not to mention that if we'd just use a float instead of double not only > we wouldn't have to convert the type but the compiler would actually > issue sqrtss %xmm0 %xmm0 instead, which is (depending on the cpu) twice > as fast. Not sure why we use double there, are there platforms where > sqrtf (float x) isn't supported? > So really, call overhead is a tiny fraction of the optimization > potential for this function. When not using DEBUG (and USE_IEEE is > defined) the function is still quite a few cycles, so call overhead > doesn't look that bad neither. I don't actually know which version is > faster (or more accurate - I think though sqrtss is actually fully > accurate). Of course using sqrtf(x) will only be fast if the cpu > supports some kind of fast float unit (and the compiler knows how to use > it). > If you'd want to do some more optimization, there's for instance > _mesa_inv_sqrtf - it is supposedly fast, but sse2 offers rsqrtss, which > is really fast. However, I remember we got some bugs some time ago when > gcc actually used that, because precision wasn't enough - it will do > this if you enable -funsafe-math-optimizations, -mrecip or similar. I've > just seen though actually that at least gcc 4.4 does an additional > newton-raphson step when you do 1.0/sqrtf(float x) (so it will issue > rsqrtss plus a couple muls and adds), which might still be less or even > more accurate, and almost certainly be faster than the manual version. > So there's probably far more optimization potential than the call > overhead. Most of those functions are probably never used in any > performance critical path anyway. > > >> I also don't quite understand wrapper functions like >> double >> _mesa_pow(double x, double y) >> { >> return pow(x, y); >> } >> >> Maybe at one time these had #ifdefs in them like _mesa_memcpy, but I >> can't see any reason not to remove it now. >> >> Someone enlighten me. > I guess there might have been indeed #ifdefs in the past. In any case, > using wrapper would make it easier to implement such optimizations in > the future if anyone wants to, not that this is something which you > probably want to do (that stuff is probably better left up to the > compiler). So, at least if they are inlined, they shouldn't really hurt > neither. I've been meaning to go over imports.[ch] and make a bunch of the wrapper functions inlines. A lot of the wrappers aren't needed any more. Back before valgrind I used the memory-related wrappers quite often. For now, let's keep the wrappers so we don't have to touch tons of other files right away. Matt, feel free to submit a patch. -Brian |