Re: [Mesa3d-dev] [RFC] Move _mesa_memcpy to imports.h and inline it

SourceForge Headquarters 225 Broadway Suite 1600 San Diego, CA 92101 +1 (858) 454-5900

Roland Scheidegger wrote:
> On 03.12.2009 19:46, Matt Turner wrote:
>> Most of the functions in imports.c are very small, to the function
>> call overhead is large relative to their size. Can't we do something
>> like in the attached patch and move them to imports.h and mark them
>> static inline? Things like memcpy and memset are often optimized out
>> by the compiler, and by hiding them in these wrapper functions, we're
>> probably losing any benefits we'd otherwise see.
> ++ from me, at least for the very simple wrappers. _mesa_memcpy
> especially I think can be very nicely used for array assignments and the
> like, and in case of (very) small amounts of data to copy call overhead
> might be significant.
> 
>> Similarly, if we're
>> going to use a magic sqrtf algorithm (apparently for speed) then
>> shouldn't we also let the compiler properly inline the function?
> Not sure here, the function is still quite complex, I don't think call
> overhead will make any difference. I've looked at the code though when
> it wasn't using the fast path (with -O3 but DEBUG - why is this different?)
> This version though adds a lot of overhead:
> - call overhead for _mesa_sqrtf
> - overhead converting to double
> - overhead converting back
> In the generated code the actual sqrtf code was a single assembly
> instruction (sqrtsd %xmm0, %xmm0) - granted that's SSE2 only, and it
> requires quite a few cycles. Still, I guess the overhead is significant,
> not to mention that if we'd just use a float instead of double not only
> we wouldn't have to convert the type but the compiler would actually
> issue sqrtss %xmm0 %xmm0 instead, which is (depending on the cpu) twice
> as fast. Not sure why we use double there, are there platforms where
> sqrtf (float x) isn't supported?
> So really, call overhead is a tiny fraction of the optimization
> potential for this function. When not using DEBUG (and USE_IEEE is
> defined) the function is still quite a few cycles, so call overhead
> doesn't look that bad neither. I don't actually know which version is
> faster (or more accurate - I think though sqrtss is actually fully
> accurate). Of course using sqrtf(x) will only be fast if the cpu
> supports some kind of fast float unit (and the compiler knows how to use
> it).
> If you'd want to do some more optimization, there's for instance
> _mesa_inv_sqrtf - it is supposedly fast, but sse2 offers rsqrtss, which
> is really fast. However, I remember we got some bugs some time ago when
> gcc actually used that, because precision wasn't enough - it will do
> this if you enable -funsafe-math-optimizations, -mrecip or similar. I've
> just seen though actually that at least gcc 4.4 does an additional
> newton-raphson step when you do 1.0/sqrtf(float x) (so it will issue
> rsqrtss plus a couple muls and adds), which might still be less or even
> more accurate, and almost certainly be faster than the manual version.
> So there's probably far more optimization potential than the call
> overhead. Most of those functions are probably never used in any
> performance critical path anyway.
> 
> 
>> I also don't quite understand wrapper functions like
>> double
>> _mesa_pow(double x, double y)
>> {
>>    return pow(x, y);
>> }
>>
>> Maybe at one time these had #ifdefs in them like _mesa_memcpy, but I
>> can't see any reason not to remove it now.
>>
>> Someone enlighten me.
> I guess there might have been indeed #ifdefs in the past. In any case,
> using wrapper would make it easier to implement such optimizations in
> the future if anyone wants to, not that this is something which you
> probably want to do (that stuff is probably better left up to the
> compiler). So, at least if they are inlined, they shouldn't really hurt
> neither.

I've been meaning to go over imports.[ch] and make a bunch of the 
wrapper functions inlines.

A lot of the wrappers aren't needed any more.  Back before valgrind I 
used the memory-related wrappers quite often.  For now, let's keep the 
wrappers so we don't have to touch tons of other files right away.

Matt, feel free to submit a patch.

-Brian