Re: [PanoTools-devel] gcc AMD64 speedup patch


mea culpa,
I did some reading on the Internet and found
http://www.nsc.liu.se/~boein/ifip/kyoto/enable-procs.txt
really interesting.
In short:
"optimised" means catching cases like internal overflow of x^2;
if there is no such danger, it falls back to sqrt(x^2+y^2).

Nevertheless, I would go for hypot() if I cannot optimise it away.

just my 5 cents
	walter

Pablo d'Angelo wrote:
> walter harms wrote:
>>hi pablo,
>>you may like to replace sqrt(x^2+y^2) with hypot(x,y). systems
>>often have an optimised version of it.
> 
> Do you know a system where hypot is actually faster than sqrt(x^2+y^2)?
> 
> I have just looked at the implementation of the x86_64 mathlib (as
> shipped by suse libc), and I'm sure that hypot is slower there.
> 
> sqrt(x*x+y*y):
> 
> # math.c:616
> 	.loc 1 616 0
> 	mulsd	%xmm0, %xmm2
> .LVL237:
> 	mulsd	%xmm1, %xmm1
> 	addsd	%xmm0, %xmm1
> 	sqrtsd	%xmm1, %xmm0
> 
> 
> hypot(x,y):
> 
> seems to be slower since it is mainly concerned with the case where
> r=x^2+y^2 becomes bigger than the largest number a double can handle.
> 
> The fast, inexact mode of hypot leads to
> r = x*x + y*y
> sqrt(r)
> 
> Since there are a lot of ifs for the exponent handling in between and
> it's not inlined, it will probably be slower...
> 
> ciao
>   Pablo
> 
> 
