From: Pablo d'A. <pab...@we...> - 2005-11-07 01:06:53
Attachments:
amd64-tan-speedup.patch
|
Hi all,

I have noticed that stitching 360x180 degree panoramas with hugin is very slow on my Linux (Ubuntu Breezy) AMD64 machine.

I have traced the problem to the tan() function of the standard math library shipped with glibc 2.3.5. This function is terribly slow for tan(pi/2): for this case it is roughly 40000 times slower than the i386 equivalent! Unfortunately, tan(pi/2) is frequently computed in rect_sphere_tp.

This patch detects this case and uses a fixed big number for tan(pi/2).

Some Linux distributions (SUSE, Red Hat) ship with a different, optimized math library, but I haven't tested this one yet.

This patch should definitely be used for all gcc-compiled builds of libpano12 on amd64.

ciao
Pablo
|
From: Pablo d'A. <pab...@we...> - 2005-11-07 23:13:17
|
walter harms wrote:
>
> hi pablo,
> you may like to replace sqrt(x^2+y^2) with hypot(x,y). systems have
> often a optimised version of it.

Do you know a system where hypot is actually faster than sqrt(x^2+y^2)?

I have just looked at the implementation of the x86_64 mathlib (as shipped by SUSE libc), and I'm sure that hypot is slower there.

sqrt(x*x+y*y):

    # math.c:616
            .loc 1 616 0
            mulsd   %xmm0, %xmm2
    .LVL237:
            mulsd   %xmm1, %xmm1
            addsd   %xmm0, %xmm1
            sqrtsd  %xmm1, %xmm0

hypot(x,y):

seems to be slower, since it is mainly concerned with the case where r=x^2+y^2 becomes bigger than the largest number a double can handle.

The fast, inexact mode of hypot leads to

    r = x*x + y*y
    sqrt(r)

Since there are a lot of ifs for the exponent handling in between, and it's not inlined, it will probably be slower...

ciao
Pablo
|
From: walter h. <wh...@bf...> - 2005-11-08 12:44:49
|
Mea culpa,

I did some reading on the Internet and found http://www.nsc.liu.se/~boein/ifip/kyoto/enable-procs.txt really interesting.

In short: "optimised" means catching cases like internal overflow (x^2). If there is no danger, it falls back to sqrt(x^2+y^2).

Nevertheless, I would go for hypot() if I cannot optimise it away.

Just my 5 cents,
walter

Pablo d'Angelo wrote:
> walter harms wrote:
>>hi pablo,
>>you may like to replace sqrt(x^2+y^2) with hypot(x,y). systems have
>>often a optimised version of it.
>
> Do you know a system where hypot is actually faster than sqrt(x^2+y^2)?
>
> I have just looked at the implementation of the x86_64 mathlib (as
> shipped by SUSE libc), and I'm sure that hypot is slower there.
>
> sqrt(x*x+y*y):
>
> # math.c:616
>         .loc 1 616 0
>         mulsd   %xmm0, %xmm2
> .LVL237:
>         mulsd   %xmm1, %xmm1
>         addsd   %xmm0, %xmm1
>         sqrtsd  %xmm1, %xmm0
>
>
> hypot(x,y):
>
> seems to be slower, since it is mainly concerned with the case where
> r=x^2+y^2 becomes bigger than the largest number a double can handle.
>
> The fast, inexact mode of hypot leads to
>     r = x*x + y*y
>     sqrt(r)
>
> Since there are a lot of ifs for the exponent handling in between, and
> it's not inlined, it will probably be slower...
>
> ciao
> Pablo
>
>
|