Screenshot instructions:
Windows
Mac
Red Hat Linux
Ubuntu
Click URL instructions:
Rightclick on ad, choose "Copy Link", then paste here →
(This may not be possible with some types of ads)
From: Tom Vercauteren <tom.vercauteren@gm...>  20080805 08:38:13

Hi all, I hope this is the correct place to discuss this issue. In ITK, some filters need a fast function to cast a double into an int with a mathematical floor operation. Right now a buggy code is used for that. I'd like to fix it and make it cleaner. I think that a good solution would be to use some vxl functions for that. I now that there already is a fast round function called vnl_math_rnd but it has some problems for halfway integers which have already been discussed on this mailing list. See for example: http://sourceforge.net/mailarchive/message.php?msg_id=47A1E3CF.9040407%40users.sourceforge.net Also vnl only provides round (no floor, no ceil). On the other hand, vcl_floor is too slow for some heavy duty work. A solution that makes sense to me can be found here: http://ldesoras.free.fr/doc/articles/rounding_en.pdf It provides fast round, floor and ceil with no issues at halway integers. Of course it has know limitations:  Only works within INT_MIN/2 – 1 and INT_MAX/2 + 1  Assumes that the FPU rounding mode has not been changed from the default one It basically uses the same assembly code than the one from vnl_math_rnd but takes care of the halfway case by using a multiplication by 2. Do you think it would be worth implementing such fast rounding functions? If so, how should we proceed? Best regards, Tom Vercauteren 
From: Peter Vanroose <peter_vanroose@ya...>  20080805 10:34:46

Tom Vercauteren wrote: > http://ldesoras.free.fr/doc/articles/rounding_en.pdf Interesting article! The best thing to do I believe is to insert at least the following 8 lines at line 181 of core/vnl/vnl_math.h : =============================================== // floor  round towards minus infinity inline int vnl_math_floor(float x) { return (x>=0.0)?(int)(x):(int)(x); } inline int vnl_math_floor(double x) { return (x>=0.0)?(int)(x):(int)(x); } // ceil  round towards plus infinity inline int vnl_math_ceil(float x) { return (x>=0.0)?(int)(x):(int)(x); } inline int vnl_math_ceil(double x) { return (x>=0.0)?(int)(x):(int)(x); } =============================================== and supplement these with the appropriate __asm which as you pointed out is very similar to the code for vnl_math_rnd. Also the "divide by 2" trick in that article is interesting  we should add it to the implementation of vnl_math_rnd: 8.50 should round to 9, not to 8, I would say.  Peter. __________________________________________________________ Låna pengar utan säkerhet. Jämför vilkor online hos Kelkoo. http://www.kelkoo.se/c100390123lanutansakerhet.html?partnerId=96915014 
From: Tom Vercauteren <tom.vercauteren@gm...>  20080805 11:37:33

> The best thing to do I believe is to insert at least the following 8 lines at line 181 of core/vnl/vnl_math.h : > > =============================================== > // floor  round towards minus infinity > inline int vnl_math_floor(float x) { return (x>=0.0)?(int)(x):(int)(x); } > inline int vnl_math_floor(double x) { return (x>=0.0)?(int)(x):(int)(x); } > > // ceil  round towards plus infinity > inline int vnl_math_ceil(float x) { return (x>=0.0)?(int)(x):(int)(x); } > inline int vnl_math_ceil(double x) { return (x>=0.0)?(int)(x):(int)(x); } > =============================================== > > and supplement these with the appropriate __asm which as you pointed out is very similar to the code for vnl_math_rnd. Seems fine to me! > Also the "divide by 2" trick in that article is interesting  we should add it to the implementation of vnl_math_rnd: 8.50 should round to 9, not to 8, I would say. I agree that the "divide by 2" trick is great. Actually, we use such a code on both gcc and msvc by default instead of the lround calls. It proved to be faster. It might however also be interesting to allow the user to use the full int range instead of restricting it to INT_MIN/2 to INT_MAX/2. On MSVC, the round functions we use are: inline int vnl_math_rnd(double x) { int r; x = 2*x+.5; __asm { fld x fistp r } return r>>1; } inline int vnl_math_rnd(float x) { int r; x = 2*x+.5f; __asm { fld x fistp r } return r>>1; } Similarly on gcc: inline int vnl_math_rnd(double x) { // assert(fegetround()==FE_TONEAREST); // assert (x > static_cast <double> (INT_MIN / 2) – 1.0); // assert (x < static_cast <double> (INT_MAX / 2) + 1.0); int r; x = 2*x+.5; __asm__ __volatile__ ("fistpl %0" : "=m"(r) : "t"(x) : "st"); return r>>1; } inline int vnl_math_rnd(float x) { int r; x = 2*x+.5f; __asm__ __volatile__ ("fistpl %0" : "=m"(r) : "t"(x) : "st"); return r>>1; } 
From: Peter Vanroose <peter_vanroose@ya...>  20080805 17:56:08

OK  added the straightforward nonoptimized inlined vnl_math_floor() and vnl_math_ceil() to vnl_math.h and added appropriate tests to vnl/tests/test_math.cxx . I won't attempt to implement the gcc and msvc optimized versions  that's for the specialists who have speed tests to compare and finetune the implementations.  Peter. __________________________________________________________ Går det långsamt? Skaffa dig en snabbare bredbandsuppkoppling. Sök och jämför priser hos Kelkoo. http://www.kelkoo.se/c100015813bredband.html?partnerId=96914325 
From: Tom Vercauteren <tom.vercauteren@m4...>  20080806 07:49:41

> OK  added the straightforward nonoptimized inlined vnl_math_floor() and vnl_math_ceil() to vnl_math.h and added appropriate tests to vnl/tests/test_math.cxx . Great first step. Thanks! > I won't attempt to implement the gcc and msvc optimized versions  that's for the specialists who have speed tests to compare and finetune the implementations. For the specialists ease, I have put the code we use for the round, floor and ceil below. By the way, I realized that the current implementation of vnl_math_rnd on my linux machine was slower than the straightforward nonoptimized code. It seems that the following change was made with no timing tests: http://vxl.cvs.sourceforge.net/vxl/vxl/core/vnl/vnl_math.h?r1=1.39&r2=1.40&sortby=date On my machine, lrint is indeed (as mentionned in the comments) approximatively 2.5 times faster than the straightforward code but lround is approximatively 40% slower :( On the other hand the code below is (again on my linux machine) approximatively 4 times faster than the straightforward code. Regards, Tom On MSVC: inline int vnl_math_rnd(double x) { int r; x = 2*x+.5; __asm { fld x fistp r } return r>>1; } inline int vnl_math_rnd(float x) { int r; x = 2*x+.5f; __asm { fld x fistp r } return r>>1; } inline int vnl_math_floor(double x) { int r; x = 2*x.5; __asm { fld x fistp r } return r>>1; } inline int vnl_math_floor(float x) { int r; x = 2*x.5f; __asm { fld x fistp r } return r>>1; } inline int vnl_math_ceil(double x) { int r; x = .5  2*x; __asm { fld x fistp r } return (r>>1); } inline int vnl_math_ceil(float x) { int r; x = .5f  2*x; __asm { fld x fistp r } return (r>>1); } On gcc: inline int vnl_math_rnd(double x) { int r; x = 2*x+.5; __asm__ __volatile__ ("fistpl %0" : "=m"(r) : "t"(x) : "st"); return r>>1; } inline int vnl_math_rnd(float x) { int r; x = 2*x+.5f; __asm__ __volatile__ ("fistpl %0" : "=m"(r) : "t"(x) : "st"); return r>>1; } inline int vnl_math_floor(double x) { int r; x = 2*x.5; __asm__ __volatile__ ("fistpl %0" : "=m"(r) : "t"(x) : "st"); return r>>1; } inline int vnl_math_floor(float x) { int r; x = 2*x.5f; __asm__ __volatile__ ("fistpl %0" : "=m"(r) : "t"(x) : "st"); return r>>1; } inline int vnl_math_ceil(double x) { int r; x = .5  2*x; __asm__ __volatile__ ("fistpl %0" : "=m"(r) : "t"(x) : "st"); return (r>>1); } inline int vnl_math_ceil(float x) { int r; x = .5f  2*x; __asm__ __volatile__ ("fistpl %0" : "=m"(r) : "t"(x) : "st"); return (r>>1); } 
From: Brad King <brad.king@ki...>  20080806 14:44:34

Tom Vercauteren wrote: > By the way, I realized that the current implementation of vnl_math_rnd > on my linux machine was slower than the straightforward nonoptimized > code. It seems that the following change was made with no timing > tests: > http://vxl.cvs.sourceforge.net/vxl/vxl/core/vnl/vnl_math.h?r1=1.39&r2=1.40&sortby=date > On my machine, lrint is indeed (as mentionned in the comments) > approximatively 2.5 times faster than the straightforward code but > lround is approximatively 40% slower :( That change was made for purposes of correctness, not speed. Using lrint was actually producing wrong output if the FPU was configured to the wrong rounding mode. Whether lround is slower or faster than a handcoded implementation depends on the compiler's implementation of it. Brad 
From: Amitha Perera <amithaperera@us...>  20080806 16:06:10

I think we should define the semantics of round, ceil, floor, etc. without reference to FPU flags. These semantics should be encoded in the test cases. (I think Peter has already done a lot of this.) The implementation needs to match these semantics, no matter the cost. If these semantics don't match the current (last week's) implementation, we should think about whether we should be introducing new function names for backward compatibility. If we don't, we run the risk of previously working programs suddenly not working. If we implement the assembly versions, then we run the risk of the FPU flags not being the in correct mode. From what I understand, it is extremely unlikely for the flags to be incorrect for the places where vnl would be used, so the set+restore option is too expensive. Is there an assert that we can place to verify that the FPU flags are what we expect? That way, it won't be a silent failure, at least on a debug build. I think vnl is probably the most widely used part of vxl, and so it is worthwhile to be cautious about fundamental changes in semantics. Amitha. 
From: Tom Vercauteren <tom.vercauteren@m4...>  20081001 14:07:12

Hi all, A couple of months ago, we discussed the possibility of adding some fast implementations of floor, ceil and round to vnl. Peter Vanroose introduced the API, reference implementations and some test cases for these functions. I was wondering whether any of the vnl developers would be interested in looking at that question once again and see if it would be feasible to make the next step: introducing the fast, platformspecific, implementations of these functions. This would be great for many ITK filters! Best regards, Tom Vercauteren On Wed, Aug 6, 2008 at 9:49 AM, Tom Vercauteren <tom.vercauteren@...> wrote: > For the specialists ease, I have put the code we use for the round, > floor and ceil below. > > Regards, > > Tom > > > > > On MSVC: > > > inline int vnl_math_rnd(double x) > { > int r; > x = 2*x+.5; > __asm { > fld x > fistp r > } > > return r>>1; > } > > inline int vnl_math_rnd(float x) > { > int r; > x = 2*x+.5f; > __asm { > fld x > fistp r > } > > return r>>1; > } > > inline int vnl_math_floor(double x) > { > int r; > x = 2*x.5; > __asm { > fld x > fistp r > } > > return r>>1; > } > > inline int vnl_math_floor(float x) > { > int r; > x = 2*x.5f; > __asm { > fld x > fistp r > } > > return r>>1; > } > > inline int vnl_math_ceil(double x) > { > int r; > x = .5  2*x; > __asm { > fld x > fistp r > } > > return (r>>1); > } > > inline int vnl_math_ceil(float x) > { > int r; > x = .5f  2*x; > __asm { > fld x > fistp r > } > > return (r>>1); > } > > > > > > > On gcc: > > > inline int vnl_math_rnd(double x) > { > int r; > x = 2*x+.5; > __asm__ __volatile__ ("fistpl %0" : "=m"(r) : "t"(x) : "st"); > > return r>>1; > } > > inline int vnl_math_rnd(float x) > { > int r; > x = 2*x+.5f; > __asm__ __volatile__ ("fistpl %0" : "=m"(r) : "t"(x) : "st"); > > return r>>1; > } > > inline int vnl_math_floor(double x) > { > int r; > x = 2*x.5; > __asm__ __volatile__ ("fistpl %0" : "=m"(r) : "t"(x) : "st"); > > return r>>1; > } > > inline int vnl_math_floor(float x) > { > int r; > x = 2*x.5f; > __asm__ __volatile__ ("fistpl %0" : "=m"(r) : "t"(x) : "st"); > > return r>>1; > } > > inline int vnl_math_ceil(double x) > { > int r; > x = .5  2*x; > __asm__ __volatile__ ("fistpl %0" : "=m"(r) : "t"(x) : "st"); > > return (r>>1); > } > > inline int vnl_math_ceil(float x) > { > int r; > x = .5f  2*x; > __asm__ __volatile__ ("fistpl %0" : "=m"(r) : "t"(x) : "st"); > > return (r>>1); > } > 
From: Tom Vercauteren <tom.vercauteren@m4...>  20080806 16:26:18

Hi Amitha, > I think we should define the semantics of round, ceil, floor, etc. without > reference to FPU flags. These semantics should be encoded in the test > cases. (I think Peter has already done a lot of this.) The implementation > needs to match these semantics, no matter the cost. > > If these semantics don't match the current (last week's) implementation, we > should think about whether we should be introducing new function names for > backward compatibility. If we don't, we run the risk of previously working > programs suddenly not working. I fully agree with that. > If we implement the assembly versions, then we run the risk of the FPU flags > not being the in correct mode. From what I understand, it is extremely > unlikely for the flags to be incorrect for the places where vnl would be > used, so the set+restore option is too expensive. Is there an assert that > we can place to verify that the FPU flags are what we expect? That way, it > won't be a silent failure, at least on a debug build. Having a nondefault FPU mode indeed seems unlikely but the documentation should make it clear that the rounding functions won't work properly if the user changes it... Anyhow, allowing the use of an assert is what we propose. We actually use the following piece of code: #ifdef CHECK_FPU_ROUNDING_MODE assert(fegetround()==FE_TONEAREST); #endif > I think vnl is probably the most widely used part of vxl, and so it is > worthwhile to be cautious about fundamental changes in semantics. Definitely seems reasonable. Tom 
From: Ian Scott <ian.scott@st...>  20080805 11:57:48

This problem has been on my todo list for over a year now, and a halfdecent fast floor operation would be really welcome. I've been using http://www.stereopsis.com/FPU.html and its references as a guide. I have been trying (without) much success to write some code at the CMake config stage to test several different fast floor functions and use whichever one works and is fastest. However I would settle for something that works on a few platforms and degrades gracefully on the rest. The main problem with ASM code is getting that to work on 64 bit platforms. This is a really important speedup for randomaccess subsampling of images, and if you can get it to work, I'd be happy to port it into vil (VXL's imaging library) and convert the sampling code there to use it. For image sampling code, any restriction to [INT_MIN/2, INT_MAX/2] is not a problem. Regards, Ian. Tom Vercauteren wrote: > Hi all, > > I hope this is the correct place to discuss this issue. > > In ITK, some filters need a fast function to cast a double into an int > with a mathematical floor operation. Right now a buggy code is used > for that. I'd like to fix it and make it cleaner. > > I think that a good solution would be to use some vxl functions for that. > > I now that there already is a fast round function called vnl_math_rnd > but it has some problems for halfway integers which have already been > discussed on this mailing list. See for example: > http://sourceforge.net/mailarchive/message.php?msg_id=47A1E3CF.9040407%40users.sourceforge.net > > Also vnl only provides round (no floor, no ceil). On the other hand, > vcl_floor is too slow for some heavy duty work. > > A solution that makes sense to me can be found here: > http://ldesoras.free.fr/doc/articles/rounding_en.pdf > > It provides fast round, floor and ceil with no issues at halway > integers. Of course it has know limitations: >  Only works within INT_MIN/2 – 1 and INT_MAX/2 + 1 >  Assumes that the FPU rounding mode has not been changed from the default one > > It basically uses the same assembly code than the one from > vnl_math_rnd but takes care of the halfway case by using a > multiplication by 2. > > Do you think it would be worth implementing such fast rounding > functions? If so, how should we proceed? > > Best regards, > Tom Vercauteren > >  > This SF.Net email is sponsored by the Moblin Your Move Developer's challenge > Build the coolest Linux based applications with Moblin SDK & win great prizes > Grand prize is a trip for two to an Open Source event anywhere in the world > http://moblincontest.org/redirect.php?banner_id=100&url=/ > _______________________________________________ > Vxlmaintainers mailing list > Vxlmaintainers@... > https://lists.sourceforge.net/lists/listinfo/vxlmaintainers 
From: Tom Vercauteren <tom.vercauteren@m4...>  20080805 12:42:53

> I've been using http://www.stereopsis.com/FPU.html and its references as a > guide. I have been trying (without) much success to write some code at the > CMake config stage to test several different fast floor functions and use > whichever one works and is fastest. That would be a great thing. Thanks for the interesting link. By the way, the "Sree's Real2Int" solution is very likely to have some problems. It indeed looks very close to the solution used in some ITK filters which have problems with numbers that are very close to integer values. Indeed the following test fails: inline int BSplineFloor(double x) { #if defined mips  defined sparc  defined __ppc__ return (int)((unsigned int)(x + 2147483648.0)  2147483648U); #elif defined i386  defined _M_IX86 union { unsigned int hilo[2]; double d; } u; u.d = x + 103079215104.0; return (int)((u.hilo[1]<<16)(u.hilo[0]>>16)); #else return int(floor(x)); #endif } int main () { assert(BSplineFloor(0.999999)==0); } > However I would settle for something that works on a few platforms and > degrades gracefully on the rest. The main problem with ASM code is getting > that to work on 64 bit platforms. Well the ASM code works on 64 bit linux. I don't know about windows since I don't have such a machine avaliable. By the way, I am not convinced about the following comment in vn_math.h: // NB But Win64 does not support the nonstandard _asm It is maybe only necessary to use __asm (note the two underscores) instead of the deprecated _asm: http://msdn.microsoft.com/enus/library/45yd4tzz(VS.80).aspx > This is a really important speedup for randomaccess subsampling of images, > and if you can get it to work, I'd be happy to port it into vil (VXL's > imaging library) and convert the sampling code there to use it. The solution with the fistp and the divide by two trick works on all the machines we have (linux 32 and 64, mac intel and windows 32). > For image sampling code, any restriction to > [INT_MIN/2, INT_MAX/2] is not a problem. Same thing for us: [INT_MIN/2, INT_MAX/2] is not a problem. Regards, Tom 
Sign up for the SourceForge newsletter:
No, thanks