From: Tom Vercauteren <tom.vercauteren@gm...>  20090112 17:25:26
Attachments:
vnl_math_roundround_up.patch

Dear all, Some time ago, I helped implementing a few optimized real to integer rounding functions in vnl_math.h. In the process of trying to update ITK to use the new implementation of vnl_math_rnd, we had some interesting discussion on the ITK developers mailing list. See e.g. http://www.itk.org/mailman/private/insightdevelopers/2009January/011510.html What appeared from these discussions is that many ITK developpers find it disturbing that the current implementation of vnl_math_rnd does not behave consistently across platforms. The problem stems from halfway cases that can be either rounded away from zero or rounded to the closest even integer according to the hardware. Note this behavior also existed before the optimized rounding I helped implementing. Actually there is already a workaround in ITK: http://www.itk.org/cgibin/viewcvs.cgi/Code/Common/itkIndex.h?root=Insight&view=diff&r1=1.54&r2=1.55 Attached is a patch that fixes this issue by making vnl_math_rnd round half integers upwards on all platforms. The performance loss is in my opinion acceptable: Time for vanilla rnd with halfint round away zero: 643 Time for vanilla rnd with halfint round up: 832 Time for lrint: 321 Time for sse2 rnd with halfint round to nearest even: 156 Time for sse2 rnd with halfint round up: 201 Time for asm rnd with halfint round to nearest even: 175 Time for asm rnd with halfint round up: 221 The other (small) price to pay with this patch is that the optimized implementations of vnl_math_rnd would only work for numbers whose absolute value is less that INT_MAX / 2 (same as vnl_math_floor and vnl_math_ceil). The patch also features some small code cleanup and make the test for vnl_math_rnd more stringent. Let me know if you find this patch reasonable. Best regards, Tom Vercauteren 
From: Ian Scott <ian.scott@st...>  20090112 17:42:42

Tom, Thankyou for looking into this issue. Wouldn't it be better to choose the fast option on the most common highperformance platform (x8664) as the default rounding operation? If I understand your results, is round to even the fastest? Ian. Tom Vercauteren wrote: > Dear all, > > Some time ago, I helped implementing a few optimized real to integer > rounding functions in vnl_math.h. > > In the process of trying to update ITK to use the new implementation > of vnl_math_rnd, we had some interesting discussion on the ITK > developers mailing list. See e.g. > http://www.itk.org/mailman/private/insightdevelopers/2009January/011510.html > > What appeared from these discussions is that many ITK developpers find > it disturbing that the current implementation of vnl_math_rnd does not > behave consistently across platforms. The problem stems from halfway > cases that can be either rounded away from zero or rounded to the > closest even integer according to the hardware. Note this behavior > also existed before the optimized rounding I helped implementing. > Actually there is already a workaround in ITK: > http://www.itk.org/cgibin/viewcvs.cgi/Code/Common/itkIndex.h?root=Insight&view=diff&r1=1.54&r2=1.55 > > Attached is a patch that fixes this issue by making vnl_math_rnd round > half integers upwards on all platforms. The performance loss is in my > opinion acceptable: > Time for vanilla rnd with halfint round away zero: 643 > Time for vanilla rnd with halfint round up: 832 > Time for lrint: 321 > Time for sse2 rnd with halfint round to nearest even: 156 > Time for sse2 rnd with halfint round up: 201 > Time for asm rnd with halfint round to nearest even: 175 > Time for asm rnd with halfint round up: 221 > > The other (small) price to pay with this patch is that the optimized > implementations of vnl_math_rnd would only work for numbers whose > absolute value is less that INT_MAX / 2 (same as vnl_math_floor and > vnl_math_ceil). > > The patch also features some small code cleanup and make the test for > vnl_math_rnd more stringent. > > Let me know if you find this patch reasonable. > > Best regards, > Tom Vercauteren > > 
From: Tom Vercauteren <tom.vercauteren@m4...>  20090112 18:00:31

Hi Ian, Thanks for the feedback. I am not sure to fully understand you though. I should have said that the results I showed are only valid on my machine (32 bits linux using gcc). But indeed the fastest results are for "round to even" which is the default rounding mode. In order to get something like "round up", I used a multiply and divide by two trick which slows down the operation. On x8664 a fast implementation will be used if either sse2 is turned on or gcc is used. Windows 64 bits with MSVC and without sse2 turned on cannot benefit from the optimized asm function because Win64 doesn't support inline assembler. Hope this information helps. Tom On Mon, Jan 12, 2009 at 6:42 PM, Ian Scott <ian.m.scott@...> wrote: > Tom, > > Thankyou for looking into this issue. > > Wouldn't it be better to choose the fast option on the most common > highperformance platform (x8664) as the default rounding operation? > > If I understand your results, is round to even the fastest? > > Ian. > > > > Tom Vercauteren wrote: >> >> Dear all, >> >> Some time ago, I helped implementing a few optimized real to integer >> rounding functions in vnl_math.h. >> >> In the process of trying to update ITK to use the new implementation >> of vnl_math_rnd, we had some interesting discussion on the ITK >> developers mailing list. See e.g. >> >> http://www.itk.org/mailman/private/insightdevelopers/2009January/011510.html >> >> What appeared from these discussions is that many ITK developpers find >> it disturbing that the current implementation of vnl_math_rnd does not >> behave consistently across platforms. The problem stems from halfway >> cases that can be either rounded away from zero or rounded to the >> closest even integer according to the hardware. Note this behavior >> also existed before the optimized rounding I helped implementing. >> Actually there is already a workaround in ITK: >> >> http://www.itk.org/cgibin/viewcvs.cgi/Code/Common/itkIndex.h?root=Insight&view=diff&r1=1.54&r2=1.55 >> >> Attached is a patch that fixes this issue by making vnl_math_rnd round >> half integers upwards on all platforms. The performance loss is in my >> opinion acceptable: >> Time for vanilla rnd with halfint round away zero: 643 >> Time for vanilla rnd with halfint round up: 832 >> Time for lrint: >> 321 >> Time for sse2 rnd with halfint round to nearest even: 156 >> Time for sse2 rnd with halfint round up: 201 >> Time for asm rnd with halfint round to nearest even: 175 >> Time for asm rnd with halfint round up: 221 >> >> The other (small) price to pay with this patch is that the optimized >> implementations of vnl_math_rnd would only work for numbers whose >> absolute value is less that INT_MAX / 2 (same as vnl_math_floor and >> vnl_math_ceil). >> >> The patch also features some small code cleanup and make the test for >> vnl_math_rnd more stringent. >> >> Let me know if you find this patch reasonable. >> >> Best regards, >> Tom Vercauteren >> >> > > 
From: Chuck Atkins <chuck.atkins@ki...>  20090112 20:06:40
Attachments:
Message as HTML

With MSVC and Win64 it's actually a bit more complicated than that. While the compiler doesn't support inline assembler for Win64, the OS actually has a problem with the x87 FPU. So even if it did support inline assembler, the inline instructions would still have to use SSE(1, 2, etc.) instructions instead x87 FPU instructions. If you look ad the actual instructions generated by any win64 compiler, not just msvc, you should notice that all floating point operations are done via SSE. This is not so much as an optimizations as it is out of necessity. On Mon, Jan 12, 2009 at 1:00 PM, Tom Vercauteren <tom.vercauteren@...>wrote: > Hi Ian, > > Thanks for the feedback. I am not sure to fully understand you though. > > I should have said that the results I showed are only valid on my > machine (32 bits linux using gcc). But indeed the fastest results are > for "round to even" which is the default rounding mode. In order to > get something like "round up", I used a multiply and divide by two > trick which slows down the operation. > > On x8664 a fast implementation will be used if either sse2 is turned > on or gcc is used. Windows 64 bits with MSVC and without sse2 turned > on cannot benefit from the optimized asm function because Win64 > doesn't support inline assembler. > > Hope this information helps. > > Tom > > On Mon, Jan 12, 2009 at 6:42 PM, Ian Scott > <ian.m.scott@...> wrote: > > Tom, > > > > Thankyou for looking into this issue. > > > > Wouldn't it be better to choose the fast option on the most common > > highperformance platform (x8664) as the default rounding operation? > > > > If I understand your results, is round to even the fastest? > > > > Ian. > > > > > > > > Tom Vercauteren wrote: > >> > >> Dear all, > >> > >> Some time ago, I helped implementing a few optimized real to integer > >> rounding functions in vnl_math.h. > >> > >> In the process of trying to update ITK to use the new implementation > >> of vnl_math_rnd, we had some interesting discussion on the ITK > >> developers mailing list. See e.g. > >> > >> > http://www.itk.org/mailman/private/insightdevelopers/2009January/011510.html > >> > >> What appeared from these discussions is that many ITK developpers find > >> it disturbing that the current implementation of vnl_math_rnd does not > >> behave consistently across platforms. The problem stems from halfway > >> cases that can be either rounded away from zero or rounded to the > >> closest even integer according to the hardware. Note this behavior > >> also existed before the optimized rounding I helped implementing. > >> Actually there is already a workaround in ITK: > >> > >> > http://www.itk.org/cgibin/viewcvs.cgi/Code/Common/itkIndex.h?root=Insight&view=diff&r1=1.54&r2=1.55 > >> > >> Attached is a patch that fixes this issue by making vnl_math_rnd round > >> half integers upwards on all platforms. The performance loss is in my > >> opinion acceptable: > >> Time for vanilla rnd with halfint round away zero: 643 > >> Time for vanilla rnd with halfint round up: 832 > >> Time for lrint: > >> 321 > >> Time for sse2 rnd with halfint round to nearest even: 156 > >> Time for sse2 rnd with halfint round up: 201 > >> Time for asm rnd with halfint round to nearest even: 175 > >> Time for asm rnd with halfint round up: 221 > >> > >> The other (small) price to pay with this patch is that the optimized > >> implementations of vnl_math_rnd would only work for numbers whose > >> absolute value is less that INT_MAX / 2 (same as vnl_math_floor and > >> vnl_math_ceil). > >> > >> The patch also features some small code cleanup and make the test for > >> vnl_math_rnd more stringent. > >> > >> Let me know if you find this patch reasonable. > >> > >> Best regards, > >> Tom Vercauteren > >> > >> > > > > > > >  > This SF.net email is sponsored by: > SourcForge Community > SourceForge wants to tell your story. > http://p.sf.net/sfu/sfspreadtheword > _______________________________________________ > Vxlmaintainers mailing list > Vxlmaintainers@... > https://lists.sourceforge.net/lists/listinfo/vxlmaintainers > 
From: Chuck Atkins <chuck.atkins@ki...>  20090112 20:02:55
Attachments:
Message as HTML

In fact, with msvc and win64, the options to enable "Enhanced Instruction Sets" (/arch:SSE or /arch:SSE2) are not even valid options since they are necessary reguardless. On Mon, Jan 12, 2009 at 2:35 PM, Chuck Atkins <chuck.atkins@...>wrote: > With MSVC and Win64 it's actually a bit more complicated than that. While > the compiler doesn't support inline assembler for Win64, the OS actually has > a problem with the x87 FPU. So even if it did support inline assembler, the > inline instructions would still have to use SSE(1, 2, etc.) instructions > instead x87 FPU instructions. If you look ad the actual instructions > generated by any win64 compiler, not just msvc, you should notice that all > floating point operations are done via SSE. This is not so much as an > optimizations as it is out of necessity. > > On Mon, Jan 12, 2009 at 1:00 PM, Tom Vercauteren <tom.vercauteren@...>wrote: > >> Hi Ian, >> >> Thanks for the feedback. I am not sure to fully understand you though. >> >> I should have said that the results I showed are only valid on my >> machine (32 bits linux using gcc). But indeed the fastest results are >> for "round to even" which is the default rounding mode. In order to >> get something like "round up", I used a multiply and divide by two >> trick which slows down the operation. >> >> On x8664 a fast implementation will be used if either sse2 is turned >> on or gcc is used. Windows 64 bits with MSVC and without sse2 turned >> on cannot benefit from the optimized asm function because Win64 >> doesn't support inline assembler. >> >> Hope this information helps. >> >> Tom >> >> On Mon, Jan 12, 2009 at 6:42 PM, Ian Scott >> <ian.m.scott@...> wrote: >> > Tom, >> > >> > Thankyou for looking into this issue. >> > >> > Wouldn't it be better to choose the fast option on the most common >> > highperformance platform (x8664) as the default rounding operation? >> > >> > If I understand your results, is round to even the fastest? >> > >> > Ian. >> > >> > >> > >> > Tom Vercauteren wrote: >> >> >> >> Dear all, >> >> >> >> Some time ago, I helped implementing a few optimized real to integer >> >> rounding functions in vnl_math.h. >> >> >> >> In the process of trying to update ITK to use the new implementation >> >> of vnl_math_rnd, we had some interesting discussion on the ITK >> >> developers mailing list. See e.g. >> >> >> >> >> http://www.itk.org/mailman/private/insightdevelopers/2009January/011510.html >> >> >> >> What appeared from these discussions is that many ITK developpers find >> >> it disturbing that the current implementation of vnl_math_rnd does not >> >> behave consistently across platforms. The problem stems from halfway >> >> cases that can be either rounded away from zero or rounded to the >> >> closest even integer according to the hardware. Note this behavior >> >> also existed before the optimized rounding I helped implementing. >> >> Actually there is already a workaround in ITK: >> >> >> >> >> http://www.itk.org/cgibin/viewcvs.cgi/Code/Common/itkIndex.h?root=Insight&view=diff&r1=1.54&r2=1.55 >> >> >> >> Attached is a patch that fixes this issue by making vnl_math_rnd round >> >> half integers upwards on all platforms. The performance loss is in my >> >> opinion acceptable: >> >> Time for vanilla rnd with halfint round away zero: 643 >> >> Time for vanilla rnd with halfint round up: 832 >> >> Time for lrint: >> >> 321 >> >> Time for sse2 rnd with halfint round to nearest even: 156 >> >> Time for sse2 rnd with halfint round up: 201 >> >> Time for asm rnd with halfint round to nearest even: 175 >> >> Time for asm rnd with halfint round up: 221 >> >> >> >> The other (small) price to pay with this patch is that the optimized >> >> implementations of vnl_math_rnd would only work for numbers whose >> >> absolute value is less that INT_MAX / 2 (same as vnl_math_floor and >> >> vnl_math_ceil). >> >> >> >> The patch also features some small code cleanup and make the test for >> >> vnl_math_rnd more stringent. >> >> >> >> Let me know if you find this patch reasonable. >> >> >> >> Best regards, >> >> Tom Vercauteren >> >> >> >> >> > >> > >> >> >>  >> This SF.net email is sponsored by: >> SourcForge Community >> SourceForge wants to tell your story. >> http://p.sf.net/sfu/sfspreadtheword >> _______________________________________________ >> Vxlmaintainers mailing list >> Vxlmaintainers@... >> https://lists.sourceforge.net/lists/listinfo/vxlmaintainers >> > > 
From: Amitha Perera <amithaperera@us...>  20090113 07:21:31

I suggest that vnl_math should have the following set of functions: vnl_math_rnd: fastest possible rounding. Could be round to nearest or round to even, depending on current FP flag settings. This is the only function for which there are platform specific implementations[*]. Implemented in terms of vnl_math_rnd, the following functions are also provided: vnl_math_rnd_nearest: round to nearest integer, with 0.5 always rounding "up" to the next largest integer. vnl_math_rnd_even: round to nearest even integer, so that 0.5 rounds to 0 and 1.5 rounds to 2. vnl_math_floor: round toward Inf vnl_math_ceil: round toward +Inf In the latter four functions, the admissible range of values may be within INT_MAX/2:INT_MAX/2, because of implementation constraints. [*] If vnl_math_rnd is implemented in "pure C++", using if statements, then we should probably provide "pure C++" implementations of the other functions too, to remove the INT_MAX/2:INT_MAX/2 limitation. Moreover, if there is a magic platform on which more of the conversion functions are implemented in hardware, we should implement all we can using direct asm instructions. Thoughts? Amitha. 
From: Tom Vercauteren <tom.vercauteren@m4...>  20090114 13:13:24

Hi Amitha, Such a solution would be OK for me. I would however use "vnl_math_rnd_halfintup" rather than "vnl_math_rnd_nearest" and "vnl_math_rnd_halfinttoeven" rather than "vnl_math_rnd_even" Also some users might find it confusing to have 3 different implementations of vnl_math_rnd that only differ in how half integers are rounded. One alternative would be to drop "vnl_math_rnd_halfinttoeven" as it can be made redundant with the current "vnl_math_rnd". Indeed, all current optimized implementations will perform round to nearest even on halfintegers*. We could thus only change the "pure c++" implementation of vnl_math_rnd to also round to nearest even for halfintegers. This has the advantage of reduncing the number of quasiredundant functions but has the drawback of slowing down the "pure c++" implementation of vnl_math_rnd. Does this makes sense to you? Tom [*] As previously, this is given the assumptions that FP flag settings have not been changed from the default one or have been restored. P.S.: Thanks to Chuck for the insightful comments On Tue, Jan 13, 2009 at 8:21 AM, Amitha Perera <amithaperera@...> wrote: > I suggest that vnl_math should have the following set of functions: > > vnl_math_rnd: fastest possible rounding. Could be round to nearest or round > to even, depending on current FP flag settings. This is the only function > for which there are platform specific implementations[*]. > > Implemented in terms of vnl_math_rnd, the following functions are also > provided: > > vnl_math_rnd_nearest: round to nearest integer, with 0.5 always rounding > "up" to the next largest integer. > vnl_math_rnd_even: round to nearest even integer, so that 0.5 rounds to 0 > and 1.5 rounds to 2. > vnl_math_floor: round toward Inf > vnl_math_ceil: round toward +Inf > > In the latter four functions, the admissible range of values may be within > INT_MAX/2:INT_MAX/2, because of implementation constraints. > > [*] If vnl_math_rnd is implemented in "pure C++", using if statements, then > we should probably provide "pure C++" implementations of the other functions > too, to remove the INT_MAX/2:INT_MAX/2 limitation. Moreover, if there is a > magic platform on which more of the conversion functions are implemented in > hardware, we should implement all we can using direct asm instructions. > > Thoughts? > > Amitha. > 
From: Amitha Perera <amithaperera@us...>  20090117 14:34:07

Tom Vercauteren wrote: > Such a solution would be OK for me. I would however use > "vnl_math_rnd_halfintup" rather than "vnl_math_rnd_nearest" > and > "vnl_math_rnd_halfinttoeven" rather than "vnl_math_rnd_even" Fine with me. > Also some users might find it confusing to have 3 different > implementations of vnl_math_rnd that only differ in how half integers > are rounded. Well, this cannot be avoided in a general sense, since the implementations are there because there is already an issue about what rounding should mean. I think the only solution is good documentation about which function does what, and what the tradeoffs are, and perhaps a "if you don't know of the issue, use this" type of recommendation. > One alternative would be to drop > "vnl_math_rnd_halfinttoeven" as it can be made redundant with the > current "vnl_math_rnd". That would imply that vnl_math_rnd must always round to even. This is not guaranteed, even with the assembler/SSE implementation, because it is dependent on the rounding mode. (You said the same thing in your message.) Keeping the three means that the semantics of vnl_math_rnd is less well defined (e.g., it could be either to even or to nearest), but is guaranteed to be the fastest possible on the platform. If the semantics matter more than the speed, then use the appropriately named function. Amitha. 
From: Tom Vercauteren <tom.vercauteren@m4...>  20090310 11:14:44
Attachments:
vnl_math_rnd_halfint.patch

Hi Amitha, First of all, sorry for taking so long to get back to you on this rounding issue. Since some time has passed let me recap things a little. There is a consensus that efficient floating point to integer functions is definitely a good thing to have. Most of it is already in vnl_math.h. Round to the nearest integer is somewhat ill defined in the sense that we still need a rule for half integers. Typical rules that might be used for halfintegers are:  round away from zero  round up  round down  round to nearest even These rules can be implemented efficiently provided that the following assumptions are made:  The rounding mode of the FPU is not changed from the default one  For some rules the efficient implementation only works for numbers whose absolute value is less than INT_MAX/2 Without these assumptions, there is no possible efficient (and correct) implementation that I know of. Currently, vnl_math_round is implemented in such a way that it tries to pick the most efficient rule among these ones depending on the platform. Having a implementation that is consistent across platforms might however be of interest, notably for several ITK developers who would like to have the round up version. Following our discussion, I have implemented two additional rounding functions:  vnl_math_rnd_halfinttoeven  vnl_math_rnd_halfintup In the attached patch, vnl_math_rnd is now basically a wrapper around vnl_math_rnd_halfinttoeven except when a vanilla c implementation is used. Similarly vnl_math_rnd_halfintup becomes a wrapper around vnl_math_rnd_halfinttoeven with a multiply and divide by two trick except when a vanilla c implementation is used. The unit test for vnl_math has been expanded accordingly. Let me know, if you think this patch needs more work. Regards, Tom On Sat, Jan 17, 2009 at 15:34, Amitha Perera <amithaperera@...> wrote: > Tom Vercauteren wrote: >> Such a solution would be OK for me. I would however use >> "vnl_math_rnd_halfintup" rather than "vnl_math_rnd_nearest" >> and >> "vnl_math_rnd_halfinttoeven" rather than "vnl_math_rnd_even" > > Fine with me. > >> Also some users might find it confusing to have 3 different >> implementations of vnl_math_rnd that only differ in how half integers >> are rounded. > > Well, this cannot be avoided in a general sense, since the > implementations are there because there is already an issue about what > rounding should mean. I think the only solution is good documentation > about which function does what, and what the tradeoffs are, and perhaps > a "if you don't know of the issue, use this" type of recommendation. > > > One alternative would be to drop >> "vnl_math_rnd_halfinttoeven" as it can be made redundant with the >> current "vnl_math_rnd". > > That would imply that vnl_math_rnd must always round to even. This is > not guaranteed, even with the assembler/SSE implementation, because it > is dependent on the rounding mode. (You said the same thing in your > message.) Keeping the three means that the semantics of vnl_math_rnd is > less well defined (e.g., it could be either to even or to nearest), but > is guaranteed to be the fastest possible on the platform. If the > semantics matter more than the speed, then use the appropriately named > function. > > Amitha. > >  > This SF.net email is sponsored by: > SourcForge Community > SourceForge wants to tell your story. > http://p.sf.net/sfu/sfspreadtheword > _______________________________________________ > Vxlmaintainers mailing list > Vxlmaintainers@... > https://lists.sourceforge.net/lists/listinfo/vxlmaintainers > 