Thread: [Vxl-maintainers] vnl_math_round: consistency of half-way cases | The VXL Project

vxl-maintainers

[Vxl-maintainers] vnl_math_round: consistency of half-way cases

From: Tom V. <tom...@gm...> - 2009-01-12 17:25:26

Attachments: vnl_math_round-round_up.patch

Dear all,

Some time ago, I helped implementing a few optimized real to integer
rounding functions in vnl_math.h.

In the process of trying to update ITK to use the new implementation
of vnl_math_rnd, we had some interesting discussion on the ITK
developers mailing list. See e.g.
http://www.itk.org/mailman/private/insight-developers/2009-January/011510.html

What appeared from these discussions is that many ITK developpers find
it disturbing that the current implementation of vnl_math_rnd does not
behave consistently across platforms. The problem stems from halfway
cases that can be either rounded away from zero or rounded to the
closest even integer according to the hardware. Note this behavior
also existed before the optimized rounding I helped implementing.
Actually there is already a workaround in ITK:
http://www.itk.org/cgi-bin/viewcvs.cgi/Code/Common/itkIndex.h?root=Insight&view=diff&r1=1.54&r2=1.55

Attached is a patch that fixes this issue by making vnl_math_rnd round
half integers upwards on all platforms. The performance loss is in my
opinion acceptable:
Time for vanilla rnd with halfint round away zero: 643
Time for vanilla rnd with halfint round up: 832
Time for lrint: 321
Time for sse2 rnd with halfint round to nearest even: 156
Time for sse2 rnd with halfint round up: 201
Time for asm rnd with halfint round to nearest even: 175
Time for asm rnd with halfint round up: 221

The other (small) price to pay with this patch is that the optimized
implementations of vnl_math_rnd would only work for numbers whose
absolute value is less that INT_MAX / 2 (same as vnl_math_floor and
vnl_math_ceil).

The patch also features some small code cleanup and make the test for
vnl_math_rnd more stringent.

Let me know if you find this patch reasonable.

Best regards,
Tom Vercauteren

Re: [Vxl-maintainers] vnl_math_round: consistency of half-way cases

From: Ian S. <ian...@st...> - 2009-01-12 17:42:42

Tom,

Thank-you for looking into this issue.

Wouldn't it be better to choose the fast option on the most common 
high-performance platform (x86-64) as the default rounding operation?

If I understand your results, is round to even the fastest?

Ian.



Tom Vercauteren wrote:
> Dear all,
> 
> Some time ago, I helped implementing a few optimized real to integer
> rounding functions in vnl_math.h.
> 
> In the process of trying to update ITK to use the new implementation
> of vnl_math_rnd, we had some interesting discussion on the ITK
> developers mailing list. See e.g.
> http://www.itk.org/mailman/private/insight-developers/2009-January/011510.html
> 
> What appeared from these discussions is that many ITK developpers find
> it disturbing that the current implementation of vnl_math_rnd does not
> behave consistently across platforms. The problem stems from halfway
> cases that can be either rounded away from zero or rounded to the
> closest even integer according to the hardware. Note this behavior
> also existed before the optimized rounding I helped implementing.
> Actually there is already a workaround in ITK:
> http://www.itk.org/cgi-bin/viewcvs.cgi/Code/Common/itkIndex.h?root=Insight&view=diff&r1=1.54&r2=1.55
> 
> Attached is a patch that fixes this issue by making vnl_math_rnd round
> half integers upwards on all platforms. The performance loss is in my
> opinion acceptable:
> Time for vanilla rnd with halfint round away zero:       643
> Time for vanilla rnd with halfint round up:                  832
> Time for lrint:                                                         321
> Time for sse2 rnd with halfint round to nearest even: 156
> Time for sse2 rnd with halfint round up:                    201
> Time for asm rnd with halfint round to nearest even:  175
> Time for asm rnd with halfint round up:                     221
> 
> The other (small) price to pay with this patch is that the optimized
> implementations of  vnl_math_rnd would only work for numbers whose
> absolute value is less that INT_MAX / 2 (same as vnl_math_floor and
> vnl_math_ceil).
> 
> The patch also features some small code cleanup and make the test for
> vnl_math_rnd more stringent.
> 
> Let me know if you find this patch reasonable.
> 
> Best regards,
> Tom Vercauteren
> 
>

Re: [Vxl-maintainers] vnl_math_round: consistency of half-way cases

From: Tom V. <tom...@m4...> - 2009-01-12 18:00:31

Hi Ian,

Thanks for the feedback. I am not sure to fully understand you though.

I should have said that the results I showed are only valid on my
machine (32 bits linux using gcc). But indeed the fastest results are
for "round to even" which is the default rounding mode. In order to
get something like "round up", I used a multiply and divide by two
trick which slows down the operation.

On x86-64 a fast implementation will be used if either sse2 is turned
on or gcc is used. Windows 64 bits with MSVC and without sse2 turned
on cannot benefit from the optimized asm function because Win64
doesn't support inline assembler.

Hope this information helps.

Tom

On Mon, Jan 12, 2009 at 6:42 PM, Ian Scott
<ian...@st...> wrote:
> Tom,
>
> Thank-you for looking into this issue.
>
> Wouldn't it be better to choose the fast option on the most common
> high-performance platform (x86-64) as the default rounding operation?
>
> If I understand your results, is round to even the fastest?
>
> Ian.
>
>
>
> Tom Vercauteren wrote:
>>
>> Dear all,
>>
>> Some time ago, I helped implementing a few optimized real to integer
>> rounding functions in vnl_math.h.
>>
>> In the process of trying to update ITK to use the new implementation
>> of vnl_math_rnd, we had some interesting discussion on the ITK
>> developers mailing list. See e.g.
>>
>> http://www.itk.org/mailman/private/insight-developers/2009-January/011510.html
>>
>> What appeared from these discussions is that many ITK developpers find
>> it disturbing that the current implementation of vnl_math_rnd does not
>> behave consistently across platforms. The problem stems from halfway
>> cases that can be either rounded away from zero or rounded to the
>> closest even integer according to the hardware. Note this behavior
>> also existed before the optimized rounding I helped implementing.
>> Actually there is already a workaround in ITK:
>>
>> http://www.itk.org/cgi-bin/viewcvs.cgi/Code/Common/itkIndex.h?root=Insight&view=diff&r1=1.54&r2=1.55
>>
>> Attached is a patch that fixes this issue by making vnl_math_rnd round
>> half integers upwards on all platforms. The performance loss is in my
>> opinion acceptable:
>> Time for vanilla rnd with halfint round away zero:       643
>> Time for vanilla rnd with halfint round up:                  832
>> Time for lrint:
>> 321
>> Time for sse2 rnd with halfint round to nearest even: 156
>> Time for sse2 rnd with halfint round up:                    201
>> Time for asm rnd with halfint round to nearest even:  175
>> Time for asm rnd with halfint round up:                     221
>>
>> The other (small) price to pay with this patch is that the optimized
>> implementations of  vnl_math_rnd would only work for numbers whose
>> absolute value is less that INT_MAX / 2 (same as vnl_math_floor and
>> vnl_math_ceil).
>>
>> The patch also features some small code cleanup and make the test for
>> vnl_math_rnd more stringent.
>>
>> Let me know if you find this patch reasonable.
>>
>> Best regards,
>> Tom Vercauteren
>>
>>
>
>

Re: [Vxl-maintainers] vnl_math_round: consistency of half-way cases

From: Chuck A. <chu...@ki...> - 2009-01-12 20:06:40

With MSVC and Win64 it's actually a bit more complicated than that.  While
the compiler doesn't support inline assembler for Win64, the OS actually has
a problem with the x87 FPU.  So even if it did support inline assembler, the
inline instructions would still have to use SSE(1, 2, etc.) instructions
instead x87 FPU instructions.  If you look ad the actual instructions
generated by any win64 compiler, not just msvc, you should notice that all
floating point operations are done via SSE.  This is not so much as an
optimizations as it is out of necessity.

On Mon, Jan 12, 2009 at 1:00 PM, Tom Vercauteren <tom...@m4...>wrote:

> Hi Ian,
>
> Thanks for the feedback. I am not sure to fully understand you though.
>
> I should have said that the results I showed are only valid on my
> machine (32 bits linux using gcc). But indeed the fastest results are
> for "round to even" which is the default rounding mode. In order to
> get something like "round up", I used a multiply and divide by two
> trick which slows down the operation.
>
> On x86-64 a fast implementation will be used if either sse2 is turned
> on or gcc is used. Windows 64 bits with MSVC and without sse2 turned
> on cannot benefit from the optimized asm function because Win64
> doesn't support inline assembler.
>
> Hope this information helps.
>
> Tom
>
> On Mon, Jan 12, 2009 at 6:42 PM, Ian Scott
> <ian...@st...> wrote:
> > Tom,
> >
> > Thank-you for looking into this issue.
> >
> > Wouldn't it be better to choose the fast option on the most common
> > high-performance platform (x86-64) as the default rounding operation?
> >
> > If I understand your results, is round to even the fastest?
> >
> > Ian.
> >
> >
> >
> > Tom Vercauteren wrote:
> >>
> >> Dear all,
> >>
> >> Some time ago, I helped implementing a few optimized real to integer
> >> rounding functions in vnl_math.h.
> >>
> >> In the process of trying to update ITK to use the new implementation
> >> of vnl_math_rnd, we had some interesting discussion on the ITK
> >> developers mailing list. See e.g.
> >>
> >>
> http://www.itk.org/mailman/private/insight-developers/2009-January/011510.html
> >>
> >> What appeared from these discussions is that many ITK developpers find
> >> it disturbing that the current implementation of vnl_math_rnd does not
> >> behave consistently across platforms. The problem stems from halfway
> >> cases that can be either rounded away from zero or rounded to the
> >> closest even integer according to the hardware. Note this behavior
> >> also existed before the optimized rounding I helped implementing.
> >> Actually there is already a workaround in ITK:
> >>
> >>
> http://www.itk.org/cgi-bin/viewcvs.cgi/Code/Common/itkIndex.h?root=Insight&view=diff&r1=1.54&r2=1.55
> >>
> >> Attached is a patch that fixes this issue by making vnl_math_rnd round
> >> half integers upwards on all platforms. The performance loss is in my
> >> opinion acceptable:
> >> Time for vanilla rnd with halfint round away zero:       643
> >> Time for vanilla rnd with halfint round up:                  832
> >> Time for lrint:
> >> 321
> >> Time for sse2 rnd with halfint round to nearest even: 156
> >> Time for sse2 rnd with halfint round up:                    201
> >> Time for asm rnd with halfint round to nearest even:  175
> >> Time for asm rnd with halfint round up:                     221
> >>
> >> The other (small) price to pay with this patch is that the optimized
> >> implementations of  vnl_math_rnd would only work for numbers whose
> >> absolute value is less that INT_MAX / 2 (same as vnl_math_floor and
> >> vnl_math_ceil).
> >>
> >> The patch also features some small code cleanup and make the test for
> >> vnl_math_rnd more stringent.
> >>
> >> Let me know if you find this patch reasonable.
> >>
> >> Best regards,
> >> Tom Vercauteren
> >>
> >>
> >
> >
>
>
> ------------------------------------------------------------------------------
> This SF.net email is sponsored by:
> SourcForge Community
> SourceForge wants to tell your story.
> http://p.sf.net/sfu/sf-spreadtheword
> _______________________________________________
> Vxl-maintainers mailing list
> Vxl...@li...
> https://lists.sourceforge.net/lists/listinfo/vxl-maintainers
>

Re: [Vxl-maintainers] vnl_math_round: consistency of half-way cases

From: Chuck A. <chu...@ki...> - 2009-01-12 20:02:55

In fact, with msvc and win64, the options to enable "Enhanced Instruction
Sets" (/arch:SSE or /arch:SSE2) are not even valid options since they are
necessary reguardless.

On Mon, Jan 12, 2009 at 2:35 PM, Chuck Atkins <chu...@ki...>wrote:

> With MSVC and Win64 it's actually a bit more complicated than that.  While
> the compiler doesn't support inline assembler for Win64, the OS actually has
> a problem with the x87 FPU.  So even if it did support inline assembler, the
> inline instructions would still have to use SSE(1, 2, etc.) instructions
> instead x87 FPU instructions.  If you look ad the actual instructions
> generated by any win64 compiler, not just msvc, you should notice that all
> floating point operations are done via SSE.  This is not so much as an
> optimizations as it is out of necessity.
>
> On Mon, Jan 12, 2009 at 1:00 PM, Tom Vercauteren <tom...@m4...>wrote:
>
>> Hi Ian,
>>
>> Thanks for the feedback. I am not sure to fully understand you though.
>>
>> I should have said that the results I showed are only valid on my
>> machine (32 bits linux using gcc). But indeed the fastest results are
>> for "round to even" which is the default rounding mode. In order to
>> get something like "round up", I used a multiply and divide by two
>> trick which slows down the operation.
>>
>> On x86-64 a fast implementation will be used if either sse2 is turned
>> on or gcc is used. Windows 64 bits with MSVC and without sse2 turned
>> on cannot benefit from the optimized asm function because Win64
>> doesn't support inline assembler.
>>
>> Hope this information helps.
>>
>> Tom
>>
>> On Mon, Jan 12, 2009 at 6:42 PM, Ian Scott
>> <ian...@st...> wrote:
>> > Tom,
>> >
>> > Thank-you for looking into this issue.
>> >
>> > Wouldn't it be better to choose the fast option on the most common
>> > high-performance platform (x86-64) as the default rounding operation?
>> >
>> > If I understand your results, is round to even the fastest?
>> >
>> > Ian.
>> >
>> >
>> >
>> > Tom Vercauteren wrote:
>> >>
>> >> Dear all,
>> >>
>> >> Some time ago, I helped implementing a few optimized real to integer
>> >> rounding functions in vnl_math.h.
>> >>
>> >> In the process of trying to update ITK to use the new implementation
>> >> of vnl_math_rnd, we had some interesting discussion on the ITK
>> >> developers mailing list. See e.g.
>> >>
>> >>
>> http://www.itk.org/mailman/private/insight-developers/2009-January/011510.html
>> >>
>> >> What appeared from these discussions is that many ITK developpers find
>> >> it disturbing that the current implementation of vnl_math_rnd does not
>> >> behave consistently across platforms. The problem stems from halfway
>> >> cases that can be either rounded away from zero or rounded to the
>> >> closest even integer according to the hardware. Note this behavior
>> >> also existed before the optimized rounding I helped implementing.
>> >> Actually there is already a workaround in ITK:
>> >>
>> >>
>> http://www.itk.org/cgi-bin/viewcvs.cgi/Code/Common/itkIndex.h?root=Insight&view=diff&r1=1.54&r2=1.55
>> >>
>> >> Attached is a patch that fixes this issue by making vnl_math_rnd round
>> >> half integers upwards on all platforms. The performance loss is in my
>> >> opinion acceptable:
>> >> Time for vanilla rnd with halfint round away zero:       643
>> >> Time for vanilla rnd with halfint round up:                  832
>> >> Time for lrint:
>> >> 321
>> >> Time for sse2 rnd with halfint round to nearest even: 156
>> >> Time for sse2 rnd with halfint round up:                    201
>> >> Time for asm rnd with halfint round to nearest even:  175
>> >> Time for asm rnd with halfint round up:                     221
>> >>
>> >> The other (small) price to pay with this patch is that the optimized
>> >> implementations of  vnl_math_rnd would only work for numbers whose
>> >> absolute value is less that INT_MAX / 2 (same as vnl_math_floor and
>> >> vnl_math_ceil).
>> >>
>> >> The patch also features some small code cleanup and make the test for
>> >> vnl_math_rnd more stringent.
>> >>
>> >> Let me know if you find this patch reasonable.
>> >>
>> >> Best regards,
>> >> Tom Vercauteren
>> >>
>> >>
>> >
>> >
>>
>>
>> ------------------------------------------------------------------------------
>> This SF.net email is sponsored by:
>> SourcForge Community
>> SourceForge wants to tell your story.
>> http://p.sf.net/sfu/sf-spreadtheword
>> _______________________________________________
>> Vxl-maintainers mailing list
>> Vxl...@li...
>> https://lists.sourceforge.net/lists/listinfo/vxl-maintainers
>>
>
>

Re: [Vxl-maintainers] vnl_math_round: consistency of half-way cases

From: Amitha P. <ami...@us...> - 2009-01-13 07:21:31

I suggest that vnl_math should have the following set of functions:

vnl_math_rnd: fastest possible rounding.  Could be round to nearest or 
round to even, depending on current FP flag settings.  This is the only 
function for which there are platform specific implementations[*].

Implemented in terms of vnl_math_rnd, the following functions are also 
provided:

vnl_math_rnd_nearest: round to nearest integer, with 0.5 always rounding 
"up" to the next largest integer.
vnl_math_rnd_even: round to nearest even integer, so that 0.5 rounds to 
0 and 1.5 rounds to 2.
vnl_math_floor: round toward -Inf
vnl_math_ceil: round toward +Inf

In the latter four functions, the admissible range of values may be 
within -INT_MAX/2:INT_MAX/2, because of implementation constraints.

[*] If vnl_math_rnd is implemented in "pure C++", using if statements, 
then we should probably provide "pure C++" implementations of the other 
functions too, to remove the -INT_MAX/2:INT_MAX/2 limitation.  Moreover, 
if there is a magic platform on which more of the conversion functions 
are implemented in hardware, we should implement all we can using direct 
asm instructions.

Thoughts?

Amitha.

Re: [Vxl-maintainers] vnl_math_round: consistency of half-way cases

From: Tom V. <tom...@m4...> - 2009-01-14 13:13:24

Hi Amitha,

Such a solution would be OK for me. I would however use
  "vnl_math_rnd_halfintup" rather than "vnl_math_rnd_nearest"
and
  "vnl_math_rnd_halfinttoeven" rather than "vnl_math_rnd_even"

Also some users might find it confusing to have 3 different
implementations of vnl_math_rnd that only differ in how half integers
are rounded. One alternative would be to drop
"vnl_math_rnd_halfinttoeven" as it can be made redundant with the
current "vnl_math_rnd".

Indeed, all current optimized implementations will perform round to
nearest even on halfintegers*. We could thus only change the "pure
c++" implementation of vnl_math_rnd to also round to nearest even for
halfintegers.

This has the advantage of reduncing the number of quasi-redundant
functions but has the drawback of slowing down the  "pure c++"
implementation of vnl_math_rnd.

Does this makes sense to you?

Tom

[*] As previously, this is given the assumptions that FP flag settings
have not been changed from the default one or have been restored.

P.S.: Thanks to Chuck for the insightful comments

On Tue, Jan 13, 2009 at 8:21 AM, Amitha Perera
<ami...@us...> wrote:
> I suggest that vnl_math should have the following set of functions:
>
> vnl_math_rnd: fastest possible rounding.  Could be round to nearest or round
> to even, depending on current FP flag settings.  This is the only function
> for which there are platform specific implementations[*].
>
> Implemented in terms of vnl_math_rnd, the following functions are also
> provided:
>
> vnl_math_rnd_nearest: round to nearest integer, with 0.5 always rounding
> "up" to the next largest integer.
> vnl_math_rnd_even: round to nearest even integer, so that 0.5 rounds to 0
> and 1.5 rounds to 2.
> vnl_math_floor: round toward -Inf
> vnl_math_ceil: round toward +Inf
>
> In the latter four functions, the admissible range of values may be within
> -INT_MAX/2:INT_MAX/2, because of implementation constraints.
>
> [*] If vnl_math_rnd is implemented in "pure C++", using if statements, then
> we should probably provide "pure C++" implementations of the other functions
> too, to remove the -INT_MAX/2:INT_MAX/2 limitation.  Moreover, if there is a
> magic platform on which more of the conversion functions are implemented in
> hardware, we should implement all we can using direct asm instructions.
>
> Thoughts?
>
> Amitha.
>

Re: [Vxl-maintainers] vnl_math_round: consistency of half-way cases

From: Amitha P. <ami...@us...> - 2009-01-17 14:34:07

Tom Vercauteren wrote:
> Such a solution would be OK for me. I would however use
>   "vnl_math_rnd_halfintup" rather than "vnl_math_rnd_nearest"
> and
>   "vnl_math_rnd_halfinttoeven" rather than "vnl_math_rnd_even"

Fine with me.

> Also some users might find it confusing to have 3 different
> implementations of vnl_math_rnd that only differ in how half integers
> are rounded.

Well, this cannot be avoided in a general sense, since the 
implementations are there because there is already an issue about what 
rounding should mean.  I think the only solution is good documentation 
about which function does what, and what the tradeoffs are, and perhaps 
a "if you don't know of the issue, use this" type of recommendation.

 > One alternative would be to drop
> "vnl_math_rnd_halfinttoeven" as it can be made redundant with the
> current "vnl_math_rnd".

That would imply that vnl_math_rnd must always round to even.  This is 
not guaranteed, even with the assembler/SSE implementation, because it 
is dependent on the rounding mode.  (You said the same thing in your 
message.)  Keeping the three means that the semantics of vnl_math_rnd is 
less well defined (e.g., it could be either to even or to nearest), but 
is guaranteed to be the fastest possible on the platform.  If the 
semantics matter more than the speed, then use the appropriately named 
function.

Amitha.

Re: [Vxl-maintainers] vnl_math_round: consistency of half-way cases

From: Tom V. <tom...@m4...> - 2009-03-10 11:14:44

Attachments: vnl_math_rnd_halfint.patch

Hi Amitha,

First of all, sorry for taking so long to get back to you on this
rounding issue.

Since some time has passed let me recap things a little. There is a
consensus that efficient floating point to integer functions is
definitely a good thing to have. Most of it is already in vnl_math.h.
Round to the nearest integer is somewhat ill defined in the sense that
we still need a rule for half integers. Typical rules that might be
used for half-integers are:
- round away from zero
- round up
- round down
- round to nearest even

These rules can be implemented efficiently provided that the following
assumptions are made:
- The rounding mode of the FPU is not changed from the default one
- For some rules the efficient implementation only works for  numbers
whose absolute value is less than INT_MAX/2

Without these assumptions, there is no possible efficient (and
correct) implementation that I know of.

Currently, vnl_math_round is implemented in such a way that it tries
to pick the most efficient rule among these ones depending on the
platform. Having a implementation that is consistent across platforms
might however be of interest, notably for several ITK developers who
would like to have the round up version.

Following our discussion, I have implemented two additional rounding functions:
- vnl_math_rnd_halfinttoeven
- vnl_math_rnd_halfintup

In the attached patch, vnl_math_rnd is now basically a wrapper around
vnl_math_rnd_halfinttoeven except when a vanilla c implementation is
used.

Similarly vnl_math_rnd_halfintup becomes a wrapper around
vnl_math_rnd_halfinttoeven with a multiply and divide by two trick
except when a vanilla c implementation is used.

The unit test for vnl_math has been expanded accordingly.

Let me know, if you think this patch needs more work.

Regards,
Tom


On Sat, Jan 17, 2009 at 15:34, Amitha Perera
<ami...@us...> wrote:
> Tom Vercauteren wrote:
>> Such a solution would be OK for me. I would however use
>>   "vnl_math_rnd_halfintup" rather than "vnl_math_rnd_nearest"
>> and
>>   "vnl_math_rnd_halfinttoeven" rather than "vnl_math_rnd_even"
>
> Fine with me.
>
>> Also some users might find it confusing to have 3 different
>> implementations of vnl_math_rnd that only differ in how half integers
>> are rounded.
>
> Well, this cannot be avoided in a general sense, since the
> implementations are there because there is already an issue about what
> rounding should mean.  I think the only solution is good documentation
> about which function does what, and what the tradeoffs are, and perhaps
> a "if you don't know of the issue, use this" type of recommendation.
>
>  > One alternative would be to drop
>> "vnl_math_rnd_halfinttoeven" as it can be made redundant with the
>> current "vnl_math_rnd".
>
> That would imply that vnl_math_rnd must always round to even.  This is
> not guaranteed, even with the assembler/SSE implementation, because it
> is dependent on the rounding mode.  (You said the same thing in your
> message.)  Keeping the three means that the semantics of vnl_math_rnd is
> less well defined (e.g., it could be either to even or to nearest), but
> is guaranteed to be the fastest possible on the platform.  If the
> semantics matter more than the speed, then use the appropriately named
> function.
>
> Amitha.
>
> ------------------------------------------------------------------------------
> This SF.net email is sponsored by:
> SourcForge Community
> SourceForge wants to tell your story.
> http://p.sf.net/sfu/sf-spreadtheword
> _______________________________________________
> Vxl-maintainers mailing list
> Vxl...@li...
> https://lists.sourceforge.net/lists/listinfo/vxl-maintainers
>