In fact, with MSVC and Win64, the options to enable "Enhanced Instruction Sets" (/arch:SSE or /arch:SSE2) are not even valid options, since those instruction sets are necessary regardless.

On Mon, Jan 12, 2009 at 2:35 PM, Chuck Atkins <> wrote:
With MSVC and Win64 it's actually a bit more complicated than that.  While the compiler doesn't support inline assembler for Win64, the OS actually has a problem with the x87 FPU.  So even if it did support inline assembler, the inline code would still have to use SSE (1, 2, etc.) instructions instead of x87 FPU instructions.  If you look at the actual instructions generated by any Win64 compiler, not just MSVC, you should notice that all floating point operations are done via SSE.  This is not so much an optimization as a necessity.
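
As a hedged illustration of the point above: where inline assembler is unavailable, the same SSE2 conversion can be reached through compiler intrinsics. This is only a sketch, not the vnl code. The name round_to_int_sse2 and the SKETCH_HAVE_SSE2 macro are hypothetical, and the std::lrint fallback is an assumption added so that the snippet also builds on targets without SSE2.

```cpp
#include <cmath>  // std::lrint, used only as a portable fallback

// Detect SSE2: GCC/Clang define __SSE2__, MSVC defines _M_X64 on x64
// and _M_IX86_FP >= 2 on 32-bit builds compiled with /arch:SSE2.
#if defined(__SSE2__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 2)
#  include <emmintrin.h>  // SSE2 intrinsics
#  define SKETCH_HAVE_SSE2 1
#else
#  define SKETCH_HAVE_SSE2 0
#endif

// Hypothetical helper (not part of vnl): convert a double to int using the
// current rounding mode, which is round-half-to-even unless it was changed.
inline int round_to_int_sse2(double x)
{
#if SKETCH_HAVE_SSE2
  // CVTSD2SI: scalar double -> 32-bit int, rounded according to MXCSR.
  return _mm_cvtsd_si32(_mm_set_sd(x));
#else
  return static_cast<int>(std::lrint(x));
#endif
}
```

With the default rounding mode, halfway cases go to the nearest even integer, so round_to_int_sse2(2.5) gives 2 and round_to_int_sse2(1.5) gives 2.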

On Mon, Jan 12, 2009 at 1:00 PM, Tom Vercauteren <> wrote:
Hi Ian,

Thanks for the feedback. I am not sure I fully understand you, though.

I should have said that the results I showed are only valid on my
machine (32-bit Linux using gcc). But indeed the fastest results are
for "round to even" which is the default rounding mode. In order to
get something like "round up", I used a multiply and divide by two
trick which slows down the operation.
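
For readers wondering what a "multiply and divide by two" trick can look like, here is a minimal sketch. It is an assumption about the general technique, not a copy of the patch: the names rnd_halfint_to_even and rnd_halfint_up are hypothetical, and std::lrint stands in for the optimized sse2/asm paths. The idea is that rounding 2*x + 0.5 to nearest-even and then halving with a floor gives the same result as floor(x + 0.5) for ordinary inputs.

```cpp
#include <cassert>
#include <climits>
#include <cmath>

// Hypothetical stand-in for a fast round-half-to-even primitive.
// std::lrint follows the current rounding mode (nearest-even by default).
inline int rnd_halfint_to_even(double x)
{
  return static_cast<int>(std::lrint(x));
}

// Hypothetical round-half-up built on top of round-half-to-even.
inline int rnd_halfint_up(double x)
{
  // Doubling x costs one bit of range, hence the |x| < INT_MAX / 2 limit.
  assert(std::fabs(x) < INT_MAX / 2);
  const int r = rnd_halfint_to_even(2.0 * x + 0.5);
  // Floor division by two. Right-shifting a negative int is arithmetic on
  // mainstream compilers and guaranteed to be so since C++20.
  return r >> 1;
}
```

For example, x = 2.5 becomes 5.5, which rounds to 6 and halves to 3, while x = -2.5 becomes -4.5, which rounds to -4 and halves to -2. The extra multiply, add and shift are where the slowdown comes from.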

On x86-64 a fast implementation will be used if either sse2 is turned
on or gcc is used. Windows 64 bits with MSVC and without sse2 turned
on cannot benefit from the optimized asm function because Win64
doesn't support inline assembler.

Hope this information helps.


On Mon, Jan 12, 2009 at 6:42 PM, Ian Scott
<> wrote:
> Tom,
> Thank you for looking into this issue.
> Wouldn't it be better to choose the fast option on the most common
> high-performance platform (x86-64) as the default rounding operation?
> If I understand your results, is round to even the fastest?
> Ian.
> Tom Vercauteren wrote:
>> Dear all,
>> Some time ago, I helped implement a few optimized real to integer
>> rounding functions in vnl_math.h.
>> In the process of trying to update ITK to use the new implementation
>> of vnl_math_rnd, we had some interesting discussion on the ITK
>> developers mailing list. See e.g.
>> What emerged from these discussions is that many ITK developers find
>> it disturbing that the current implementation of vnl_math_rnd does not
>> behave consistently across platforms. The problem stems from halfway
>> cases, which are rounded either away from zero or to the closest even
>> integer depending on the hardware. Note that this behavior also existed
>> before the optimized rounding I helped implement.
>> Actually there is already a workaround in ITK:
>> Attached is a patch that fixes this issue by making vnl_math_rnd round
>> half integers upwards on all platforms. The performance loss is in my
>> opinion acceptable:
>> Time for vanilla rnd with halfint round away zero:     643
>> Time for vanilla rnd with halfint round up:            832
>> Time for lrint:                                        321
>> Time for sse2 rnd with halfint round to nearest even:  156
>> Time for sse2 rnd with halfint round up:               201
>> Time for asm rnd with halfint round to nearest even:   175
>> Time for asm rnd with halfint round up:                221
>> The other (small) price to pay with this patch is that the optimized
>> implementations of vnl_math_rnd would only work for numbers whose
>> absolute value is less than INT_MAX / 2 (same as vnl_math_floor and
>> vnl_math_ceil).
>> The patch also features some small code cleanup and makes the test for
>> vnl_math_rnd more stringent.
>> Let me know if you find this patch reasonable.
>> Best regards,
>> Tom Vercauteren
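
To make the halfway-case discrepancy described in the quoted message concrete, the following sketch compares the two conventions in plain C++. Both helper names are hypothetical and neither is claimed to be the vnl implementation: the first is a common add-half-and-truncate formulation, the second relies on std::lrint, which follows the current rounding mode (round-half-to-even by default).

```cpp
#include <cmath>

// Hypothetical helper: halfway cases are rounded away from zero
// (add or subtract one half, then truncate toward zero).
inline int round_half_away_from_zero(double x)
{
  return static_cast<int>(x >= 0.0 ? x + 0.5 : x - 0.5);
}

// Hypothetical helper: halfway cases go to the nearest even integer,
// assuming the default floating point rounding mode is in effect.
inline int round_half_to_even(double x)
{
  return static_cast<int>(std::lrint(x));
}

//    x     away from zero   to even
//   0.5          1             0
//   1.5          2             2
//   2.5          3             2
//  -0.5         -1             0
```

The two functions agree everywhere except on some exact halves, which is why code depending on the result for 0.5 or 2.5 can behave differently from one platform to another.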

Vxl-maintainers mailing list