In fact, with MSVC and Win64, the options to enable "Enhanced Instruction Sets" (/arch:SSE or /arch:SSE2) are not even valid options, since SSE and SSE2 are required regardless.

On Mon, Jan 12, 2009 at 2:35 PM, Chuck Atkins <chuck.atkins@...> wrote:
> With MSVC and Win64 it's actually a bit more complicated than that. While
> the compiler doesn't support inline assembler for Win64, the OS actually has
> a problem with the x87 FPU. So even if it did support inline assembler, the
> inline instructions would still have to use SSE (1, 2, etc.) instructions
> instead of x87 FPU instructions. If you look at the actual instructions
> generated by any Win64 compiler, not just MSVC, you should notice that all
> floating-point operations are done via SSE. This is not so much an
> optimization as it is a necessity.
>
> On Mon, Jan 12, 2009 at 1:00 PM, Tom Vercauteren <tom.vercauteren@...> wrote:
>
>> Hi Ian,
>>
>> Thanks for the feedback. I am not sure I fully understand you, though.
>>
>> I should have said that the results I showed are only valid on my
>> machine (32-bit Linux using gcc). But indeed the fastest results are
>> for "round to even", which is the default rounding mode. In order to
>> get something like "round up", I used a multiply-and-divide-by-two
>> trick, which slows down the operation.
>>
>> On x86-64 a fast implementation will be used if either SSE2 is turned
>> on or gcc is used. 64-bit Windows with MSVC and without SSE2 turned
>> on cannot benefit from the optimized asm function because Win64
>> doesn't support inline assembler.
>>
>> Hope this information helps.
>>
>> Tom
>>
>> On Mon, Jan 12, 2009 at 6:42 PM, Ian Scott
>> <ian.m.scott@...> wrote:
>> > Tom,
>> >
>> > Thank you for looking into this issue.
>> >
>> > Wouldn't it be better to choose the fast option on the most common
>> > high-performance platform (x86-64) as the default rounding operation?
>> >
>> > If I understand your results, is round to even the fastest?
>> >
>> > Ian.
>> >
>> > Tom Vercauteren wrote:
>> >>
>> >> Dear all,
>> >>
>> >> Some time ago, I helped implement a few optimized real-to-integer
>> >> rounding functions in vnl_math.h.
>> >>
>> >> In the process of trying to update ITK to use the new implementation
>> >> of vnl_math_rnd, we had some interesting discussions on the ITK
>> >> developers mailing list. See e.g.
>> >>
>> >> http://www.itk.org/mailman/private/insight-developers/2009-January/011510.html
>> >>
>> >> What emerged from these discussions is that many ITK developers find
>> >> it disturbing that the current implementation of vnl_math_rnd does not
>> >> behave consistently across platforms. The problem stems from halfway
>> >> cases, which can be either rounded away from zero or rounded to the
>> >> closest even integer, depending on the hardware. Note that this behavior
>> >> also existed before the optimized rounding I helped implement.
>> >> Actually, there is already a workaround in ITK:
>> >>
>> >> http://www.itk.org/cgi-bin/viewcvs.cgi/Code/Common/itkIndex.h?root=Insight&view=diff&r1=1.54&r2=1.55
>> >>
>> >> Attached is a patch that fixes this issue by making vnl_math_rnd round
>> >> half-integers upwards on all platforms. The performance loss is, in my
>> >> opinion, acceptable:
>> >>
>> >>   Time for vanilla rnd with half-int round away from zero: 643
>> >>   Time for vanilla rnd with half-int round up:             832
>> >>   Time for lrint:                                          321
>> >>   Time for sse2 rnd with half-int round to nearest even:   156
>> >>   Time for sse2 rnd with half-int round up:                201
>> >>   Time for asm rnd with half-int round to nearest even:    175
>> >>   Time for asm rnd with half-int round up:                 221
>> >>
>> >> The other (small) price to pay with this patch is that the optimized
>> >> implementations of vnl_math_rnd would only work for numbers whose
>> >> absolute value is less than INT_MAX / 2 (same as vnl_math_floor and
>> >> vnl_math_ceil).
>> >>
>> >> The patch also features some small code cleanup and makes the test for
>> >> vnl_math_rnd more stringent.
>> >>
>> >> Let me know if you find this patch reasonable.
>> >>
>> >> Best regards,
>> >> Tom Vercauteren
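To make the three halfway-case behaviours discussed in this thread concrete, here is a minimal C++ sketch. The function names rnd_half_even, rnd_half_away, rnd_half_up and floor_div2 are hypothetical and are not the vnl_math API. rnd_half_up is our reading of the "multiply and divide by two" trick, built only from a round-to-nearest-even conversion (std::lrint in the default FE_TONEAREST mode); the actual patch may differ. Doubling x before the conversion is also why an int-based version of this trick is only valid for |x| < INT_MAX / 2, matching the restriction Tom mentions.

```cpp
#include <cfenv>
#include <cmath>

// Round halfway cases to the nearest even integer: std::lrint honours the
// current floating-point rounding mode, which defaults to FE_TONEAREST.
long rnd_half_even(double x) { return std::lrint(x); }

// Round halfway cases away from zero: std::lround ignores the rounding mode.
long rnd_half_away(double x) { return std::lround(x); }

// Hypothetical helper: floor(r / 2) for any integer r. Written this way
// because, before C++20, >> on a negative value is implementation-defined.
long floor_div2(long r) { return r >= 0 ? r / 2 : -((-r + 1) / 2); }

// Round halfway cases upwards (toward +infinity) using only a
// round-to-nearest-even conversion (sketch of the doubling trick):
//   x = n + 0.5 -> 2x + 0.5 = 2n + 1.5 -> ties to 2n + 2 -> halved: n + 1
//   x = n + f, 0 <= f < 0.5 -> 2x + 0.5 lies in [2n + 0.5, 2n + 1.5)
//                -> rounds to 2n or 2n + 1 -> floor-halved: n
// Assumes 2x + 0.5 fits in a long and that the rounding mode is FE_TONEAREST.
long rnd_half_up(double x) { return floor_div2(std::lrint(2.0 * x + 0.5)); }
```

For example, for -2.5, 0.5 and 2.5 the three functions return -2/-3/-2, 0/1/1 and 2/3/3 respectively (even/away/up). One caveat of the doubling approach: the addition 2x + 0.5 is itself rounded, so an input lying just below a half-integer (for instance 0.5 - 2^-54) can be rounded up when it should be rounded down.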