Why don't we use ldexp in the pow2 implementation? It should be a faster and
more accurate solution than std::pow.
Last edit: andy_panov 2013-01-31
Please provide some results (especially ones proving that your implementation is faster).
I've tried to run the following code on my machine with MSVC:
~~~~~~~~~~~~~~~~~~~~
#include <cmath>
#include <vector>
#include <iostream>
#include "itpp/base/timing.h"
using namespace itpp;
int main()
{
    double res;
    itpp::tic();                          // start the timer
    std::vector<double> pow_out(500);
    for (int j = 0; j < 500; ++j)
    {
        res = 0.0;
        // k cycles through 0..50, so each pass sums the powers 2^0 .. 2^50
        for (int i = 0, k = 0; i < 100000; ++i, ++k)
        {
            if (k > 50) k = 0;
            res += pow(2.0, k);
        }
        pow_out[j] = res;
    }
    std::cout << "res=" << pow_out[499] << std::endl;
    itpp::toc_print();                    // print the elapsed time
}
~~~~~~~~~~~~~~~~~~~~
and was quite surprised by the results:
res=4.41353e+018
Elapsed time = 0.858002 seconds
res=4.41353e+018
Elapsed time = 2.6364 seconds
It means that several multiplications of doubles are about 3 times faster than just tweaking the exponent of a double!
So I was clearly wrong in my initial statement.
PS: Someone in the following thread http://stackoverflow.com/questions/7720668/fast-multiplication-division-by-2-for-floats-and-doubles-c-c indicates that VC11 vectorizes loops over doubles using SSE2, so others may obtain the opposite results with compilers that still use the FPU for such operations.
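For anyone who wants to repeat the comparison without IT++, here is a minimal sketch of the same loop nest timed with std::chrono. The names `TimedSum`, `time_power_loop`, `time_pow` and `time_ldexp` are hypothetical, and the ldexp variant assumes that `std::ldexp(1.0, k)` is the intended replacement for `pow(2.0, k)`. Only the loops are inside the timed region; the allocation happens before the timer starts.

```cpp
#include <chrono>
#include <cmath>
#include <vector>

// Hypothetical helper type: the result of one timed run.
struct TimedSum {
    double sum;      // accumulated result of the last outer iteration
    double seconds;  // wall-clock time spent in the loops only
};

// Runs the loop nest from the post, calling f(k) to obtain 2^k.
// Assumes outer >= 1.
template <typename F>
TimedSum time_power_loop(F f, int outer, int inner) {
    std::vector<double> out(outer);  // allocated before the timer starts
    const auto start = std::chrono::steady_clock::now();
    for (int j = 0; j < outer; ++j) {
        double res = 0.0;
        for (int i = 0, k = 0; i < inner; ++i, ++k) {
            if (k > 50) k = 0;  // k cycles through 0..50
            res += f(k);
        }
        out[j] = res;
    }
    const auto stop = std::chrono::steady_clock::now();
    return {out.back(), std::chrono::duration<double>(stop - start).count()};
}

// Variant using std::pow, as in the original snippet.
inline TimedSum time_pow(int outer = 500, int inner = 100000) {
    return time_power_loop([](int k) { return std::pow(2.0, k); }, outer, inner);
}

// Variant using std::ldexp (assumed replacement: 1.0 * 2^k).
inline TimedSum time_ldexp(int outer = 500, int inner = 100000) {
    return time_power_loop([](int k) { return std::ldexp(1.0, k); }, outer, inner);
}
```

Calling `time_pow()` and `time_ldexp()` from `main` and printing the `seconds` field of each gives the two timings; the `sum` fields should agree. The absolute numbers will depend on the compiler, its optimization flags and the math library.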
Hi andy_panov,
I am not an expert in this topic at all, but I found the thread you mentioned as well and was a bit confused. I have nothing substantive to say on that topic, but I can provide some measurements and a slight note of "hmmm":
I think you either misinterpreted your results or switched the lines of your output.
Tell me if I am wrong, but according to your code above, the first 2 lines of output should be for pow (fast) and the second ones for ldexp (slow).
Anyway, here are my results for comparison (Ubuntu 11.04 natty 64 bit with g++ (Ubuntu/Linaro 4.5.2-8ubuntu4) 4.5.2):
I moved the tic and toc calls so that the memory allocation and the std::cout output fall outside the timed region, because we do not want to measure those effects, do we?
Elapsed time = 5.0531 seconds
pow res=4.41353e+18
Elapsed time = 0.77639 seconds
ldexp res=4.41353e+18
/Stephan
PS: code for reference, compiled with
g++ `itpp-config --cflags` ldexp_test.cpp -o ldexp_test `itpp-config --libs`
The results above were for an Intel(R) Core(TM) i3 CPU M 350 @ 2.27GHz.
Further results (i7-2600 @ 3.4GHz, Ubuntu 11.10 oneiric 64 bit with g++ (Ubuntu/Linaro 4.6.1-9ubuntu3) 4.6.1):
Elapsed time = 2.91351 seconds
pow res=4.41353e+18
Elapsed time = 0.496346 seconds
ldexp res=4.41353e+18
Hi Stephan,
I still get the following results with your ldexp_test.cpp (MSVC, VC11):
Elapsed time = 0.889201 seconds
pow res=4.41353e+018
Elapsed time = 2.6364 seconds
ldexp res=4.41353e+018
Allocation should not affect the whole picture much since it is done only once.
I do not have an explanation. I am confused by it as well.
Last edit: andy_panov 2013-02-01
The MSVC run-time uses a divide-and-conquer algorithm based on multiplications to compute integer powers (about log2 N multiplications for an exponent N). I do not know what happens inside the Microsoft implementation of ldexp.
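To illustrate what such a divide-and-conquer scheme looks like, here is a minimal sketch of exponentiation by squaring. It is not the actual MSVC run-time code, and the function name `pow_by_squaring` is hypothetical; it only shows why the number of multiplications grows with log2 N rather than with N.

```cpp
// Computes base^n by repeated squaring.
// Illustrative sketch only, not the MSVC run-time implementation.
double pow_by_squaring(double base, unsigned int n) {
    double result = 1.0;
    while (n > 0) {
        if (n & 1u) {
            result *= base;  // lowest bit of n is set: fold in the current power
        }
        base *= base;        // base, base^2, base^4, base^8, ...
        n >>= 1;             // move on to the next bit of the exponent
    }
    return result;
}
```

For exponents up to 50, as in the benchmark, the loop runs at most 6 times, so each call costs only a handful of multiplications.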
I feel we need more testing. Can anyone run the benchmark with Stephan's modifications and report the results? Both Linux and Windows results are highly appreciated.
These are my results (openSUSE 12.2, x86_64 with ACML)
pow
Elapsed time = 6.87597 seconds
res=4.41353e+18
ldexp
Elapsed time = 0.98343 seconds
res=4.41353e+18
And on Windows 8 x86_64 (Visual Studio 2010, ACML, Release mode, x64)
pow
Elapsed time = 0.352236 seconds
res=4.41353e+18
ldexp
Elapsed time = 1.3519 seconds
res=4.41353e+18
Although in Debug mode I get faster times with ldexp.
Hi Bogdan,
Thank you for the results. I also noticed that ldexp is faster in Debug mode. Based on the test results (thank you, Stephan!), Linux users should benefit from switching to ldexp, but we should stick with the current implementation on Windows. I can implement it in a compiler-dependent way with #ifdefs inside the pow2 implementation. I can proceed and provide a patch if no one has any objections.
Andy
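A compiler-dependent switch of this kind could look roughly like the sketch below. It is not the actual IT++ patch: the name `pow2_sketch` and its `int` parameter are hypothetical, and it assumes that testing the predefined `_MSC_VER` macro is an acceptable way to detect MSVC.

```cpp
#include <cmath>

// Hypothetical illustration, not the real IT++ pow2.
// Returns 2^x for an integer exponent x.
inline double pow2_sketch(int x) {
#if defined(_MSC_VER)
    // MSVC: std::pow was faster in the Release-mode measurements above.
    return std::pow(2.0, x);
#else
    // Other compilers (e.g. g++ on Linux): std::ldexp was faster.
    return std::ldexp(1.0, x);
#endif
}
```

Both branches should return the same value for exponents that fit in a double, so the switch only affects speed, not results.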