From: Stéphane Larouche <stephane.larouche@po...>  2009-06-25 22:19:26

First, I would like to thank everybody who was involved in porting GCC 4 to MinGW and in the release of version 4.4.0.

About a year and a half ago, I complained that the software I am writing (OpenFilters) ran much slower with GCC 4.3 than with the old GCC 3.4.5 version (see earlier post). At that time, I could not look into the matter further to determine the source of the problem. With the release of GCC 4.4.0 for MinGW, I decided to take another look.

I now have found that the source of the problem is basic arithmetic operations on complex numbers. I created simple code (see below) that tests many simple operations on complex numbers. Here is a compilation of typical processor times for 5000000 repetitions of these operations (lower is better):

                                   3.4.5  3.4.5(-O3)  4.4.0  4.4.0(-O3)
Time to calculate sum:               125          63    172          78
Time to calculate difference:        109          62    157          62
Time to calculate product:           109          63    297         141
Time to calculate quotient:          188         109    390         234
Time to calculate square root:      1266         313   1548        1688
Time to calculate sin:              2360        2250   2594        2392
Time to calculate cos:              2392        2298   2548        2313
Time to calculate tan:              4892        4548   2610        2454
Time to calculate exp:              1594        1438   1720        1516
Time to calculate log:              1672        1110   2172        2407
Time to calculate real part:          79          47     78          47
Time to calculate imaginary part:     62          47     78          47
Time to calculate norm:              688         203   1423        1438

You can see that they all take longer on version 4.4.0 (with the exception of tan). The product and the quotient, in particular, take at least twice as long with GCC 4.4.0. Also, the square root, the log, and the norm do not get any faster with -O3 under 4.4.0, whereas they did under 3.4.5.

A Google search did not reveal this to be a known problem with GCC 4.3 or 4.4. However, I do not have access to a Linux/Unix/... box with multiple versions of GCC to verify whether the problem is specific to MinGW.

Do you have any idea of the source of the problem? Is it specific to MinGW? Is there a workaround?

Thank you for your help.
Sincerely,
Stéphane Larouche

Here is the code I used for the tests. I compiled without any flags (except for -O3 when indicated). I've got runtime 3.15.2, w32api 3.13, and binutils 2.19.1, and I am using Windows XP SP3. I tried both -shared-libgcc and -static-libgcc and have not observed any significant difference.

#include <cstdlib>
#include <cstdio>
#include <cmath>
#include <ctime>
#include <complex>

using namespace std;

int main ()
{
  long i, nb = 5000000;
  complex<double> *x, *y, *z;
  clock_t start, end;

  x = (complex<double> *)malloc(nb*sizeof(complex<double>));
  y = (complex<double> *)malloc(nb*sizeof(complex<double>));
  z = (complex<double> *)malloc(nb*sizeof(complex<double>));

  // Fill both inputs with random values whose parts lie in [0, 1].
  start = clock();
  for(i = 0; i < nb; i++) x[i] = complex<double>(double(rand())/double(RAND_MAX), double(rand())/double(RAND_MAX));
  for(i = 0; i < nb; i++) y[i] = complex<double>(double(rand())/double(RAND_MAX), double(rand())/double(RAND_MAX));
  end = clock();
  // Times are raw clock() ticks; clock_t is cast to long to match %ld.
  printf("Time to create random variables: %ld\n", (long)(end - start));

  start = clock();
  for(i = 0; i < nb; i++) z[i] = x[i] + y[i];
  end = clock();
  printf("Time to calculate sum: %ld\n", (long)(end - start));

  start = clock();
  for(i = 0; i < nb; i++) z[i] = x[i] - y[i];
  end = clock();
  printf("Time to calculate difference: %ld\n", (long)(end - start));

  start = clock();
  for(i = 0; i < nb; i++) z[i] = x[i] * y[i];
  end = clock();
  printf("Time to calculate product: %ld\n", (long)(end - start));

  start = clock();
  for(i = 0; i < nb; i++) z[i] = x[i] / y[i];
  end = clock();
  printf("Time to calculate quotient: %ld\n", (long)(end - start));

  start = clock();
  for(i = 0; i < nb; i++) z[i] = sqrt(x[i]);
  end = clock();
  printf("Time to calculate square root: %ld\n", (long)(end - start));

  start = clock();
  for(i = 0; i < nb; i++) z[i] = sin(x[i]);
  end = clock();
  printf("Time to calculate sin: %ld\n", (long)(end - start));

  start = clock();
  for(i = 0; i < nb; i++) z[i] = cos(x[i]);
  end = clock();
  printf("Time to calculate cos: %ld\n", (long)(end - start));

  start = clock();
  for(i = 0; i < nb; i++) z[i] = tan(x[i]);
  end = clock();
  printf("Time to calculate tan: %ld\n", (long)(end - start));

  start = clock();
  for(i = 0; i < nb; i++) z[i] = exp(x[i]);
  end = clock();
  printf("Time to calculate exp: %ld\n", (long)(end - start));

  start = clock();
  for(i = 0; i < nb; i++) z[i] = log(x[i]);
  end = clock();
  printf("Time to calculate log: %ld\n", (long)(end - start));

  start = clock();
  for(i = 0; i < nb; i++) z[i] = real(x[i]);
  end = clock();
  printf("Time to calculate real part: %ld\n", (long)(end - start));

  start = clock();
  for(i = 0; i < nb; i++) z[i] = imag(x[i]);
  end = clock();
  printf("Time to calculate imaginary part: %ld\n", (long)(end - start));

  // Note: the "norm" timing measures abs(), the modulus of x[i].
  start = clock();
  for(i = 0; i < nb; i++) z[i] = abs(x[i]);
  end = clock();
  printf("Time to calculate norm: %ld\n", (long)(end - start));

  start = clock();
  for(i = 0; i < nb; i++) z[i] = arg(x[i]);
  end = clock();
  printf("Time to calculate argument: %ld\n", (long)(end - start));

  free(x);
  free(y);
  free(z);
  return 0;
}
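The test above repeats the same start/loop/stop pattern for every operation. As a rough sketch only, the pattern could be factored into one helper that reports milliseconds (via CLOCKS_PER_SEC) and returns a checksum, so the compiler cannot treat the results as unused. The names time_binary_op, TimedResult and Multiply are hypothetical and not part of the original test.

```cpp
#include <cassert>
#include <complex>
#include <cstddef>
#include <ctime>
#include <vector>

typedef std::complex<double> cplx;

// Result of one timed run: elapsed CPU time and a checksum of the outputs.
struct TimedResult {
    double milliseconds;
    cplx checksum;
};

// Applies op(x[i], y[i]) to every pair and times only that loop.
// Hypothetical helper, not taken from the original post.
template <typename Op>
TimedResult time_binary_op(const std::vector<cplx>& x,
                           const std::vector<cplx>& y, Op op) {
    assert(x.size() == y.size());
    std::vector<cplx> z(x.size());

    const std::clock_t start = std::clock();
    for (std::size_t i = 0; i < x.size(); ++i) z[i] = op(x[i], y[i]);
    const std::clock_t end = std::clock();

    // Summing the outputs after the timed region makes the results observable.
    cplx sum(0.0, 0.0);
    for (std::size_t i = 0; i < z.size(); ++i) sum += z[i];

    TimedResult r;
    r.milliseconds = 1000.0 * double(end - start) / CLOCKS_PER_SEC;
    r.checksum = sum;
    return r;
}

// Example operation: the complex product measured in the test above.
struct Multiply {
    cplx operator()(const cplx& a, const cplx& b) const { return a * b; }
};
```

Calling time_binary_op(x, y, Multiply()) on two equally sized vectors returns the elapsed time and the checksum; other operations can be timed by supplying a different function object.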
From: Aaron W. LaFramboise <aaron77thyme@aa...>  2009-06-26 06:35:06

Stéphane Larouche wrote:
> I now have found that the source of the problem is basic arithmetic operations
> on complex numbers. I created simple code (see below) that tests many simple
> operations on complex numbers.

> You can see that they all take longer on version 4.4.0 (with the exception of
> tan).

Thanks for creating the detailed testcase for this.

OK, I looked into this briefly, and it appears what is happening is that 3.4.5 is using local implementations in libstdc++-v3, and 4.4.0 is using the ones in mingwex, and for whatever reason, the mingwex ones are a lot slower. I'm not sure why libstdc++-v3 is no longer using its own implementations, but this is really a library issue.

Basically, someone just needs to go through and optimize each algorithm. This is really painful, because there are a few good implementations sitting around that we could use, but it's unclear whether we can, due to copyright issues. The normal open source way to get around this is to reimplement the code in a different way, but in this case, we really just want the fastest possible code. However, it's just the code itself that is subject to copyright, not the algorithms, so this is really a mess...

As a fix, it might be possible to tell libstdc++-v3 to use its own implementations (if they still exist in 4.4.0) rather than the library versions, which it probably expects to be faster, not slower. In the long run, we need a few volunteers who are experts in this stuff to just go through and do the work.
From: Roumen Petrov <bugtrack@ro...>  2009-06-26 19:19:01

Aaron W. LaFramboise wrote:
> Stéphane Larouche wrote:
>
>> I now have found that the source of the problem is basic arithmetic operations
>> on complex numbers. I created simple code (see below) that tests many simple
>> operations on complex numbers.
>
>> You can see that they all take longer on version 4.4.0 (with the exception of
>> tan).
>
> Thanks for creating the detailed testcase for this.
>
> OK, I looked into this briefly, and it appears what is happening is that
> 3.4.5 is using local implementations in libstdc++-v3, and 4.4.0 is using
> the ones in mingwex, and for whatever reason, the mingwex ones are a lot
> slower.

Library mingwex provides only functions for float and long double. double functions are from the Microsoft C runtime.
Could someone confirm that the test uses mingwex functions?

[SNIP]

Roumen
From: Aaron W. LaFramboise <aaron77thyme@aa...>  2009-06-27 19:11:49

Roumen Petrov wrote:
> Library mingwex provides only functions for float and long double. double
> functions are from the Microsoft C runtime.
> Could someone confirm that the test uses mingwex functions?

They use the mingwex complex routines in the mingwex/complex directory, most (perhaps all) of which were contributed by Danny Smith. Many of those routines are implemented in terms of the routines that you describe, and it's possible that poor performance of MSVCRT is part of the problem. Nothing short of a detailed analysis will tell for sure, though.
From: Stéphane Larouche <stephane.larouche@po...>  2009-06-30 03:08:22

Aaron W. LaFramboise <aaron77thyme@...> writes:
> Roumen Petrov wrote:
>
> > Library mingwex provides only functions for float and long double. double
> > functions are from the Microsoft C runtime.
> > Could someone confirm that the test uses mingwex functions?
>
> They use the mingwex complex routines in the mingwex/complex directory,
> most (perhaps all) of which were contributed by Danny Smith. Many of
> those routines are implemented in terms of the routines that you
> describe, and it's possible that poor performance of MSVCRT is part of
> the problem. Nothing short of a detailed analysis will tell for sure,
> though.

Thank you for taking a look at the problem.

When I look at the complex header from both the 3.4.5 and 4.4.0 versions, I have the impression that the basic arithmetic operations (addition, subtraction, multiplication and division) are implemented directly in the header file, and identically in both cases. However, it is those operations that suffer the largest slowdown (and are the bottleneck in my application). This mystifies me. Any idea what is going on?

As for the trigonometric operations and the like, I can try to find more efficient mathematical expressions to calculate them, but I don't have the expertise to know how they are compiled and how that affects optimization.

Stéphane
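One way to probe the arithmetic slowdown would be to time operator* and operator/ against hand-expanded textbook formulas. The following is a minimal sketch, assuming finite, well-scaled inputs such as the test's values in [0, 1]. The helpers mul_plain, div_plain and nearly_equal are hypothetical names introduced here. They skip any special handling of infinities, NaNs or overflow that the compiler's own complex arithmetic may perform, so they are a diagnostic aid rather than a replacement.

```cpp
#include <cassert>
#include <cmath>
#include <complex>

typedef std::complex<double> cplx;

// Textbook product: (a+bi)(c+di) = (ac - bd) + (ad + bc)i.
// No special handling of infinities or NaNs (assumption: finite inputs).
inline cplx mul_plain(const cplx& p, const cplx& q) {
    const double a = p.real(), b = p.imag();
    const double c = q.real(), d = q.imag();
    return cplx(a * c - b * d, a * d + b * c);
}

// Textbook quotient: multiply by the conjugate of q and divide by |q|^2.
// No range reduction, so c*c + d*d can overflow or underflow for extreme q.
inline cplx div_plain(const cplx& p, const cplx& q) {
    const double a = p.real(), b = p.imag();
    const double c = q.real(), d = q.imag();
    const double denom = c * c + d * d;
    assert(denom != 0.0);
    return cplx((a * c + b * d) / denom, (b * c - a * d) / denom);
}

// True when two complex values agree to within a relative tolerance.
inline bool nearly_equal(const cplx& u, const cplx& v, double tol = 1e-12) {
    return std::abs(u - v) <= tol * (1.0 + std::abs(v));
}
```

If loops over mul_plain and div_plain turn out much faster than the built-in operators, the extra time is going into the operators rather than into memory traffic. In that case GCC's -fcx-limited-range option, which relaxes the range handling of complex multiplication and division, may be worth trying. Whether it accounts for the numbers reported in this thread is not established here.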
From: Danny Smith <dannysmith@cl...>  2009-07-03 22:43:40

----- Original Message -----
From: "Aaron W. LaFramboise"
To: "MinGW Users List" <mingw-users@...>
Sent: Sunday, June 28, 2009 7:11 AM
Subject: Re: [Mingw-users] Complex number calculations are much slower with version 4.4

> Roumen Petrov wrote:
>
>> Library mingwex provides only functions for float and long double. double
>> functions are from the Microsoft C runtime.
>> Could someone confirm that the test uses mingwex functions?
>
> They use the mingwex complex routines in the mingwex/complex directory,
> most (perhaps all) of which were contributed by Danny Smith. Many of
> those routines are implemented in terms of the routines that you
> describe, and it's possible that poor performance of MSVCRT is part of
> the problem. Nothing short of a detailed analysis will tell for sure,
> though.

The evidence here:

                                   3.4.5  3.4.5(-O3)  4.4.0  4.4.0(-O3)
Time to calculate sum:               125          63    172          78
Time to calculate difference:        109          62    157          62
Time to calculate product:           109          63    297         141
Time to calculate quotient:          188         109    390         234
Time to calculate square root:      1266         313   1548        1688
Time to calculate sin:              2360        2250   2594        2392
Time to calculate cos:              2392        2298   2548        2313
Time to calculate tan:              4892        4548   2610        2454
Time to calculate exp:              1594        1438   1720        1516
Time to calculate log:              1672        1110   2172        2407
Time to calculate real part:          79          47     78          47
Time to calculate imaginary part:     62          47     78          47
Time to calculate norm:              688         203   1423        1438

The three functions with the poorest relative performance are sqrt, log and norm. Interestingly, all three are dependent on complex abs; indeed, norm is just abs * abs.

In 3.4.5, complex abs was implemented as an inline, whereas 4.4.0 uses the C99 library function cabs() when C99 is available. The mingwex cabs is just a wrapper for _hypot.

Now I'm sure somebody could write a faster hypot than the MSVCRT version, but would it have the same precision and overflow protection near the limits? The inline complex abs in 3.4.5 libstdc++, although fast, is not as safe as, say, the fdlibm hypot.

Danny
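The speed-versus-safety trade-off described above can be illustrated with a small sketch. It is not the actual libstdc++ 3.4.5 inline nor the mingwex cabs; abs_naive and abs_hypot are hypothetical names, and the sketch assumes a C++11 compiler (for std::hypot) on a target that evaluates double expressions in double precision.

```cpp
#include <cassert>
#include <cmath>
#include <complex>

typedef std::complex<double> cplx;

// Fast but unprotected modulus: the squares can overflow to infinity
// (or underflow to zero) even when the true result is representable.
inline double abs_naive(const cplx& z) {
    return std::sqrt(z.real() * z.real() + z.imag() * z.imag());
}

// Modulus via the library hypot, which is specified to avoid undue
// intermediate overflow and underflow, usually at some cost in speed.
inline double abs_hypot(const cplx& z) {
    return std::hypot(z.real(), z.imag());
}
```

For ordinary inputs such as (3, 4) both functions should agree. For an input like (1e200, 1e200) the true modulus is about 1.4e200, which is representable, yet the naive version squares to 1e400 and overflows where intermediates are kept in double precision, while the hypot version stays finite.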