From: Bill P. <pa...@ki...> - 2010-06-25 15:49:57
Hi Max,

Thanks for the information. You have probably uncovered a bug in mesa; now we just need to track it down! The likely cause is the use of an uninitialized variable that gets a value from whatever was left on the stack from previous calls. With a single thread, the bogus value is always the same, but with multiple threads it can be different from one run to the next, producing inconsistent results.

Would you like to track it down? It is an interesting exercise (if you are a bit masochistic!). The search is done by selectively commenting out the OpenMP parallel directives for loops until you find one that seems to be responsible. Usually this turns up a place where there is an uninitialized variable that has escaped the compiler tests. However, I have run into cases where it was a compiler problem, and that might be the case here since you don't see anything with ifort. In that case, we can try to extract a simple test case so the compiler team can look for the problem. And of course we can look for a way to rewrite the code that dodges the compiler bug while we wait for a fix.

If you are sensible, you will of course turn down this offer! But if you would like to give it a shot, let me know and I'll be happy to provide coaching. ;-)

Thanks,
Bill

On Jun 25, 2010, at 8:28 AM, Max Katz wrote:

> On one computer I had the Intel compiler, version 11.1; when I ran the test, I got the same results every time, regardless of how many cores were available for use (I tried 1, 2 and 4). However, when I ran the test on the machine using gfortran 4.5.1, I got inconsistent results. Each entry in the table lists the model number (how many timesteps it took to get to the peak eps_nuc level), the age of the star at the peak generation rate, and what that rate actually was. As you can see, I got different results for different numbers of threads, and even found inconsistency while holding the number of threads constant. This seems to indicate that there is some arithmetic precision issue involved that relates either to gfortran itself, or its interaction with OpenMP.
>
> I am not sure exactly what has caused the issue, but I thought I should at least warn other users of this issue; it is not a startling amount of error, but it is certainly something of which to take note.
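
The search described above (serializing one OpenMP parallel loop at a time and checking whether repeated runs then agree) might be organized roughly as in the Python sketch below. It is only an illustration, not part of mesa: `find_suspect_loops`, `is_reproducible`, the loop names, and `fake_run` are all hypothetical. In practice `fake_run` would be replaced by something that rebuilds the code with the given directives commented out, runs the test, and returns the result being compared.

```python
import itertools


def is_reproducible(run, disabled, repeats=5):
    """Return True if run(disabled) gives the same result on every repeat."""
    results = {run(disabled) for _ in range(repeats)}
    return len(results) == 1


def find_suspect_loops(run, loop_names, repeats=5):
    """Serialize one parallel loop at a time and collect those whose
    serialization makes repeated runs agree."""
    suspects = []
    for name in loop_names:
        if is_reproducible(run, frozenset([name]), repeats):
            suspects.append(name)
    return suspects


# Hypothetical stand-in for "rebuild with these directives commented out
# and run the test". It pretends that loop_c is the culprit: results vary
# from call to call unless loop_c has been serialized.
_calls = itertools.count()


def fake_run(disabled):
    if "loop_c" in disabled:
        return 1.0
    return 1.0 + 1e-9 * (next(_calls) % 3)


if __name__ == "__main__":
    loops = ["loop_a", "loop_b", "loop_c", "loop_d"]
    print(find_suspect_loops(fake_run, loops))  # prints ['loop_c']
```

A loop flagged this way is only a candidate. It still has to be inspected for an uninitialized or unintentionally shared variable, and with a real nondeterministic bug a few identical runs can agree by chance, so a larger `repeats` value is safer.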