|
From: Joost V. <jv...@he...> - 2006-01-01 16:34:15
|
Hi, I'm running our CP2K program under valgrind (memcheck) and I'm finding that some of the numerical results are slightly different between a normal run and a run under valgrind. The difference is very minor (e.g. -0.94268415669060 vs. -0.94268415669059) and could be due to differences in rounding, but is reproducible (and no errors are generated by valgrind). I'm a bit surprised. Is this a known issue ? This is on opteron with valgrind-3.1.0. If this is 'interesting' I can provide the binary and the input needed to reproduce this. The binary is large (20Mb), but the test runs in just 10min. under valgrind. Thanks, Joost VandeVondele |
|
From: Tom H. <to...@co...> - 2006-01-01 17:01:24
|
In message <Pin...@he...>
Joost VandeVondele <jv...@he...> wrote:
> I'm running our CP2K program under valgrind (memcheck) and I'm finding
> that some of the numerical results are slightly different between a normal
> run and a run under valgrind. The difference is very minor (e.g.
> -0.94268415669060 vs. -0.94268415669059) and could be due to differences
> in rounding, but is reproducible (and no errors are generated by
> valgrind). I'm a bit surprised. Is this a known issue ? This is on opteron
> with valgrind-3.1.0.
My guess would be that you are seeing the difference between the 80 bit
internal precision of the x87 floating point unit and the 64 bit precision
of valgrind's emulation of it. See the documentation at:
http://www.valgrind.org/docs/manual/manual-core.html#manual-core.limits
Note that although that documentation implies that you have to be using
long doubles to observe the problem I believe that you can see it with
doubles as well if the code keeps values in registers between operations.
Try compiling your code with -mfpmath=sse (and -msse2 if this is a 32 bit
machine) and see if the two sets of results are then consistent.
Tom
--
Tom Hughes (to...@co...)
http://www.compton.nu/
|
|
From: Julian S. <js...@ac...> - 2006-01-01 17:02:47
|
> I'm running our CP2K program under valgrind (memcheck) and I'm finding
> that some of the numerical results are slightly different between a normal
> run and a run under valgrind. The difference is very minor (e.g.
> -0.94268415669060 vs. -0.94268415669059) and could be due to differences
> in rounding, but is reproducible (and no errors are generated by
> valgrind). I'm a bit surprised. Is this a known issue ?
Yes. See here for a discussion of FP limitations in Valgrind:
http://www.valgrind.org/docs/manual/manual-core.html#manual-core.limits
Let us know if these limitations are a problem.
J
|
From: Joost V. <jv...@he...> - 2006-01-01 18:52:03
|
Tom, Julian, thanks for pointing me to the docs... This is not a problem right now (though I noticed because our regression tester flagged these runs as 'wrong'). It just means that we're effectively running on another architecture. It does mean that we would not be able to reproduce a full simulation under valgrind (i.e. to understand what went wrong in a special case) as different numerics would rapidly lead to different numbers of iterations and (chaotic) trajectories (both meaningful runs, but different). However, I can imagine that getting the fine details of every fpu right is not really a priority. Thanks again, Joost |
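The trajectory divergence described above can be illustrated with a toy chaotic system (the logistic map at r = 4, not CP2K's actual integrator): starting two runs one ulp apart, the difference roughly doubles per step, so a last-bit rounding discrepancy grows to order one within a few dozen iterations:

```python
import math

def trajectory(x0: float, steps: int) -> list[float]:
    """Iterate the fully chaotic logistic map x -> 4 x (1 - x)."""
    xs = [x0]
    for _ in range(steps):
        xs.append(4.0 * xs[-1] * (1.0 - xs[-1]))
    return xs

a = trajectory(0.3, 60)
b = trajectory(math.nextafter(0.3, 1.0), 60)  # initial state one ulp away

# The gap starts at ~5.6e-17 and is amplified exponentially.
for n in (0, 10, 30, 50):
    print(n, abs(a[n] - b[n]))
```

Both runs are individually meaningful simulations, but after enough steps they are simply different simulations, which is exactly why a valgrind rerun cannot be expected to retrace a native trajectory bit for bit.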
|
From: Julian S. <js...@ac...> - 2006-01-01 19:13:33
|
> It does mean that we would not be able to reproduce a full simulation
> under valgrind (i.e. to understand what went wrong in a special case) as
> different numerics would rapidly lead to different numbers of iterations
> and (chaotic) trajectories (both meaningful runs, but different). However,
> I can imagine that getting the fine details of every fpu right is not
> really a priority.
The real difficulty is to decide where to place the cost/accuracy
tradeoff for very detailed FPU simulation. Sure, it's possible to
do a more accurate simulation, but you pay an ever-larger simulation
overhead for increased accuracy, and it's not clear what is an appropriate
tradeoff. For most of the time, I believe the FP implementation is
"accurate enough", although I'd be the first to admit we need more
feedback, really.
J
|
From: John R.
|
> The real difficulty is to decide where to place the cost/accuracy
> tradeoff for very detailed FPU simulation. Sure, it's possible to
> do a more accurate simulation, but you pay an ever-larger simulation
> overhead for increased accuracy, and it's not clear what is an appropriate
> tradeoff.
What _is_ clear is that the confusion, finger-pointing and debate will
persist until the overhead (actually the difference in simulation overhead
between the opcodes chosen by the compiler [80-bit] and the opcodes
substituted by memcheck [64-bit]) is measured numerically, analyzed and
documented. Then perhaps there can be an option so that informed users
can choose.
--