From: Dominic M. <dma...@ai...> - 2003-04-04 18:53:32
|
Hi,

I've been using valgrind for about six months, and it's been wonderful to have. I was spoiled having access to Purify on Solaris machines for a while, and missed having something similar on Linux. Many thanks to Julian Seward and everyone else who contributed to its development.

I've included a very small program that generates different output when it's run through valgrind. I noticed the error while I was debugging a function to quickly check whether an array of single-precision floats has any NaNs in it; it turns out that doing a bitwise test is 2-3x faster than calling the finite() function.

Anyway, the error only occurs if you compile it with the options:

    gcc -O2 -mcpu=pentiumpro -march=pentiumpro

However, the exact same error occurs whether I compile the program with gcc 2.96 (Red Hat 7.3's version) or gcc 3.2.

The correct output of the program is "0.000000". When run under valgrind 1.9.4, it outputs "1.000000".

I hope this helps! It's easy enough for me to work around, but I can only guess that this is the symptom of a bug that could manifest itself in other ways.

- Dominic

    #include <stdio.h>

    int main(int argc, char **argv)
    {
        union {
            float a[2];
            int   b[2];
        } u;

        /* 0.0 / 0.0 produces a NaN; b[0] aliases its bit pattern. */
        u.a[0] = 0.0 / 0.0;
        /* 0 if all the quiet-NaN bits (0x7FC00000) are set, else 1. */
        u.a[1] = ((*u.b & 0x7FC00000) != 0x7FC00000);
        printf("%f\n", u.a[1]);

        return 0;
    }
|
From: Julian S. <js...@ac...> - 2003-04-04 19:49:56
|
Hi. Yes, 1.9.4 does have an obscure bug in the handling of floating point conditional code sometimes. I've fixed it in cvs and I hope to get it out to the world by shipping 1.9.5 at the weekend, or soon thereafter. In the meantime I attach a patch with the fix -- easy, you just need to delete two lines of code.

It would be good if you could confirm that it works.

Thanks for chasing down a small example, even though I didn't use it -- you've no idea how much that helps. Reproducing problems that people report is the #1 problem we have in debugging V; once we reproduce a problem, tracking it down is simple.

J

Index: coregrind/vg_from_ucode.c
===================================================================
RCS file: /cvsroot/valgrind/valgrind/coregrind/vg_from_ucode.c,v
retrieving revision 1.41
retrieving revision 1.42
diff -u -3 -p -r1.41 -r1.42
--- coregrind/vg_from_ucode.c 26 Mar 2003 21:08:00 -0000 1.41
+++ coregrind/vg_from_ucode.c 26 Mar 2003 23:43:57 -0000 1.42
@@ -3412,8 +3412,6 @@ static void emitUInstr ( UCodeBlock* cb,
       case FPU:
          vg_assert(u->tag1 == Lit16);
          vg_assert(u->tag2 == NoValue);
-         if (anyFlagUse ( u ))
-            emit_get_eflags();
          if (!(*fplive)) {
             emit_get_fpu_state();
             *fplive = True;
|
From: John R. <re...@cs...> - 2003-04-04 20:17:50
|
> -- you've no idea how much that helps. Reproducing problems that people
> report is the #1 problem we have in debugging V; once we reproduce a
> problem, tracking it down is simple.

Is it just reproducing the problem that's hard, or do you mean "reproducing in a reasonably sized program"? If the latter, then there are techniques that might be able to help. They basically perform a space-wise or time-wise binary search in order to narrow down the problem, exploiting the fact that we have a known-correct implementation of an x86.

I saw a great implementation of this idea presented by Jim Gray one time. The goal was to flush out bugs in SQL implementations. They would repeatedly generate massive random queries and feed them to four or five databases until they found a query that did not return identical results across the databases. They then pruned the parse tree that generated the query until they found the smallest query that elicited different answers from different databases, at which point the problem was pretty obvious.

Something similar could be done with Valgrind, I think. I'm not sure that generating random C or asm programs would work, but maybe it's possible to turn Valgrind on and off during the execution of a program? This would lead to a strategy where we search for the briefest application of Valgrind that gives a different result than a native x86. Again, the problem should be obvious at that point.

John
|
From: Julian S. <js...@ac...> - 2003-04-04 21:06:41
|
On Friday 04 April 2003 8:17 pm, John Regehr wrote:
> > -- you've no idea how much that helps. Reproducing problems that people
> > report is the #1 problem we have in debugging V; once we reproduce a
> > problem, tracking it down is simple.
>
> Is it just reproducing the problem that's hard, or do you mean
> "reproducing in a reasonably sized program"?

Reproducing it at all. Quite often we get reports of the form

   I have a 1/2 million line fortran program for doing geophysics
   calculations. Under some obscure circumstances, this causes V to
   bomb out with ... assertion failure. I am running on MutantLinux
   12.34.567 (with foobar-1.9 patch) and the code is compiled by
   ExpensiveRealMoneyCompiler v 41.97. Our code is proprietary, so
   unfortunately we can't send you the source. Can you help us?

and in these circumstances there's practically nothing we can do apart from note the bug and hope that someone finds a more tractable test case for it. Even if we could have the sources, setting up the precise environment to repro it is very time consuming, and we all have day jobs (etc).

Interestingly, one solution to the above is for the bug reporter to make me an account on their machine and allow me to ssh in, so I can reproduce the bug in place. This has proved very effective in the half-dozen or so times I've done it, and I appreciate the trust of those who allow it. I bet not many people can say they have used emacs at a distance of 12000 miles -- in the most recent example of this, the bug was in New Zealand, and I'm in the UK.

> If the latter, then there are techniques that might be able to help.
> They basically perform a space-wise or time-wise binary search in order to
> narrow down the problem, exploiting the fact that we have a known-correct
> implementation of an x86.

Yes, that's how V was debugged in the first place. I knew from the start that making the virtual CPU work properly would be a problem.
So a fundamental design decision was that the program, when run on valgrind, has a memory layout which allows switching over to the real CPU at any point. By changing the switchover point, you can do a binary search to find the exact basic block which is being mistranslated. This is controlled by the --stop-after= flag. Without that, V would never have worked.

Design for debuggability / verifiability, I say. Automated debugging is the way to go.

J
|
From: John R. <re...@cs...> - 2003-04-04 21:57:25
|
> Design for debuggability / verifiability, I say. Automated debugging
> is the way to go.

Yep -- for example, about a year ago I was coding a tricky algorithm that happily lent itself to testing against a simulator using random inputs. After I finished finding bugs in my implementation, there was one more bug left that turned out to be an error in the original specification of the algorithm! Details on page 12 of this paper if anyone is interested :).

http://www.cs.utah.edu/flux/papers/spak-flux-tn-02-01/regehr-rtss02.pdf

John
|
From: Jeremy F. <je...@go...> - 2003-04-04 20:22:28
|
On Fri, 2003-04-04 at 11:58, Julian Seward wrote:
> report is the #1 problem we have in debugging V; once we reproduce a
> problem, tracking it down is simple.
                              ^ often

J
|
From: Dominic M. <dma...@ai...> - 2003-04-04 22:18:54
|
On Fri, 4 Apr 2003, Julian Seward wrote:

>
> Hi. Yes, 1.9.4 does have an obscure bug in the handling of floating
> point conditional code sometimes. I've fixed it in cvs and I hope to
> get it out to the world by shipping 1.9.5 at the weekend, or soon
> thereafter. In the meantime I attach a patch with the fix -- easy,
> you just need to delete two lines of code.
>
> It would be good if you could confirm that it works.

Yep, that fixes it. Thanks!

> Thanks for chasing down a small example, even though I didn't use it
> -- you've no idea how much that helps. Reproducing problems that people
> report is the #1 problem we have in debugging V; once we reproduce a
> problem, tracking it down is simple.

I understand all too well. :)

Regards,

Dominic
|