From: Nicholas N. <nj...@cs...> - 2005-12-24 18:15:43
Attachments:
tinycc.out
eager.diff
Hi,

In the recent Valgrind survey five people complained about the difficulty
of tracking down the root cause of undefined value errors, caused by the
fact that Memcheck waits until an undefined value can affect the visible
behaviour of the program (eg. is used in a conditional branch, or a
syscall input). A couple of people suggested doing more eager checking,
and this idea has come up before.

The problem is that the copying of undefined values is common, mostly due
to the practice of padding structs for alignment and bitfields.

I did some experimentation with eager checking a couple of years ago and
found that it caused large numbers of false positives. I repeated the
experiment again yesterday and saw the same results. I changed Memcheck
to complain about the loading of any undefined values and tried various
programs. For the empty C program that just returns zero, I get 24 errors
from 23 contexts, most just from the dynamic linker. I get the following
counts for the following programs:

  empty              1 errors from   1 context
  perf/bz2     8405487 errors from  30 contexts
  perf/tinycc  4647525 errors from 301 contexts

I had to use --error-limit=no for these otherwise Memcheck would have
stopped reporting errors after 100,000. These programs have no
(unsuppressed) errors when run with a normal Memcheck.

If I suppress the ones in the dynamic linker, I get:

  empty              1 errors from   1 context
  perf/bz2     8405464 errors from   8 contexts
  perf/tinycc  4647501 errors from 299 contexts

If I change things so that any undefined value loaded gets loaded as if it
was defined (to avoid possible cascading errors), I get:

  empty              1 errors from   1 context
  perf/bz2     4202624 errors from   2 contexts
  perf/tinycc  1137041 errors from 113 contexts

I've attached the output from that last tinycc run.

Some extra programs:

  vim           521 errors from 120 contexts
  gcc           384 errors from  53 contexts
  emacs        4876 errors from  63 contexts

It has been suggested that an option be present to do this eager checking,
but I'm not convinced it would be useful given the overwhelming number of
false positives. I'm wondering what other people think.

If you want to try this out for yourself, I've attached the patch I used.
It's against the COMPVBITS branch, do this to check it out and build:

  svn co svn://www.valgrind.org/valgrind/branches/COMPVBITS
  cd COMPVBITS
  sh ./autogen.sh
  ./configure --prefix=<...>
  patch -p0 < eager.diff
  make

Nick
From: John R.
Nicholas Nethercote wrote:
> In the recent Valgrind survey five people complained about the
> difficulty of tracking down the root cause of undefined value errors,
> caused by the fact that Memcheck waits until an undefined value can
> affect the visible behaviour of the program (eg. is used in a
> conditional branch, or a syscall input).

[snip]

> It has been suggested that an option be present to do this eager
> checking, but I'm not convinced it would be useful given the
> overwhelming number of false positives. I'm wondering what other people
> think.

Thank you, Nicholas, for continuing to explore eager undef checking.

One of the quality control policies that I deal with demands that an
application must never fetch uninitialized bits from memory. This policy
increases run-to-run repeatability when "unrelated" logic errors occur.
Such repeatability makes maintenance easier, and increases reliability
over the software lifecycle.

The policy also increases compliance with ISO C 1989, which says that any
use of uninitialized bits makes execution totally indeterminate. Language
lawyers argue whether "mere fetch" constitutes "use," but an addition
operation whose inputs contain uninit bits certainly is a use, even though
non-eager memcheck will not complain until the sum affects I/O or flow of
control.

The policy makes developers aware of alignment holes and padding in
structures. Often the response is "memset(&Struct, 0, sizeof(Struct));"
shortly after declaration. This can increase runtime efficiency,
particularly when the compiler "understands" memset(,0,) and thus elides
subsequent "Struct.member<k> = 0;", or when the hardware has special
instructions to clear entire cache lines. Of course, it can hurt on
processors such as i586 Intel PentiumPlain/MMX, where a write miss does
not allocate a cache line. But then memset can insert a fetch, for which
i586 does allocate a new cache line upon miss. And memcheck can learn this
specific exception, just like memcheck can learn about the intentional
fetch overruns in strlen, strcpy, memcpy, etc.

If the fundamental low-level language runtime libraries fetch uninit bits
in their internal operation, then eager memcheck will notice, and the
resulting "noise" will be bothersome. As a contribution towards
eliminating this problem, from time to time I have "cleaned up" glibc so
that its own internal testcases fetch no uninit bits. See my web page
http://BitWagon.com/glibc-audit/glibc-audit.html

Just as the first encounter between an application and memcheck often
causes dismay, so the first encounter with eager memcheck is likely to be
even more daunting. But "thousands" of complaints can be handled with just
a few memset(), and this is heartening. Even junior team members can be
productive at this stage. Finding alignment holes or padding in an
application's "basic" structs can be a wakeup call for storage efficiency.
And if you keep track, within a week or two you might notice that the
number and frequency of non-reproducible behaviors is decreasing. Bugs
"cascade" less often; the original bug, the first misuse, tends to become
visible immediately rather than after infecting several other areas of
control or data.

--
From: Ashley P. <as...@qu...> - 2006-01-06 15:42:14
On Sat, 2005-12-24 at 12:15 -0600, Nicholas Nethercote wrote:
> Hi,
>
> In the recent Valgrind survey five people complained about the difficulty
> of tracking down the root cause of undefined value errors, caused by the
> fact that Memcheck waits until an undefined value can affect the visible
> behaviour of the program (eg. is used in a conditional branch, or a
> syscall input).

I've often wondered why Valgrind didn't check at this level; it's what I
always thought it should do, although I've now come around to the "only
report when it has a consequence" model.

> I did some experimentation with eager checking a couple of years ago and
> found that it caused large numbers of false positives. I repeated the
> experiment again yesterday and saw the same results. I changed Memcheck
> to complain about the loading of any undefined values and tried various
> programs. For the empty C program that just returns zero, I get 24 errors
> from 23 contexts, most just from the dynamic linker. I get the following
> counts for the following programs:
>
>   empty              1 errors from   1 context
>   perf/bz2     8405487 errors from  30 contexts
>   perf/tinycc  4647525 errors from 301 contexts
>
> I had to use --error-limit=no for these otherwise Memcheck would have
> stopped reporting errors after 100,000. These programs have no
> (unsuppressed) errors when run with a normal Memcheck.

Well, that's not overly encouraging, but the only numbers that actually
matter here are the number of contexts, not the number of errors.

> I've attached the output from that last tinycc run.

What do you get if you use --num-callers=4? Does this affect how unique
contexts are collated, or just how they are displayed? It looks to me like
you would only need a small number of suppressions to reduce this error
count to almost zero.

> It has been suggested that an option be present to do this eager checking,
> but I'm not convinced it would be useful given the overwhelming number of
> false positives. I'm wondering what other people think.

I'd like to see it as an option. Huge numbers of false positives can be
daunting when you first see them, but that in itself is not a good reason
for not having this option.

One problem with the current scheme is that it can be difficult to find the
underlying cause of an undefined value. I'd have thought having this as an
option would allow people to pinpoint errors reported by the current scheme
much more quickly, and I'd like to see it for that reason alone.

Ashley,