|
From: Patrick J. L. <lop...@gm...> - 2012-02-17 17:21:26
|
Hello again, Valgrind community. I have filed the following bug report: https://bugs.kde.org/show_bug.cgi?id=294285 (By the way, "3.8.0SVN" is not presently an option for the "Version" drop-down when filing a bug.) Background: I am using Valgrind on optimized SSE4.2 output from the Intel C Compiler. I am seeing massive amounts of "Invalid read of size 8" false positives. I believe these false positives would disappear if "--partial-loads-ok=yes" worked as documented. The problem is that it does not work for 16-byte SSE loads; Valgrind's behavior appears to be to treat such loads as a pair of 8-byte loads. Note that GCC has started to emit similar code (https://bugs.kde.org/show_bug.cgi?id=294285), and I suspect it will only become more common as compilers get smarter. Questions: 1) Am I correct that this is a bug? 2) How hard would it be to fix? Specifically, is this something a skilled C programmer with no prior knowledge of Valgrind might be able to tackle in a reasonable timeframe? :-) Thanks. - Pat P.S. In my opinion, the documentation for "--partial-loads-ok" is overly discouraging (http://valgrind.org/docs/manual/mc-manual.html#opt.partial-loads-ok). Valgrind operates on machine code, not C/C++ code, so references to the C/C++ standards are not relevant. Aligned loads simply cannot generate a fault halfway through, and such loads are standard practice in optimized code. As long as the uninitialized portion of the word goes unused, there is no problem in theory nor practice. I would even argue that "--partial-loads-ok=yes" should be the default... P.P.S. Thank you for Valgrind; it is a tremendous tool. |
|
From: John R. <jr...@bi...> - 2012-02-18 06:07:25
|
> https://bugs.kde.org/show_bug.cgi?id=294285 > 1) Am I correct that this is a bug? It seems to me that there may be at least one set of cases where the memcheck complaint might be correct, namely if there is no '\0' in an allocated block whose length is not a multiple of 16. [Example: blk = malloc(11); allocator delivers block of 16 bytes; memcpy(blk, "123456789AB", 11); if (x < strlen(blk)) ...;] Then pcmpeqb examines blk[11] that is beyond the malloc()ed length. That's a true error unless it can be proved that the uninit portion of the pcmpeqb result (the 0xF800 bits of the mask) never is used. Proving that the pmovmskb and the bsf do not depend on the uninit bits is hard. It's harder than proving the correctness of the "super-optimized" code for strlen() that uses general purpose registers and constants such as 0x01010101 and 0xfefefeff. memcheck currently cannot grok such general-register code, and developers have resisted employing heuristics such as looking for immediate constants 0x01010101, etc. The workarounds are to demand interception of many str*() and mem*() routines, to avoid generating inline str*(), and to write explicit suppressions. > 2) How hard would it be to fix? Specifically, is this something a > skilled C programmer with no prior knowledge of Valgrind might be able > to tackle in a reasonable timeframe? :-) It's exceptionally difficult, and essentially impossible given the current aversion to dealing properly with data-dependent uninits. The sequence pcmpeqb;pmovmskb;bsf is an idiom that can be detected in nearly all cases, so the complaint could be auto-intercepted "on demand." In particular, I have posted code that heuristically detects inlined strlen() on demand for traditional $ARCH such as PowerPC and x86, but the patch was ignored. [Note that memcheck doesn't even grok data-dependent (non-)propagation of Carry bits in two's complement integer addition! memcheck says: "any uninit input to ADD ==> all output bits of same or higher place value ("to the left") are uninit. This is pessimistic because it ignores the possibility of '0' bits in matching positions of the inputs, which stops the propagation of uninit Carry.] -- |
|
From: Julian S. <js...@ac...> - 2012-02-18 22:52:40
|
> Background: I am using Valgrind on optimized SSE4.2 output from the > Intel C Compiler. I am seeing massive amounts of "Invalid read of > size 8" false positives. I believe these false positives would > disappear if "--partial-loads-ok=yes" worked as documented. The > problem is that it does not work for 16-byte SSE loads; Valgrind's > behavior appears to be to treat such loads as a pair of 8-byte loads. > > Note that GCC has started to emit similar code > (https://bugs.kde.org/show_bug.cgi?id=294285), and I suspect it will > only become more common as compilers get smarter. For whatever it's worth .. The documentation never claimed that Memcheck will work sanely for ultra optimised code. In fact it recommends restricting yourself to gcc -g -O1, which works fairly well in practice as a tradeoff between performance and debuggability. That said, I did do some work to make --partial-loads-ok=yes work correctly for 16 byte SSE loads. But it never got committed, and is still sitting in some tree around here somewhere. The main problem, as you surmise, is that such loads get treated as two 8 byte loads, and so there's no easy way for --partial-loads-ok=yes to do the right thing. I think the hack I did makes it possible for mc_main.c to know whether an 8 byte load is really "itself", or whether it's half of a 16 byte load (and in that case, which half). I don't really remember though. I'd have to look over the diff again. J |
|
From: Patrick J. L. <lop...@gm...> - 2012-02-20 17:37:31
|
John Reiser writes: > It seems to me that there may be at least one set of cases where > the memcheck complaint might be correct, namely if there is no '\0' > in an allocated block whose length is not a multiple of 16. If there is no `\0` in your string and you call strlen(), I agree we need to issue an error! But I am not suggesting suppressing errors via any kind of heuristic. I am suggesting improving the machine model in order to reduce false positives. > [Example: blk = malloc(11); allocator delivers block of > 16 bytes; memcpy(blk, "123456789AB", 11); if (x < strlen(blk)) ...;] > Then pcmpeqb examines blk[11] that is beyond the malloc()ed length. > That's a true error unless it can be proved that the uninit > portion of the pcmpeqb result (the 0xF800 bits of the mask) > never is used. Proving that the pmovmskb and the bsf do not > depend on the uninit bits is hard. Perhaps I am missing something, but I do not see why this is hard. Memcheck already tracks validity bits. When partial_loads_ok is true, we just need to mark the bits read from beyond the end of the block as "undefined", because that is what they are. True, valgrind does not currently allow memcheck to propagate those validity bits for pmovmskb, but that is a different bug and fairly simple to fix: The validity bits of the result of pmovmskb inherit the validity of the bits that got moved, just like any time bits get moved. Handling bsf is then trivial, and it might even work already: You do a bsf on the shadow register; compare to the bsf result on the data; and if the former is smaller than the latter you set the result's validity bits to "undefined". (The idea is that if the first-set-bit comes earlier than the first-undefined-bit, the result is defined; else it is not.) This would handle the SSE-optimized strlen() and, I suspect, many many other cases besides. And it would generate an error for the case you describe. (The error would be "use of uninitialized data" rather than "read beyond end of block", but this strikes me as reasonable when you set --partial-loads-ok=yes.) I bet the pcmpXXX + pmovmskb + bsf is a common idiom. The final result is that an error will be issued precisely when something actually depends on the data read from outside the allocated block, and not otherwise. I have already created a patch to implement partial_loads_ok for 16-byte SSE loads, including setting the validity bits correctly. I hope to clean it up and attach it to the bug later today. Thanks! - Pat |