From: Julian S. <js...@ac...> - 2007-12-04 02:36:28
On Monday 03 December 2007 20:29, Dave Nomura wrote:
> So you tracked these uninitialized values down to the strxxx
> functions defined in ld.so and Valgrind normally intercepts these calls
> because Memcheck can't handle the sorts of code that is generated for
> these routines?

Correct.

> Is it possible to teach Memcheck to deal with these optimizations?
>
> Steve Munroe, the author of those optimized strxxx functions, tells me
> that the kinds of optimizations done for these routines are going to
> start appearing in other library routines, and possibly in generated
> object code so the problem is going to become more pervasive.

You're in the land of difficult tradeoffs.  A lot of effort has already
been applied here.

All these optimised, vectorised (effectively) string ops rely on two
techniques:

(1) using properties of carry-chain propagation in addition/subtraction
    so as to find out whether any byte in a word is zero, and if so
    which one

(2) reading (traditional C-style zero-terminated) strings using aligned
    word reads, rather than byte reads

(1) fools Memcheck's normal handling of definedness tracking for
adds/subtracts, causing it to believe the result of the add/subtract
is completely undefined, when it isn't really.  In fact Memcheck can
and sometimes does generate a more exact interpretation, which does
handle this case correctly.  The problem is deciding when to apply it.
The standard analysis costs about 3 insns in the generated code, and
the exact analysis more than 10 insns (+ more registers).  Applying the
expensive case throughout would cause significant slowdowns to the
99.99% of code fragments for which the standard handling is perfectly
adequate.

(2) causes Memcheck to report invalid address errors for the partial
word loads covering the zero terminating bytes at the end of strings.
You can stop it complaining about this by giving --partial-loads-ok=yes,
but that could cause genuine errors to be missed.  Said flag is not
enabled by default.

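For readers who have not seen techniques (1) and (2) written down, here
is a minimal sketch in C.  It is not the actual ld.so code; the names
has_zero_byte and wordwise_strlen are made up for illustration, and the
sketch assumes 32-bit words.  To stay within defined behaviour it only
reads inside a buffer whose size (cap) is a multiple of 4, whereas the
real routines read past the end of the string's allocation.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Technique (1): nonzero iff at least one of the four bytes of w is 0x00.
   Subtracting 0x01 from every byte sets a byte's top bit if that byte was
   zero (the subtraction borrows) or if it was already >= 0x81; "& ~w"
   discards the second case, leaving a flag only where a borrow occurred. */
static int has_zero_byte(uint32_t w)
{
    return ((w - 0x01010101u) & ~w & 0x80808080u) != 0;
}

/* Technique (2): scan a string one word at a time instead of one byte at
   a time.  Hypothetical helper: buf must have cap readable bytes and cap
   must be a multiple of 4.  Returns the index of the first zero byte, or
   cap if there is none. */
static size_t wordwise_strlen(const char *buf, size_t cap)
{
    for (size_t i = 0; i + 4 <= cap; i += 4) {
        uint32_t w;
        memcpy(&w, buf + i, sizeof w);      /* one word-sized load */
        if (has_zero_byte(w)) {
            /* Locate the terminator bytewise, so the result does not
               depend on the endianness of the machine. */
            for (size_t j = 0; j < 4; j++)
                if (buf[i + j] == '\0')
                    return i + j;
        }
    }
    return cap;
}
```

With a buffer declared as char buf[8] = "hello", the second word load
covers buf[4..7], i.e. the 'o', the terminator and two bytes that are
not part of the string.  In the real routines those trailing bytes can
lie outside the allocated block, which is the partial-word load that
Memcheck reports.
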
I realise that (2) is "perfectly safe" in that the word-sized loads are
naturally aligned and so cannot possibly cause any page faults that
would not otherwise occur.  Nevertheless, any way you slice it, ISO
C/C++ says that reading memory outside of allocated blocks counts as
undefined behaviour (IIUC), and that's precisely what Memcheck aims to
report.

We have never claimed that Memcheck is suitable for code compiled at
-O2 and above.  -O is the max recommended level.  I would advocate the
following:

* do not allow gcc to inline stringops at -O, only at -O2 and above

* do not strip all symbol names off ld.so

In short there's a conflict between optimising the hell out of
stringops and having enough visibility for reliable debugging.  Given
the above constraints I don't see how you can have your cake and eat
it.

Note that none of the above is PPC specific -- it also applies to
x86/amd64.  I'm not sure why these problems appear more acute on ppc --
it may be some interaction between the carry chain propagation games
and the fact that ppc is bigendian.

J