|
From: Nicholas N. <nj...@cs...> - 2005-07-02 19:18:41
|
On Thu, 9 Jun 2005, Scott Long wrote: > So is there any comment on my idea, here? Let each register track the last > memory location it was loaded from, and report that location if the > uninitialized register is used. > > If there is something fundamental that makes that idea unfeasible I'd be very > interested to hear what it is. As far as I can tell, it's a good solution > which would avoid false positives and would probably be trivial to implement. I've given this some more thought. It's not trivial, for two reasons: - Every register now has to be shadowed by two extra values, one holding the V bits for Memcheck's validity checking, and the other holding the address. And every operation has to be shadowed by an extra operation that propagates the addresses to the result. The extra shadow operation would not be trivial -- eg. if you add two values A and B, and only one B is undefined, you'd need to propagate the address shadowing B to the result. If A and B were both defined or both undefined, you could propagate either. So, every register value now requires two shadow values, and every operations requires two shadow operations. This would require a non-trivial amount of effort to code up, and would have some runtime cost. - Converting the address to something more useful is not easy in general. It is easy when the address is within a heap block -- we can use the existing machinery to say "address X is 100 bytes into a 200 byte heap block allocated at Y". But if the address is on the stack or a static value, just saying "address X on the stack/in static memory" is not very helpful. Valgrind does have some support for identifying variables, but it only works with stabs debugging information. It's also possible that most undefined values are loaded from heap blocks, in which case bad messages about stack and static values aren't very important. [I was also wondering if it would be better to record the address of the instruction that does the loading, rather than the address from which the value was loaded. That way the messages would say "undefined value X used on line Y, and it was loaded from memory on line Z.] So it's not that straightforward. But it's certainly possible, and it would be very interesting to see how difficult it is, what the performance effect is, and if the resulting messages are genuinely helpful. (Perhaps I should scream "no! it will never work, fool!" in order to motivate someone into implementing it to prove me wrong :) ---- Having written all that... When diagnosing undefined value errors, you don't really want to know the address that the value was loaded from. Sometimes that will tell you, eg. if you malloc some memory, forget to initialise it, then load it and use it straight-away in a dangerous way -- the address mentioned by the error message will be that of the heap block that was the root cause of the problem. But if you copy that value first to the stack, then load it and use it dangerously, the address mentioned by the error message would be the stack address, which is not the root cause of the problem. Every undefined value was caused to be undefined by one or more events of undefined value creation. Eg. undefined value A came into being as part of a heap block allocated by malloc(). If A is added to a defined value B, the result would be an undefined value C. C's why-am-I-undefined event is still the same as A's, however. In the case where you have Z = X + Y and both X and Y are undefined, you can arbitrarily choose X's event or Y's event to become Z's event. (If the programmer fixed one and reran Memcheck, the other undefined event would then rear its head, if you see what I mean.) So rather than passing around extra memory addresses, it would be better to pass around "tokens" that record the event that caused the undefined value to come into existence. What are these events? Well, allocating a heap block with malloc() is an obvious one. Shrinking the stack passed some previously defined values is another one. You would shadow all undefined values with one of these tokens, and you'd propagate them in the (hopefully) obvious ways. One key idea here, which Scott pointed out, is that rather than having to record the full history of each undefined value, we only need to record one single undefined-value-causing event, even if the undefined value can be traced back to more than one undefined-value event. That's enough to give a better error message. The problem with this "token" idea is that I'm assuming every value can be shadowed with a token, including values in memory. If a token was the size of a pointer, this would blow out Memcheck's memory requirements too much. Scott's second key idea is that you can approximate this by recording the extra "token" only for registers. So tokens have possibly incomplete histories, since the passing of values through memory breaks that history. But perhaps undefined values don't pass in and out of memory very much, or perhaps the truncated history recorded from the most recent load is enough in a lot of cases. A third key idea is that this extra "token" only needs to be recorded for values which are undefined. So there's some potential for compression of representation there. Anyway, I've probably lost everyone by now, but I've come to appreciate some of the subtleties of this problem more now. Thanks, Scott :) N |