From: Alexandre F. <ale...@gm...> - 2010-02-01 13:06:06
|
Hello, One particularly time consuming source of bugs is that of reuse of a freed structure. When "reuse" means rewrite, the heap can get corrupted pretty fast, and when it means only re-read, the effect can be felt *much* later and much more randomly.... A recent sweeping change (by Don IIRC) in that area was resetting the typePtr of freed Tcl-Objs. That's one step in the good direction. However, as bug 2939073 shows: https://sourceforge.net/tracker/?func=detail&atid=110894&aid=2939073&group_id=10894 there are numerous other things whose refcount can be botched, and hence lending themselves to reuse-after-free (here, VarInHash structures). While solving that bug, I found it annoying that the effect was so random and so hard to keep in an atomic case. One efficient solution consists of poking "visibly" invalid values in key fields of the structure, just before freeing it: var->objPtr = (Tcl_Obj *) 0xDEADBEEF ; What's nice with the method is that it is much more efficient and readable than sprinkling the code with explicit validity checks (like refcount>0): indeed, the memory access itself will do the test, the result being a SIGBUS or SIGSEGV at the point of illegal reuse. Of course this works only for structures holding fields which are frequently used as pointers. Better this than nothing.... Questions: (1) Am I allowed to commit the following 1-line addition to tclVar.c: @@ -5610,6 +5624,7 @@ FreeVarEntry( if (TclIsVarUndefined(varPtr) && !TclIsVarTraced(varPtr) && (VarHashRefCount(varPtr) == 1)) { + varPtr->value.objPtr=(void *)0xdeadbeef; ckfree((char *) varPtr); } else { VarHashInvalidateEntry(varPtr); Rationale: it makes more reliable any time-bomb-like crash related to nasty reentrant trace issues like 2939073. Allows my fix for this bug to contain a test case. (2) What about reusing the same idea in other areas as well ? (3) What about obj->typePtr itself ? -Alex |
From: Donal K. F. <don...@ma...> - 2010-02-01 13:12:19
Attachments:
donal_k_fellows.vcf
|
On 01/02/2010 13:06, Alexandre Ferrieux wrote: > (1) Am I allowed to commit the following 1-line addition to tclVar.c: > > @@ -5610,6 +5624,7 @@ FreeVarEntry( > > if (TclIsVarUndefined(varPtr)&& !TclIsVarTraced(varPtr) > && (VarHashRefCount(varPtr) == 1)) { > + varPtr->value.objPtr=(void *)0xdeadbeef; > ckfree((char *) varPtr); > } else { > VarHashInvalidateEntry(varPtr); > > Rationale: it makes more reliable any time-bomb-like crash related to > nasty reentrant trace issues like 2939073. Allows my fix for this bug > to contain a test case. I'd be more inclined to put a NULL in there so that we're making the clear statement that this memory cell isn't pointing to anything. YMMV. > (2) What about reusing the same idea in other areas as well ? > (3) What about obj->typePtr itself ? Again, I'd be inclined to go with using NULL. But there's nothing wrong at all with doing this stuff. Well, so long as we make sure all the problems associated with it go away afterwards, but it's for finding those things that we'd consider doing this in the first place. :-) Donal. |
From: Alexandre F. <ale...@gm...> - 2010-02-01 13:41:22
|
On 2/1/10, Donal K. Fellows <don...@ma...> wrote: > On 01/02/2010 13:06, Alexandre Ferrieux wrote: > > > (1) Am I allowed to commit the following 1-line addition to tclVar.c: > > > > @@ -5610,6 +5624,7 @@ FreeVarEntry( > > > > if (TclIsVarUndefined(varPtr)&& !TclIsVarTraced(varPtr) > > && (VarHashRefCount(varPtr) == 1)) { > > + varPtr->value.objPtr=(void *)0xdeadbeef; > > ckfree((char *) varPtr); > > } else { > > VarHashInvalidateEntry(varPtr); > > > > Rationale: it makes more reliable any time-bomb-like crash related to > > nasty reentrant trace issues like 2939073. Allows my fix for this bug > > to contain a test case. > > > > I'd be more inclined to put a NULL in there so that we're making the > clear statement that this memory cell isn't pointing to anything. YMMV. No, specifically, NULL is not invalid in that context. It is just flagging VarIsUndefined, which is a special, but legal, state for array entries as you very well know. Generalizing, the idea is that 0xDEADBEEF is both certainly invalid for all pointer types that are not union members, and easy to spot by naked eye in gdb traces. -Alex |
From: Kevin K. <kev...@gm...> - 2010-02-01 13:40:39
|
Donal K. Fellows wrote: > On 01/02/2010 13:06, Alexandre Ferrieux wrote: >> (1) Am I allowed to commit the following 1-line addition to tclVar.c: >> >> @@ -5610,6 +5624,7 @@ FreeVarEntry( >> >> if (TclIsVarUndefined(varPtr)&& !TclIsVarTraced(varPtr) >> && (VarHashRefCount(varPtr) == 1)) { >> + varPtr->value.objPtr=(void *)0xdeadbeef; >> ckfree((char *) varPtr); >> } else { >> VarHashInvalidateEntry(varPtr); >> >> Rationale: it makes more reliable any time-bomb-like crash related to >> nasty reentrant trace issues like 2939073. Allows my fix for this bug >> to contain a test case. > > I'd be more inclined to put a NULL in there so that we're making the > clear statement that this memory cell isn't pointing to anything. YMMV. Let me make sure we're on the same page. Pointers in unused/freed objects should be set to NULL - the only constant that we know doesn't point to anything. Other data can be set to some arbitrary constant like 0xa1C0ffee or 0xBadDecaf. For speed, the intentional spoliation of freed data should perhaps be conditioned on being in a debug build. And it's a very good idea, go for it! Donal, do you agree? -- 73 de ke9tv/2, Kevin |
From: Alexandre F. <ale...@gm...> - 2010-02-01 13:58:03
|
On 2/1/10, Kevin Kenny <kev...@gm...> wrote: > Donal K. Fellows wrote: > > On 01/02/2010 13:06, Alexandre Ferrieux wrote: > >> (1) Am I allowed to commit the following 1-line addition to tclVar.c: > >> > >> @@ -5610,6 +5624,7 @@ FreeVarEntry( > >> > >> if (TclIsVarUndefined(varPtr)&& !TclIsVarTraced(varPtr) > >> && (VarHashRefCount(varPtr) == 1)) { > >> + varPtr->value.objPtr=(void *)0xdeadbeef; > >> ckfree((char *) varPtr); > >> } else { > >> VarHashInvalidateEntry(varPtr); > >> > >> Rationale: it makes more reliable any time-bomb-like crash related to > >> nasty reentrant trace issues like 2939073. Allows my fix for this bug > >> to contain a test case. > > > > I'd be more inclined to put a NULL in there so that we're making the > > clear statement that this memory cell isn't pointing to anything. YMMV. > > > Let me make sure we're on the same page. Pointers in unused/freed > objects should be set to NULL - the only constant that we know doesn't > point to anything. Yes, but more often than not, instead of blindly trying an indirection, the code uses the special value NULL as a flag (like an undefined array entry pending true eradication), so the value doesn't generate immediate SIGBUS as it should. In addition to VarInHash (or Var), this is also notably true of Tcl_Obj, where "typePtr==NULL" means pure string. -Alex |
From: Donal K. F. <don...@ma...> - 2010-02-01 14:42:15
Attachments:
donal_k_fellows.vcf
|
On 01/02/2010 13:35, Alexandre Ferrieux wrote: > No, specifically, NULL is not invalid in that context. It is just > flagging VarIsUndefined, which is a special, but legal, state for > array entries as you very well know. > > Generalizing, the idea is that 0xDEADBEEF is both certainly invalid > for all pointer types that are not union members, and easy to spot by > naked eye in gdb traces. There's no guarantee that it's distinct from the set of pointers. The only value that's guaranteed to not point to anything is NULL. That said, if you turn on Tcl's memory debugging then a block is filled with 0x61 bytes when it is ckfree()d. That should be quite sufficiently distinctive for finding problems where memory is accessed after it has been disposed. Donal. |
From: Alexandre F. <ale...@gm...> - 2010-02-01 15:19:11
|
On 2/1/10, Donal K. Fellows <don...@ma...> wrote: > On 01/02/2010 13:35, Alexandre Ferrieux wrote: > > > No, specifically, NULL is not invalid in that context. It is just > > flagging VarIsUndefined, which is a special, but legal, state for > > array entries as you very well know. > > > > Generalizing, the idea is that 0xDEADBEEF is both certainly invalid > > for all pointer types that are not union members, and easy to spot by > > naked eye in gdb traces. > > > > There's no guarantee that it's distinct from the set of pointers. The > only value that's guaranteed to not point to anything is NULL. Ah of course, I was focusing on aligned memory access, so any odd value would do. If we want to also catch unaligned (char *), then maybe (void *)(-1) is a better idea. So my proposal becomes: + varPtr->value.objPtr=(void *)(-1); And again, NULL doesn't fulfill the contract, since it most often is not really invalid. > That > said, if you turn on Tcl's memory debugging then a block is filled with > 0x61 bytes when it is ckfree()d. That should be quite sufficiently > distinctive for finding problems where memory is accessed after it has > been disposed. So, to summarize: - you're vetoing my proposal - you're proposing two "alternatives": (a) stick to NULL, which misses the point in not triggering any runtime fault in many areas of the core (b) use mem debug to isolate those bugs, which misses the point in two ways: - many code paths are different under mem debug, so some bugs are not catchable. Also some bugs manifest themselves in production only, where memedebug (or any custom rebuilt Tcl for that matter) is not an potion. - the resulting pointer is 0x61616161 or 0x6161616161616161, which has exactly the same profile as 0xDEADBEEF or (-1) regarding bus errors and segmentation faults. No, actually, (-1) is likely better, since no realistic heap or stack area would land there. (c) be content with remarks like this one, in the test suite: test trace-0.0 {memory corruption in trace (Tcl Bug 484339)} { # You may need Purify or Electric Fence to reliably # see this one fail. Am I summarizing accurately ? -Alex |
From: Joe E. <jen...@fl...> - 2010-02-01 20:01:14
|
Donal K. Fellows wrote: > On 01/02/2010 13:35, Alexandre Ferrieux wrote: > > No, specifically, NULL is not invalid in that context. It is just > > flagging VarIsUndefined, which is a special, but legal, state for > > array entries as you very well know. > > > > Generalizing, the idea is that 0xDEADBEEF is both certainly invalid > > for all pointer types that are not union members, and easy to spot by > > naked eye in gdb traces. > > There's no guarantee that it's distinct from the set of pointers. The > only value that's guaranteed to not point to anything is NULL. Right, there's no guarantee that this does anything useful. Setting 'ptr = (void*)0xDEADBEEF;' yields Undefined Behavior, so anything can happen, including that the program magically works correctly. In practice, however, it's a pretty reliable way to get a SIGBUS. Which is the desired effect in this case. > That > said, if you turn on Tcl's memory debugging then a block is filled with > 0x61 bytes when it is ckfree()d. That should be quite sufficiently > distinctive for finding problems where memory is accessed after it has > been disposed. Yep. Tcl_Obj records aren't passed to ckfree(), though [*], they're stored on a linked list. Alexandre is proposing a similar stomping scheme for them as is done for ckfree()d memory. I don't think it's a bad idea. --Joe English [*] Except when the core is compiled with a certain set of -D flags the #ifdeffery of which I'm not fully able to disentangle. |
From: Donald G P. <don...@ni...> - 2010-02-01 18:32:18
|
Alexandre Ferrieux wrote: > (3) What about obj->typePtr itself ? The reform you are talking about does not apply to objPtr->typePtr. That field must either hold a valid pointer to the Tcl_ObjType struct that governs the current intrep, or NULL to indicate there is no current intrep. No other state is valid. Unless you talking exclusively about those Tcl_Obj structs that have been returned to the free list. For those I don't think I care. -- | Don Porter Mathematical and Computational Sciences Division | | don...@ni... Information Technology Laboratory | | http://math.nist.gov/~DPorter/ NIST | |______________________________________________________________________| |
From: Alexandre F. <ale...@gm...> - 2010-02-01 21:21:43
|
On 2/1/10, Donald G Porter <don...@ni...> wrote: > Alexandre Ferrieux wrote: > > > (3) What about obj->typePtr itself ? > > > > The reform you are talking about does not apply to objPtr->typePtr. > That field must either hold a valid pointer to the Tcl_ObjType struct > that governs the current intrep, or NULL to indicate there is no current > intrep. No other state is valid. Of course: the point is precisely to get an unambiguously invalid state, so that a SIGBUS is obtained as soon as use is attempted from a stale reference. > Unless you talking exclusively about those Tcl_Obj structs that have > been returned to the free list. For those I don't think I care. This scheme applies to any kind of allocation, be it direct ckalloc, or block allocation like Tcl_Obj. So the objects in the free list are one case among many. And I believe all these discarded objects would uniformly benefit from a reliable "reuse detector" costing very little at free time, and nothing at valid use time. -Alex |
From: Alexandre F. <ale...@gm...> - 2010-02-02 12:09:19
|
Hi, For what it's worth, Donal decided to ignore the suggestion in tclVar.c Then let's hope that the remainder is 100% bug-free; otherwise many hours will be wasted in investigating the next occurrence in a production, non-mem-debug system. -Alex |
From: Kevin K. <kev...@gm...> - 2010-02-03 00:08:57
|
Alexandre Ferrieux wrote: > Hi, > > For what it's worth, Donal decided to ignore the suggestion in tclVar.c > Then let's hope that the remainder is 100% bug-free; otherwise many > hours will be wasted in investigating the next occurrence in a > production, non-mem-debug system. > > -Alex Alex, let's not give up so quickly. Donal, Alex is talking about laying a presumably-invalid background in dead memory, for a case where NULL in live memory would have a different meaning. I don't care that much what's in dead memory. If Alex things that 0xDECAFBAD would be a bit pattern that makes his debugging easier, I'm with Joe here. It doesn't matter for correct C code; and it's highly likely to be in the same space as NULL in terms of segv's and bus errors. |
From: Andy G. <unu...@ai...> - 2010-02-03 02:33:33
|
"Kevin Kenny" <kev...@gm...> wrote: > I don't care that much what's in dead memory. If Alex things that > 0xDECAFBAD would be a bit pattern that makes his debugging easier, > I'm with Joe here. It doesn't matter for correct C code; and it's > highly likely to be in the same space as NULL in terms of segv's and > bus errors. If you need the address of memory that's defined to not be readable, writable, or executable, you could use OS-specific calls to get a page that has those properties. On a platform where this feature doesn't exist or is more trouble than it's worth, fall back on 0xDEADBEEF, which is easy to spot and might even generate unaligned memory access errors. -- Andy Goth | http://andy.junkdrome.org./ unununium@{aircanopy.net,openverse.com} |