From: SourceForge.net <no...@so...> - 2010-12-31 10:50:41
|
Bugs item #3142026, was opened at 2010-12-22 21:28 Message generated for change (Comment added) made by dkf You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=110894&aid=3142026&group_id=10894 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: 17. Commands I-L Group: current: 8.6b1 Status: Open Resolution: None Priority: 8 Private: No Submitted By: brad harder (bharder) Assigned to: miguel sofer (msofer) Summary: lsort -integer violates memory Initial Comment: lsort -integer violates memory allocation when Tcl is compiled w/ TCL_MEM_DEBUG. This violation was introduced between 20Aug2009 and 21Aug2009 in CVS repo. ---------------------------------------------------------------------- >Comment By: Donal K. Fellows (dkf) Date: 2010-12-31 10:50 Message: Verified that it is fixed. Will consider testing. ---------------------------------------------------------------------- Comment By: miguel sofer (msofer) Date: 2010-12-30 23:11 Message: Fixed in HEAD, commit msg is: * generic/tclExecute.c (GrowEvaluationStack): off-by-one error in sizing the new allocation - was ok in comment but wrong in the code. Triggered by [Bug 3142026] which happened to require exactly one more than what was in existence. Leaving open as 2 things missing: 1. verify if backport is needed 2. devise a test that does not depend on sheer (un)luck ---------------------------------------------------------------------- Comment By: miguel sofer (msofer) Date: 2010-12-30 18:38 Message: Note that this is a hi-guard failure when deleting an execution stack: the crash has possibly nothing to do with either lsort or stackAlloc, it looks like someone somewhere is writing zeros one byte past the end of the stack. The crash happens (much?) later when the stack is freed and the guard is checked. ---------------------------------------------------------------------- Comment By: Jeffrey Hobbs (hobbs) Date: 2010-12-30 14:56 Message: I actually tried to trigger it under valgrind, but couldn't. Sorry that I didn't stress that both Brad and I used 64-bit before to test. I'm on the road now, so I don't have the other platforms easily at hand to test more variations. I thought it might be related to Jan's recent 64-bit alignment stuff, but apparently Brad did isolate it to this area. ---------------------------------------------------------------------- Comment By: Donal K. Fellows (dkf) Date: 2010-12-30 12:17 Message: Not in any config I have; my toolchain is somewhat obsolete now. ---------------------------------------------------------------------- Comment By: miguel sofer (msofer) Date: 2010-12-30 11:01 Message: No 64bit platform handy to repro. Any chance to run under valgrind a -DPURIFY build? It should have symbols but no mem debug ... ---------------------------------------------------------------------- Comment By: Donal K. Fellows (dkf) Date: 2010-12-30 01:35 Message: Actual crash message: hi guard byte 0 is 0x0 hi guard byte 1 is 0x0 hi guard byte 2 is 0x0 hi guard byte 3 is 0x0 hi guard byte 4 is 0x0 hi guard byte 5 is 0x0 hi guard byte 6 is 0x0 hi guard byte 7 is 0x0 total mallocs 68005 total frees 55127 current packets allocated 12878 current bytes allocated 645210 maximum packets allocated 12885 maximum bytes allocated 655885 high guard failed at 100247038, /Users/dkf/Documents/software/tcl8.6-commits/generic/tclExecute.c 910 32040 bytes allocated at (/Users/dkf/Documents/software/tcl8.6-commits/generic/tclExecute.c 1110) Memory validation failure ---------------------------------------------------------------------- Comment By: Donal K. Fellows (dkf) Date: 2010-12-30 01:30 Message: (and students of stack traces will see that I have added [time] to it; it appears to be a by-stander in this) ---------------------------------------------------------------------- Comment By: Donal K. Fellows (dkf) Date: 2010-12-30 01:29 Message: On the basis of that, I suspect it's not a problem with [lsort] (which now has very conventional memory handling) but rather elsewhere; as such, the suggested patch is just papering over the real problem, not fixing it. Assigning to Miguel for comment. ---------------------------------------------------------------------- Comment By: Donal K. Fellows (dkf) Date: 2010-12-30 01:24 Message: I'm guessing the issue is the need to increase the size of Tcl stack space (which appears to have hit a 32k boundary). ---------------------------------------------------------------------- Comment By: Donal K. Fellows (dkf) Date: 2010-12-30 01:21 Message: fails on iteration 1334; stack trace follows: #0 0x00007fff83b9ff16 in __kill () #1 0x00007fff83c10f6d in abort () #2 0x00000001000ffc9c in Tcl_PanicVA (format=0x100167bd8 "Memory validation failure", argList=0x7fff5fbfeab0) at /Users/dkf/Documents/software/tcl8.6-commits/generic/tclPanic.c:118 #3 0x00000001000ffd7b in Tcl_Panic (format=0x100167bd8 "Memory validation failure") at /Users/dkf/Documents/software/tcl8.6-commits/generic/tclPanic.c:147 #4 0x0000000100029d2f in ValidateMemory (memHeaderP=0x100247000, file=0x10016c260 "/Users/dkf/Documents/software/tcl8.6-commits/generic/tclExecute.c", line=910, nukeGuards=1) at /Users/dkf/Documents/software/tcl8.6-commits/generic/tclCkalloc.c:270 #5 0x000000010002a642 in Tcl_DbCkfree (ptr=0x100247038 "8?\"", file=0x10016c260 "/Users/dkf/Documents/software/tcl8.6-commits/generic/tclExecute.c", line=910) at /Users/dkf/Documents/software/tcl8.6-commits/generic/tclCkalloc.c:612 #6 0x000000010009fc79 in DeleteExecStack (esPtr=0x100247038) at /Users/dkf/Documents/software/tcl8.6-commits/generic/tclExecute.c:910 #7 0x000000010009ff91 in GrowEvaluationStack (eePtr=0x1003067b8, growth=4002, move=0) at /Users/dkf/Documents/software/tcl8.6-commits/generic/tclExecute.c:1092 #8 0x00000001000a01a6 in StackAllocWords (interp=0x100809e38, numWords=4002) at /Users/dkf/Documents/software/tcl8.6-commits/generic/tclExecute.c:1179 #9 0x00000001000a03d2 in TclStackAlloc (interp=0x100809e38, numBytes=32016) at /Users/dkf/Documents/software/tcl8.6-commits/generic/tclExecute.c:1263 #10 0x0000000100039605 in Tcl_LsortObjCmd (clientData=0x0, interp=0x100809e38, objc=3, objv=0x10022d488) at /Users/dkf/Documents/software/tcl8.6-commits/generic/tclCmdIL.c:3907 #11 0x000000010001d5a2 in NRRunObjProc (data=0x100458390, interp=0x100809e38, result=0) at /Users/dkf/Documents/software/tcl8.6-commits/generic/tclBasic.c:4385 #12 0x000000010001d341 in TclNRRunCallbacks (interp=0x100809e38, result=0, rootPtr=0x1003b58a8) at /Users/dkf/Documents/software/tcl8.6-commits/generic/tclBasic.c:4332 #13 0x000000010001fab9 in TclEvalObjEx (interp=0x100809e38, objPtr=0x6, flags=0, invoker=0x0, word=0) at /Users/dkf/Documents/software/tcl8.6-commits/generic/tclBasic.c:5907 #14 0x000000010001fa5d in Tcl_EvalObjEx (interp=0x100809e38, objPtr=0x6, flags=0) at /Users/dkf/Documents/software/tcl8.6-commits/generic/tclBasic.c:5888 #15 0x0000000100041fa1 in Tcl_TimeObjCmd (dummy=0x0, interp=0x100809e38, objc=2, objv=0x10022d1b8) at /Users/dkf/Documents/software/tcl8.6-commits/generic/tclCmdMZ.c:4089 #16 0x000000010001d5a2 in NRRunObjProc (data=0x1003c1d80, interp=0x100809e38, result=0) at /Users/dkf/Documents/software/tcl8.6-commits/generic/tclBasic.c:4385 #17 0x000000010001d341 in TclNRRunCallbacks (interp=0x100809e38, result=0, rootPtr=0x0) at /Users/dkf/Documents/software/tcl8.6-commits/generic/tclBasic.c:4332 #18 0x000000010001fab9 in TclEvalObjEx (interp=0x100809e38, objPtr=0x6, flags=131072, invoker=0x0, word=0) at /Users/dkf/Documents/software/tcl8.6-commits/generic/tclBasic.c:5907 #19 0x000000010001fa5d in Tcl_EvalObjEx (interp=0x100809e38, objPtr=0x6, flags=131072) at /Users/dkf/Documents/software/tcl8.6-commits/generic/tclBasic.c:5888 #20 0x00000001000c6ec9 in Tcl_RecordAndEvalObj (interp=0x100809e38, cmdPtr=0x1003a42c8, flags=131072) at /Users/dkf/Documents/software/tcl8.6-commits/generic/tclHistory.c:192 #21 0x00000001000f35bf in Tcl_MainEx (argc=-1, argv=0x7fff5fbff608, appInitProc=0x1000029f7 <Tcl_AppInit>, interp=0x100809e38) at /Users/dkf/Documents/software/tcl8.6-commits/generic/tclMain.c:518 #22 0x00000001000f3a39 in Tcl_Main (argc=1, argv=0x7fff5fbff600, appInitProc=0x1000029f7 <Tcl_AppInit>) at /Users/dkf/Documents/software/tcl8.6-commits/generic/tclMain.c:667 #23 0x00000001000029f0 in main (argc=1, argv=0x7fff5fbff600) at /Users/dkf/Documents/software/tcl8.6-commits/unix/tclAppInit.c:80 ---------------------------------------------------------------------- Comment By: Donal K. Fellows (dkf) Date: 2010-12-30 01:18 Message: Finally managed to reproduce; it's a 64-bit only bug. ---------------------------------------------------------------------- Comment By: brad harder (bharder) Date: 2010-12-27 21:53 Message: Tested, still failed. I've updated the patch that solves this particular ticket to work w/ your latest changes. (patch-aa.v3) ---------------------------------------------------------------------- Comment By: Donal K. Fellows (dkf) Date: 2010-12-27 14:14 Message: Could you retest with the HEAD? I fixed another bug that I found when studying what this could possibly be, so maybe that will resolve this too (as I greatly simplified the memory management logic in the process; the handling of -index was unnecessarily over-clever before). ---------------------------------------------------------------------- Comment By: Donal K. Fellows (dkf) Date: 2010-12-25 19:09 Message: I'd like to emphasize that I don't mean to be difficult. I suspect that there may be some weird problem in the code to handle the temporary memory handling in the lsort command. But I can't reproduce for some reason; I need the stack trace from the crash and can't make it myself. :-( There's too many things that are trickier than they look in this code for me to hunt without at least a copy of a picture of the smoking gun available to me. :-/ ---------------------------------------------------------------------- Comment By: Donal K. Fellows (dkf) Date: 2010-12-25 08:57 Message: I usually build with --enable-symbols=mem (so I can run memleak tests) and wasn't seeing it, and even with --enable-symbols=all in a clean build I still don't see it. ---------------------------------------------------------------------- Comment By: brad harder (bharder) Date: 2010-12-23 19:47 Message: Hi Donal -- indeed, my first patch was poor... I've reviewed and submitted 'patch-aa.v2' which is much more specific in the code it targets (strictly [lsort]) and does pass the case that was previously faulting. This is _not_ to say that [lindex] etc won't be subject to this same class of bug, but I won't speculate with my patch. ---------------------------------------------------------------------- Comment By: Jeffrey Hobbs (hobbs) Date: 2010-12-23 18:41 Message: Donal - you have to compile with --enable-symbols=all (IOW -DTCL_MEM_DEBUG) to ensure the crash. Brad used NetBSD amd64 and I used SuSE Linux-64 to confirm. ---------------------------------------------------------------------- Comment By: Donal K. Fellows (dkf) Date: 2010-12-23 12:57 Message: BTW, your patch is definitely not minimal. It touches unrelated code (the bits relating to lines 1312 to 1346 are nothing to do with [lsort]). ---------------------------------------------------------------------- Comment By: Donal K. Fellows (dkf) Date: 2010-12-23 12:38 Message: I don't see any problem when building on OSX. Do you have a stack trace? ---------------------------------------------------------------------- Comment By: brad harder (bharder) Date: 2010-12-23 08:22 Message: Can confirm patch applies cleanly against CVS co of 23Dec2010 as well, and that Tcl 8.6 build passes test. ---------------------------------------------------------------------- Comment By: brad harder (bharder) Date: 2010-12-23 08:05 Message: see attached patch-aa for a minimal (absolute minimal?) patch against 14 Dec code that appears to solve this specific issue (as hinted by jeff hobbs). ---------------------------------------------------------------------- Comment By: Jeffrey Hobbs (hobbs) Date: 2010-12-22 21:29 Message: Possibly related to the Tcl_LsortObjCmd change from ckalloc to stackalloc? ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=110894&aid=3142026&group_id=10894 |