lsort -integer violates memory allocation when Tcl is compiled w/ TCL_MEM_DEBUG. This violation was introduced between 20Aug2009 and 21Aug2009 in CVS repo.
Donal - you have to compile with --enable-symbols=all (IOW -DTCL_MEM_DEBUG) to ensure the crash. Brad used NetBSD amd64 and I used SuSE Linux-64 to confirm.
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
Hi Donal -- indeed, my first patch was poor... I've reviewed and submitted 'patch-aa.v2' which is much more specific in the code it targets (strictly [lsort]) and does pass the case that was previously faulting. This is _not_ to say that [lindex] etc won't be subject to this same class of bug, but I won't speculate with my patch.
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
I usually build with --enable-symbols=mem (so I can run memleak tests) and wasn't seeing it, and even with --enable-symbols=all in a clean build I still don't see it.
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
I'd like to emphasize that I don't mean to be difficult. I suspect that there may be some weird problem in the code to handle the temporary memory handling in the lsort command. But I can't reproduce for some reason; I need the stack trace from the crash and can't make it myself. :-(
There's too many things that are trickier than they look in this code for me to hunt without at least a copy of a picture of the smoking gun available to me. :-/
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
Could you retest with the HEAD? I fixed another bug that I found when studying what this could possibly be, so maybe that will resolve this too (as I greatly simplified the memory management logic in the process; the handling of -index was unnecessarily over-clever before).
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
#0 0x00007fff83b9ff16 in __kill ()
#1 0x00007fff83c10f6d in abort ()
#2 0x00000001000ffc9c in Tcl_PanicVA (format=0x100167bd8 "Memory validation failure", argList=0x7fff5fbfeab0) at /Users/dkf/Documents/software/tcl8.6-commits/generic/tclPanic.c:118
#3 0x00000001000ffd7b in Tcl_Panic (format=0x100167bd8 "Memory validation failure") at /Users/dkf/Documents/software/tcl8.6-commits/generic/tclPanic.c:147
#4 0x0000000100029d2f in ValidateMemory (memHeaderP=0x100247000, file=0x10016c260 "/Users/dkf/Documents/software/tcl8.6-commits/generic/tclExecute.c", line=910, nukeGuards=1) at /Users/dkf/Documents/software/tcl8.6-commits/generic/tclCkalloc.c:270
#5 0x000000010002a642 in Tcl_DbCkfree (ptr=0x100247038 "8?\"", file=0x10016c260 "/Users/dkf/Documents/software/tcl8.6-commits/generic/tclExecute.c", line=910) at /Users/dkf/Documents/software/tcl8.6-commits/generic/tclCkalloc.c:612
#6 0x000000010009fc79 in DeleteExecStack (esPtr=0x100247038) at /Users/dkf/Documents/software/tcl8.6-commits/generic/tclExecute.c:910
#7 0x000000010009ff91 in GrowEvaluationStack (eePtr=0x1003067b8, growth=4002, move=0) at /Users/dkf/Documents/software/tcl8.6-commits/generic/tclExecute.c:1092
#8 0x00000001000a01a6 in StackAllocWords (interp=0x100809e38, numWords=4002) at /Users/dkf/Documents/software/tcl8.6-commits/generic/tclExecute.c:1179
#9 0x00000001000a03d2 in TclStackAlloc (interp=0x100809e38, numBytes=32016) at /Users/dkf/Documents/software/tcl8.6-commits/generic/tclExecute.c:1263
#10 0x0000000100039605 in Tcl_LsortObjCmd (clientData=0x0, interp=0x100809e38, objc=3, objv=0x10022d488) at /Users/dkf/Documents/software/tcl8.6-commits/generic/tclCmdIL.c:3907
#11 0x000000010001d5a2 in NRRunObjProc (data=0x100458390, interp=0x100809e38, result=0) at /Users/dkf/Documents/software/tcl8.6-commits/generic/tclBasic.c:4385
#12 0x000000010001d341 in TclNRRunCallbacks (interp=0x100809e38, result=0, rootPtr=0x1003b58a8) at /Users/dkf/Documents/software/tcl8.6-commits/generic/tclBasic.c:4332
#13 0x000000010001fab9 in TclEvalObjEx (interp=0x100809e38, objPtr=0x6, flags=0, invoker=0x0, word=0) at /Users/dkf/Documents/software/tcl8.6-commits/generic/tclBasic.c:5907
#14 0x000000010001fa5d in Tcl_EvalObjEx (interp=0x100809e38, objPtr=0x6, flags=0) at /Users/dkf/Documents/software/tcl8.6-commits/generic/tclBasic.c:5888
#15 0x0000000100041fa1 in Tcl_TimeObjCmd (dummy=0x0, interp=0x100809e38, objc=2, objv=0x10022d1b8) at /Users/dkf/Documents/software/tcl8.6-commits/generic/tclCmdMZ.c:4089
#16 0x000000010001d5a2 in NRRunObjProc (data=0x1003c1d80, interp=0x100809e38, result=0) at /Users/dkf/Documents/software/tcl8.6-commits/generic/tclBasic.c:4385
#17 0x000000010001d341 in TclNRRunCallbacks (interp=0x100809e38, result=0, rootPtr=0x0) at /Users/dkf/Documents/software/tcl8.6-commits/generic/tclBasic.c:4332
#18 0x000000010001fab9 in TclEvalObjEx (interp=0x100809e38, objPtr=0x6, flags=131072, invoker=0x0, word=0) at /Users/dkf/Documents/software/tcl8.6-commits/generic/tclBasic.c:5907
#19 0x000000010001fa5d in Tcl_EvalObjEx (interp=0x100809e38, objPtr=0x6, flags=131072) at /Users/dkf/Documents/software/tcl8.6-commits/generic/tclBasic.c:5888
#20 0x00000001000c6ec9 in Tcl_RecordAndEvalObj (interp=0x100809e38, cmdPtr=0x1003a42c8, flags=131072) at /Users/dkf/Documents/software/tcl8.6-commits/generic/tclHistory.c:192
#21 0x00000001000f35bf in Tcl_MainEx (argc=-1, argv=0x7fff5fbff608, appInitProc=0x1000029f7 <Tcl_AppInit>, interp=0x100809e38) at /Users/dkf/Documents/software/tcl8.6-commits/generic/tclMain.c:518
#22 0x00000001000f3a39 in Tcl_Main (argc=1, argv=0x7fff5fbff600, appInitProc=0x1000029f7 <Tcl_AppInit>) at /Users/dkf/Documents/software/tcl8.6-commits/generic/tclMain.c:667
#23 0x00000001000029f0 in main (argc=1, argv=0x7fff5fbff600) at /Users/dkf/Documents/software/tcl8.6-commits/unix/tclAppInit.c:80
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
On the basis of that, I suspect it's not a problem with [lsort] (which now has very conventional memory handling) but rather elsewhere; as such, the suggested patch is just papering over the real problem, not fixing it. Assigning to Miguel for comment.
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
hi guard byte 0 is 0x0
hi guard byte 1 is 0x0
hi guard byte 2 is 0x0
hi guard byte 3 is 0x0
hi guard byte 4 is 0x0
hi guard byte 5 is 0x0
hi guard byte 6 is 0x0
hi guard byte 7 is 0x0
total mallocs 68005
total frees 55127
current packets allocated 12878
current bytes allocated 645210
maximum packets allocated 12885
maximum bytes allocated 655885
high guard failed at 100247038, /Users/dkf/Documents/software/tcl8.6-commits/generic/tclExecute.c 910
32040 bytes allocated at (/Users/dkf/Documents/software/tcl8.6-commits/generic/tclExecute.c 1110)
Memory validation failure
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
Example code that will trigger exception
Possibly related to the Tcl_LsortObjCmd change from ckalloc to stackalloc?
see attached patch-aa for a minimal (absolute minimal?) patch against 14 Dec code that appears to solve this specific issue (as hinted by jeff hobbs).
Can confirm patch applies cleanly against CVS co of 23Dec2010 as well, and that Tcl 8.6 build passes test.
I don't see any problem when building on OSX. Do you have a stack trace?
BTW, your patch is definitely not minimal. It touches unrelated code (the bits relating to lines 1312 to 1346 are nothing to do with [lsort]).
Donal - you have to compile with --enable-symbols=all (IOW -DTCL_MEM_DEBUG) to ensure the crash. Brad used NetBSD amd64 and I used SuSE Linux-64 to confirm.
Hi Donal -- indeed, my first patch was poor... I've reviewed and submitted 'patch-aa.v2' which is much more specific in the code it targets (strictly [lsort]) and does pass the case that was previously faulting. This is _not_ to say that [lindex] etc won't be subject to this same class of bug, but I won't speculate with my patch.
I usually build with --enable-symbols=mem (so I can run memleak tests) and wasn't seeing it, and even with --enable-symbols=all in a clean build I still don't see it.
I'd like to emphasize that I don't mean to be difficult. I suspect that there may be some weird problem in the code to handle the temporary memory handling in the lsort command. But I can't reproduce for some reason; I need the stack trace from the crash and can't make it myself. :-(
There's too many things that are trickier than they look in this code for me to hunt without at least a copy of a picture of the smoking gun available to me. :-/
Could you retest with the HEAD? I fixed another bug that I found when studying what this could possibly be, so maybe that will resolve this too (as I greatly simplified the memory management logic in the process; the handling of -index was unnecessarily over-clever before).
Tested, still failed. I've updated the patch that solves this particular ticket to work w/ your latest changes. (patch-aa.v3)
applies against CVS head of 27Dec, 2010
Finally managed to reproduce; it's a 64-bit only bug.
fails on iteration 1334; stack trace follows:
#0 0x00007fff83b9ff16 in __kill ()
#1 0x00007fff83c10f6d in abort ()
#2 0x00000001000ffc9c in Tcl_PanicVA (format=0x100167bd8 "Memory validation failure", argList=0x7fff5fbfeab0) at /Users/dkf/Documents/software/tcl8.6-commits/generic/tclPanic.c:118
#3 0x00000001000ffd7b in Tcl_Panic (format=0x100167bd8 "Memory validation failure") at /Users/dkf/Documents/software/tcl8.6-commits/generic/tclPanic.c:147
#4 0x0000000100029d2f in ValidateMemory (memHeaderP=0x100247000, file=0x10016c260 "/Users/dkf/Documents/software/tcl8.6-commits/generic/tclExecute.c", line=910, nukeGuards=1) at /Users/dkf/Documents/software/tcl8.6-commits/generic/tclCkalloc.c:270
#5 0x000000010002a642 in Tcl_DbCkfree (ptr=0x100247038 "8?\"", file=0x10016c260 "/Users/dkf/Documents/software/tcl8.6-commits/generic/tclExecute.c", line=910) at /Users/dkf/Documents/software/tcl8.6-commits/generic/tclCkalloc.c:612
#6 0x000000010009fc79 in DeleteExecStack (esPtr=0x100247038) at /Users/dkf/Documents/software/tcl8.6-commits/generic/tclExecute.c:910
#7 0x000000010009ff91 in GrowEvaluationStack (eePtr=0x1003067b8, growth=4002, move=0) at /Users/dkf/Documents/software/tcl8.6-commits/generic/tclExecute.c:1092
#8 0x00000001000a01a6 in StackAllocWords (interp=0x100809e38, numWords=4002) at /Users/dkf/Documents/software/tcl8.6-commits/generic/tclExecute.c:1179
#9 0x00000001000a03d2 in TclStackAlloc (interp=0x100809e38, numBytes=32016) at /Users/dkf/Documents/software/tcl8.6-commits/generic/tclExecute.c:1263
#10 0x0000000100039605 in Tcl_LsortObjCmd (clientData=0x0, interp=0x100809e38, objc=3, objv=0x10022d488) at /Users/dkf/Documents/software/tcl8.6-commits/generic/tclCmdIL.c:3907
#11 0x000000010001d5a2 in NRRunObjProc (data=0x100458390, interp=0x100809e38, result=0) at /Users/dkf/Documents/software/tcl8.6-commits/generic/tclBasic.c:4385
#12 0x000000010001d341 in TclNRRunCallbacks (interp=0x100809e38, result=0, rootPtr=0x1003b58a8) at /Users/dkf/Documents/software/tcl8.6-commits/generic/tclBasic.c:4332
#13 0x000000010001fab9 in TclEvalObjEx (interp=0x100809e38, objPtr=0x6, flags=0, invoker=0x0, word=0) at /Users/dkf/Documents/software/tcl8.6-commits/generic/tclBasic.c:5907
#14 0x000000010001fa5d in Tcl_EvalObjEx (interp=0x100809e38, objPtr=0x6, flags=0) at /Users/dkf/Documents/software/tcl8.6-commits/generic/tclBasic.c:5888
#15 0x0000000100041fa1 in Tcl_TimeObjCmd (dummy=0x0, interp=0x100809e38, objc=2, objv=0x10022d1b8) at /Users/dkf/Documents/software/tcl8.6-commits/generic/tclCmdMZ.c:4089
#16 0x000000010001d5a2 in NRRunObjProc (data=0x1003c1d80, interp=0x100809e38, result=0) at /Users/dkf/Documents/software/tcl8.6-commits/generic/tclBasic.c:4385
#17 0x000000010001d341 in TclNRRunCallbacks (interp=0x100809e38, result=0, rootPtr=0x0) at /Users/dkf/Documents/software/tcl8.6-commits/generic/tclBasic.c:4332
#18 0x000000010001fab9 in TclEvalObjEx (interp=0x100809e38, objPtr=0x6, flags=131072, invoker=0x0, word=0) at /Users/dkf/Documents/software/tcl8.6-commits/generic/tclBasic.c:5907
#19 0x000000010001fa5d in Tcl_EvalObjEx (interp=0x100809e38, objPtr=0x6, flags=131072) at /Users/dkf/Documents/software/tcl8.6-commits/generic/tclBasic.c:5888
#20 0x00000001000c6ec9 in Tcl_RecordAndEvalObj (interp=0x100809e38, cmdPtr=0x1003a42c8, flags=131072) at /Users/dkf/Documents/software/tcl8.6-commits/generic/tclHistory.c:192
#21 0x00000001000f35bf in Tcl_MainEx (argc=-1, argv=0x7fff5fbff608, appInitProc=0x1000029f7 <Tcl_AppInit>, interp=0x100809e38) at /Users/dkf/Documents/software/tcl8.6-commits/generic/tclMain.c:518
#22 0x00000001000f3a39 in Tcl_Main (argc=1, argv=0x7fff5fbff600, appInitProc=0x1000029f7 <Tcl_AppInit>) at /Users/dkf/Documents/software/tcl8.6-commits/generic/tclMain.c:667
#23 0x00000001000029f0 in main (argc=1, argv=0x7fff5fbff600) at /Users/dkf/Documents/software/tcl8.6-commits/unix/tclAppInit.c:80
I'm guessing the issue is the need to increase the size of Tcl stack space (which appears to have hit a 32k boundary).
On the basis of that, I suspect it's not a problem with [lsort] (which now has very conventional memory handling) but rather elsewhere; as such, the suggested patch is just papering over the real problem, not fixing it. Assigning to Miguel for comment.
(and students of stack traces will see that I have added [time] to it; it appears to be a by-stander in this)
Actual crash message:
hi guard byte 0 is 0x0
hi guard byte 1 is 0x0
hi guard byte 2 is 0x0
hi guard byte 3 is 0x0
hi guard byte 4 is 0x0
hi guard byte 5 is 0x0
hi guard byte 6 is 0x0
hi guard byte 7 is 0x0
total mallocs 68005
total frees 55127
current packets allocated 12878
current bytes allocated 645210
maximum packets allocated 12885
maximum bytes allocated 655885
high guard failed at 100247038, /Users/dkf/Documents/software/tcl8.6-commits/generic/tclExecute.c 910
32040 bytes allocated at (/Users/dkf/Documents/software/tcl8.6-commits/generic/tclExecute.c 1110)
Memory validation failure
No 64bit platform handy to repro. Any chance to run under valgrind a -DPURIFY build? It should have symbols but no mem debug ...
Not in any config I have; my toolchain is somewhat obsolete now.