From: Tim E. <ti...@op...> - 2013-08-13 16:03:55
Hello Jan,

> Requesting feedback on my proposed solution
> <http://core.tcl.tk/tk/info/aa61bf42e0>
> this changes the OptionTable cache to be stored per thread in
> stead of per interpreter, and it has the side-effect that deleting
> the interpreter does not automatically delete the OptionTable
> information any more: that will only be done when
> Tk_DeleteOptionTable() is called the same number of times
> as Tk_CreateOptionTable() AND all "option" Tcl_Obj's are
> cleaned up. I think this is the correct approach, which should
> have been followed long ago, but if anyone objects to
> such a quite radical change, please let me know before
> I integrate this on trunk.
>
> @tim: Please checkout my proposed solution.
> Does this fix your problem in xcircuit when linking with Tcl 8.6.0?

I tried out the revised code.  The xcircuit code now breaks at a
different point when linking to 8.6.0, but in a very similar situation.
This is in a place in the code where I am driving the interpreter from
inside the C code, so there is definitely room for error on my side.
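
The counting rule in the quoted note (the table goes away only after
as many Tk_DeleteOptionTable() calls as Tk_CreateOptionTable() calls)
can be pictured with a toy model in plain C.  This is only a sketch to
illustrate the rule: ToyTable, toy_create(), toy_delete() and
toy_unbalanced_demo() are made-up names, and none of it is Tk's actual
implementation.  It shows how a single unmatched delete leaves a
negative reference count behind.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model only: hypothetical names, not Tk's OptionTable code. */
typedef struct ToyTable {
    int refCount;  /* outstanding toy_create() calls */
    int released;  /* becomes 1 when the count first reaches zero */
} ToyTable;

/* Stands in for one Tk_CreateOptionTable() call on a cached table. */
void toy_create(ToyTable *t)
{
    assert(t != NULL);
    t->refCount++;
}

/* Stands in for one Tk_DeleteOptionTable() call; returns the new count. */
int toy_delete(ToyTable *t)
{
    assert(t != NULL);
    t->refCount--;
    if (t->refCount == 0) {
        t->released = 1;   /* real code would free the table here */
    }
    return t->refCount;
}

/* Two creates followed by three deletes: one delete too many. */
int toy_unbalanced_demo(void)
{
    ToyTable t = {0, 0};
    toy_create(&t);
    toy_create(&t);
    toy_delete(&t);          /* count 1, still alive */
    toy_delete(&t);          /* count 0, released */
    return toy_delete(&t);   /* count -1: an unmatched delete */
}
```

In the toy the struct is deliberately kept alive after release so the
count can still be inspected; in real code the third delete would be
operating on memory that has already been freed.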

Here is the backtrace from the crash using the code base from your URL:

--------------------------------------------------------------------------
#0  0x0000003b19636285 in raise () from /lib64/libc.so.6
#1  0x0000003b19637b9b in abort () from /lib64/libc.so.6
#2  0x00007ffff79ea00f in Tcl_PanicVA (
    format=0x7ffff7a66da8 "malformed bucket chain in Tcl_DeleteHashEntry",
    argList=0x7fffffffd2a8) at /usr/local/src/tcl8.6.0/generic/tclPanic.c:119
#3  0x00007ffff79ea0ae in Tcl_Panic (
    format=0x7ffff7a66da8 "malformed bucket chain in Tcl_DeleteHashEntry")
    at /usr/local/src/tcl8.6.0/generic/tclPanic.c:149
#4  0x00007ffff79af00d in Tcl_DeleteHashEntry (entryPtr=0x9644d0)
    at /usr/local/src/tcl8.6.0/generic/tclHash.c:454
#5  0x00007ffff7ceaf86 in Tk_DeleteOptionTable ()
   from /usr/local/lib/libtk8.6.so
#6  0x00007ffff7ceafbd in FreeOptionInternalRep ()
   from /usr/local/lib/libtk8.6.so
#7  0x00007ffff7ce9b4d in GetOptionFromObj () from /usr/local/lib/libtk8.6.so
#8  0x00007ffff7ceb5e0 in Tk_SetOptions () from /usr/local/lib/libtk8.6.so
#9  0x00007ffff7d0e745 in ConfigureButton () from /usr/local/lib/libtk8.6.so
#10 0x00007ffff7d0f1c5 in ButtonCreate () from /usr/local/lib/libtk8.6.so
#11 0x00007ffff78e8ec6 in TclNREvalObjv (interp=0x61ad90, objc=6,
    objv=0x95c730, flags=0, cmdPtr=0x6a7520)
    at /usr/local/src/tcl8.6.0/generic/tclBasic.c:4308
#12 0x00007ffff78e88ff in Tcl_EvalObjv (interp=0x61ad90, objc=6,
    objv=0x95c730, flags=0)
    at /usr/local/src/tcl8.6.0/generic/tclBasic.c:4141
#13 0x00007ffff6c36aa4 in xctcl_label (clientData=0x659090, interp=0x61ad90,
    objc=6, objv=0x61f3b0) at tclxcircuit.c:4073
------------------------------------------------------------------------

Looking at backtrace stack position #7:

#7  0x00007ffff7c963f9 in GetOptionFromObj (interp=0x61ad90, objPtr=0x918610,
    tablePtr=0x87f1e0)
    at /usr/local/src/tk8.6.0/unix/../generic/tkConfig.c:1090
1090            objPtr->typePtr->freeIntRepProc(objPtr);

The value of "objPtr" is:

(gdb) print *objPtr
$3 = {refCount = 1, bytes = 0x9618f0 "-bg", length = 3,
  typePtr = 0x7ffff7fde860, internalRep = {longValue = 9905552,
    doubleValue = 4.8939929462940514e-317, otherValuePtr = 0x972590,
    wideValue = 9905552, twoPtrValue = {ptr1 = 0x972590, ptr2 = 0x9726a0},
    ptrAndLongRep = {ptr = 0x972590, value = 9905824}}}

which seems normal enough.  But going down to stack position #6:

#6  0x00007ffff7c9652c in FreeOptionInternalRep (objPtr=0x918610)
    at /usr/local/src/tk8.6.0/unix/../generic/tkConfig.c:1170
1170        Tk_DeleteOptionTable(tablePtr);

and down one more to where "tablePtr" is cast to a type I can look at:

#5  0x00007ffff7c95230 in Tk_DeleteOptionTable (optionTable=0x972590)
    at /usr/local/src/tk8.6.0/unix/../generic/tkConfig.c:354
354         Tcl_DeleteHashEntry(tablePtr->hashEntryPtr);

The value of "tablePtr" is:

(gdb) print *tablePtr
$6 = {refCount = -1, hashEntryPtr = 0x962650, nextPtr = 0x0,
  numOptions = 41, options = {{specPtr = 0x7ffff7fedea0,
      dbNameUID = 0x6a89c0 "activeBackground",
      dbClassUID = 0x671040 "Foreground", defaultPtr = 0x897510,
      extra = {monoColorPtr = 0x8976c0, synonymPtr = 0x8976c0,
        custom = 0x8976c0}, flags = 1}}}

I am not an expert on the Tk internal structures, but I'm guessing that
a refCount of -1 is not right. . .

This is the code in xcircuit that is at the top of the stack trace I
listed above:

------------------------------------------------------------------------
    Tcl_Obj **newobjv = (Tcl_Obj **)Tcl_Alloc(objc * sizeof(Tcl_Obj *));

    newobjv[0] = Tcl_NewStringObj("tcl_label", 9);
    Tcl_IncrRefCount(newobjv[0]);
    for (i = 1; i < objc; i++) {
        newobjv[i] = Tcl_DuplicateObj(objv[i]);
        Tcl_IncrRefCount(newobjv[i]);
    }

    result = Tcl_EvalObjv(interp, objc, newobjv, 0);
-----------------------------------------------------------------------

This bit of code is designed to "overload" the command "label".  So
xcircuit has a command called "label", and Tk has a command called "label".
Although at the beginning of xcircuit, the Tk command is renamed
"tcl_label", I noted that the two uses had completely different syntax,
so I wanted to allow mixed use of the two "label" commands.  So the code
above is part of a routine that is the callback for the command "label".
It checks the number of arguments and so forth to disambiguate the two
uses.  If it finds that the Tk "label" command was intended, then it
creates a new command by (1) creating a new string object containing the
new name "tcl_label" of the original command, and (2) making a copy of
the rest of the command using Tcl_DuplicateObj.  This is all to avoid
directly changing the original objv[] vector containing the command.

The new command line created by the code above is:

-----------------------------------------------------------------------
tcl_label .output.textent.lab4 -text Width: -bg beige
-----------------------------------------------------------------------

Previously, the code broke on a statement at line 815 of my
"wrapper.tcl" script.  Now it breaks at line 885.  In both cases it's a
widget option "-bg beige" that causes the crash, and both crashes happen
in the middle of an unremarkable set of similar statements.

But it is possible that my use of Tcl_IncrRefCount(),
Tcl_DuplicateObj(), and Tcl_EvalObjv() to create and evaluate a new
command on the fly is just wrong, and causing a memory allocation
problem that shows up randomly at a later time.  To check this
possibility, I replaced all "label" commands in my script with
"tcl_label", and the crash does not occur.  So I feel that I have done
something wrong by creating and evaluating a new command-line command on
the fly, but I cannot see what.  In the worst case, I do have a
work-around for the error.

Regards,
Tim

+--------------------------------+-------------------------------------+
| R. Timothy Edwards (Tim)       | email: ti...@op...                  |
| Open Circuit Design            | web: http://opencircuitdesign.com   |
| 22815 Timber Creek Lane        | phone: (301) 528-5030               |
| Clarksburg, MD 20871-4001      | cell: (240) 401-0616                |
+--------------------------------+-------------------------------------+