#4682 crash when freeing locals hashtable

obsolete: 8.5.8
closed-fixed
9
2010-09-29
2010-07-31
No

The one-liner
proc foo {} { [catch {upvar 0 dummy \$index}] } ; foo
accesses freed memory and causes a crash (in non-threaded tcl, threaded seems to survive due to allocator differences).

Valgrind shows that a freed hash entry is being accessed, in order to free it again. The problem seems to depend on the order in which TclDeleteVars traverses the hashtable of locals.

Note that this likely impacts 8.6 too (unverified as of yet, time is so short and this netbook so slow).

Discussion

  • miguel sofer

    miguel sofer - 2010-07-31

    OOPS ... forgot to credit tclguy and aku for detecting the problem, and reducing it to this oneliner.

     
  • miguel sofer

    miguel sofer - 2010-07-31

    Description of the early-freeing scenario:
    * two locals a and b defined at runtime - ie, in the hashtable and not the locals array
    * upvar 0 a b
    * the ONLY reference to a is by the upvar, and it remains undefined
    * the hashtable traversal reaches b before a

    In this case, when b is freed a is eliminated from the hashtable and is freed too: there are only two refcounts to this var, one from the hash entry and one from the upvar. When the upvar goes away, the undef'ed var is seen as being kept alive only by the entry, and the lot is freed to eliminate the "circular ref".

    Later, when Tcl_NextHashEntry tries to read a's entry we access freed memory.

     
  • miguel sofer

    miguel sofer - 2010-07-31

    Supporting evidence for the diagnose:

    % proc foo {} { [catch {upvar 0 dummy \$index; set dummy 1}]}; foo
    invalid command name "0"
    % proc foo {} { [catch {upvar 0 dummy \$index; set x 1}]}; foo
    Segmentation fault

     
  • miguel sofer

    miguel sofer - 2010-07-31

    Two possible solutions:

    (1) lose the optimisation, delete the vars from the table one by one (just like in TclDeleteNamespaceVars; can also merge big parts of these two funcs)
    (2) somehow signal to UnsetVarStruct or CleanupVar that the var should NOT be deleted.

    (2) seems hacky to me, and the value of the opt is iffy anyway (in my prejudice): it is low impact, and only concerns procs with runtime-created locals

     
  • miguel sofer

    miguel sofer - 2010-07-31

    The attached patch implements (1) and solves the issue. Not committed pending full valgrind check.

     
  • miguel sofer

    miguel sofer - 2010-07-31
     
  • miguel sofer

    miguel sofer - 2010-07-31

    Patch committed to core-8-5-branch and (adapted) HEAD

     
  • miguel sofer

    miguel sofer - 2010-07-31
    • status: open --> closed-fixed
     
  • Cyan Ogilvie

    Cyan Ogilvie - 2010-08-02

    Testing the fix on HEAD of 8.6, I get an abort "maformed bucket chain in Tcl_DeleteHashEntry" at tclHash.c:434. Strangely, the build from the 24th of June is fine (and valgrind doesn't report any problems).

    Tested configuration: Tcl HEAD from 2010-07-31 22:28:02, Linux 64 bit, glibc 2.11, threaded build, enabled symbols. Backtrace:

    #0 0x00007ffff71d1a75 in *__GI_raise (sig=<value optimized out>)
    at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
    #1 0x00007ffff71d55c0 in *__GI_abort () at abort.c:92
    #2 0x000000000043c74b in Tcl_PanicVA (format=0x578ca8 "malformed bucket chain in Tcl_DeleteHashEntry", argList=0x7fffffffd710)
    at /home/cyan/git/cfkit/pkgsrc/tcl/generic/tclPanic.c:93
    #3 0x000000000043c810 in Tcl_Panic (format=0x578ca8 "malformed bucket chain in Tcl_DeleteHashEntry")
    at /home/cyan/git/cfkit/pkgsrc/tcl/generic/tclPanic.c:122
    #4 0x0000000000546452 in Tcl_DeleteHashEntry (entryPtr=0x847bb8)
    at /home/cyan/git/cfkit/pkgsrc/tcl/generic/tclHash.c:434
    #5 0x000000000046b218 in TclDeleteVars (iPtr=0x7bc550, tablePtr=0x847b20)
    at /home/cyan/git/cfkit/pkgsrc/tcl/generic/tclVar.c:5335
    #6 0x0000000000431b6c in Tcl_PopCallFrame (interp=0x7bc550)
    at /home/cyan/git/cfkit/pkgsrc/tcl/generic/tclNamesp.c:369
    #7 0x000000000044824f in InterpProcNR2 (data=0x7bc508, interp=0x7bc550, result=1) at /home/cyan/git/cfkit/pkgsrc/tcl/generic/tclProc.c:1911
    #8 0x00000000004a52a2 in TclNRRunCallbacks (interp=0x7bc550, result=1, rootPtr=0x82cdd0, tebcCall=1)
    at /home/cyan/git/cfkit/pkgsrc/tcl/generic/tclBasic.c:4324
    #9 0x000000000053b8c7 in TclExecuteByteCode (interp=0x7bc550, codePtr=0x814250)
    at /home/cyan/git/cfkit/pkgsrc/tcl/generic/tclExecute.c:6596
    #10 0x00000000004a55ab in NRCallTEBC (data=0x7ba498, interp=0x7bc550, result=0)
    at /home/cyan/git/cfkit/pkgsrc/tcl/generic/tclBasic.c:4400
    #11 0x00000000004a52a2 in TclNRRunCallbacks (interp=0x7bc550, result=0, rootPtr=0x0, tebcCall=0)
    at /home/cyan/git/cfkit/pkgsrc/tcl/generic/tclBasic.c:4324
    #12 0x00000000004a7da7 in TclEvalObjEx (interp=0x7bc550, objPtr=0x0, flags=131072, invoker=0x0, word=0)
    at /home/cyan/git/cfkit/pkgsrc/tcl/generic/tclBasic.c:5941
    #13 0x00000000004a7d32 in Tcl_EvalObjEx (interp=0x7bc550, objPtr=0x0, flags=131072)
    at /home/cyan/git/cfkit/pkgsrc/tcl/generic/tclBasic.c:5922
    #14 0x0000000000547047 in Tcl_RecordAndEvalObj (interp=0x7bc550, cmdPtr=0x7ba4f0, flags=131072)
    at /home/cyan/git/cfkit/pkgsrc/tcl/generic/tclHistory.c:192
    #15 0x0000000000430e95 in Tcl_Main (argc=-1, argv=0x7fffffffe4e0, appInitProc=0x411882 <Tcl_AppInit>)
    at /home/cyan/git/cfkit/pkgsrc/tcl/generic/tclMain.c:471
    #16 0x000000000041187b in main (argc=1, argv=0x7fffffffe4d8)
    at /home/cyan/git/cfkit/pkgsrc/tcl/unix/tclAppInit.c:85

    Tcl script:

    proc foo {} {
    [catch {upvar 0 dummy \$index}]
    }

    foo

    Valgrind doesn't complain about any errors

     
  • miguel sofer

    miguel sofer - 2010-08-02

    Ouch, bad port of the patch - my bad, should get more sleep. Thx for spotting this! Should be fixed in HEAD now.

     
  • miguel sofer

    miguel sofer - 2010-08-02
    • status: closed-fixed --> pending-fixed
     
  • Jeffrey Hobbs

    Jeffrey Hobbs - 2010-08-02

    BTW, to make the test easier, you don't need the [] around catch - it will still crash (at least for me). Don't we want that in the test suite?

     
  • Jeffrey Hobbs

    Jeffrey Hobbs - 2010-08-03

    var.test suite updated in 8.5 and head.

     
  • SourceForge Robot

    This Tracker item was closed automatically by the system. It was
    previously set to a Pending status, and the original submitter
    did not respond within 14 days (the time period specified by
    the administrator of this Tracker).

     
  • SourceForge Robot

    • status: pending-fixed --> closed-fixed