Menu

#3427 crash after removing trace in recursive call

obsolete: 8.4.14
closed-fixed
9
2008-08-29
2006-05-05
No

OS: Solaris 5.8.
Removing trace on second level of recursive call can
corrupt or substitute wrong number of compiled
variables for first level of recursion. Then, on
return, TclLookupSimpleVar() will have two different
values of number of compiled variables in
procPtr->numCompiledLocals (5)
and
varFramePtr->numCompiledLocals (3)
at tclVar.c, line 818.
Then loop for all variables (line 823) will use bigger
value, procPtr->numCompiledLocals, and it will crash on
accessing varFramePtr->compiledLocals out of bound.
Please use attached testcase for reproducing the bug.

Discussion

1 2 > >> (Page 1 of 2)
  • Yevgen Ryazanov

    Yevgen Ryazanov - 2006-05-05

    Tcl and C code to crash

     
  • Don Porter

    Don Porter - 2006-05-06

    Logged In: YES
    user_id=80530

    looks like it's still there in 8.4.13.

    Upping priority to see if it can
    get resolved before 8.4.14.

     
  • Don Porter

    Don Porter - 2006-05-06
    • milestone: 503285 --> obsolete: 8.4.14
    • priority: 5 --> 9
     
  • Don Porter

    Don Porter - 2006-05-08
    • labels: 104240 --> 104254
     
  • Don Porter

    Don Porter - 2006-05-08
    • assigned_to: dgp --> msofer
     
  • Don Porter

    Don Porter - 2006-05-08

    Logged In: YES
    user_id=80530

    Traces are probably involved,
    but the root cause of the
    crash appears to be corruption
    of a Var struct, so let's bring
    the Var expert in on this.

    Miguel, the crash happens at
    line 828 of tclVar.c where
    localVarPtr->name has a bogus
    value, even though
    localVarPtr->refCount is 19.

    Any hints?

     
  • Don Porter

    Don Porter - 2006-05-08

    Logged In: YES
    user_id=80530

    ok, as original report already
    observed, root cause is that
    procPtr->numCompiledLocals
    and varFramePtr->numCompiledLocals
    do not agree.

    Since the for loop at line 825
    fundamentally loops over
    varFramePtr->compiledLocals, it
    would make more sense to use
    varFramePtr->numCompiledLocals
    to set the loop limit. Simply
    making that change appears
    to fix this bug.

    But leaves several questions
    raised. Why do we need to store
    the same info in two places?
    And once we've done that, just what
    corruption is forcing them
    to get out of sync?

     
  • Don Porter

    Don Porter - 2006-05-08

    Logged In: YES
    user_id=80530

    oops! the loop traverses
    data from both structs,
    so the two numCompiledLocals
    simply must agree.

    miguel reports:
    miguel the CallFrame's is set from the Proc's at tclProc.c
    line 979. Should not be modified, ever

     
  • Don Porter

    Don Porter - 2006-05-09

    Logged In: YES
    user_id=80530

    I added a test that the two
    numCompiledLocals values were
    the same, and a Tcl_Panic()
    when they are not. This
    let me detect the problem directly
    without needing lots of other
    things "just so" to force a
    segfault.

    This let me simplify the demo
    script to:

    load libtrace.so
    proc Console:Eval {cmd} {
    set enterx 0
    set ret [uplevel #0 $cmd]
    if {$enterx} {}
    return $ret
    }
    Console:Eval {
    closeTrace
    Console:Eval {}
    }

    The value mismatch
    happens because [closeTrace]
    calls Tcl_DeleteTrace()
    which advances the bytecode
    compile epoch (line 5208
    of tclBasic.c).

    This means the nested
    [Console:Eval] call is
    a different compile from
    the outer one, and the two
    compiles are done under
    different settings of the
    DONT_COMPILE_CMDS_INLINE
    flag. The different compile
    rules may well explain the
    different number of compiledLocals.

    The remaining bug here appears
    to be that the CallFrame of
    the outer call now has a procPtr
    field pointing to a Proc corresponding
    to the re-compile?

    Either that should not happen, or
    if it's ok for that to happen, then
    the TclLookupSimpleVar needs a rewrite
    to not assume the CallFrame and Proc
    agree about their compiledLocals lists.

     
  • Don Porter

    Don Porter - 2006-05-10

    Logged In: YES
    user_id=80530

    Based on a plan laid out
    by miguel, I have a fix
    for this in my development
    sources. I'll post a patch
    for testing after access to
    CVS is restored.

     
  • Don Porter

    Don Porter - 2006-05-10
    • assigned_to: msofer --> dgp
    • labels: 104254 --> 22. [proc] and [uplevel]
     
  • Don Porter

    Don Porter - 2006-05-10

    Logged In: YES
    user_id=80530

    Here's the patch against
    8.4.13.

    Note the patch leaves the
    bug unfixed when the routine
    getting recompiled is due to
    a call to TclProcCompileProc()
    from an extension intruding on
    internals.

     
  • Don Porter

    Don Porter - 2006-05-10

    Logged In: YES
    user_id=80530

    Patch is flawed; examining...

     
  • Don Porter

    Don Porter - 2006-05-10

    Logged In: YES
    user_id=80530

    Here's the corrected patch
    for 8.4.13.

     
  • Don Porter

    Don Porter - 2006-05-10
     
  • Don Porter

    Don Porter - 2006-05-10

    Logged In: YES
    user_id=80530

    And the corresponding patch
    for 8.5a4

     
  • Don Porter

    Don Porter - 2006-05-10
     
  • Don Porter

    Don Porter - 2006-05-13
    • status: open --> closed
     
  • Don Porter

    Don Porter - 2006-05-13

    Logged In: YES
    user_id=80530

    fixed for 8.4.14 and 8.5a5

     
  • Don Porter

    Don Porter - 2006-05-13
    • status: closed --> closed-fixed
     
  • miguel sofer

    miguel sofer - 2008-08-10
    • status: closed-fixed --> open-fixed
     
  • miguel sofer

    miguel sofer - 2008-08-10

    Logged In: YES
    user_id=148712
    Originator: NO

    Removed the fix in HEAD: this is not needed at least since varReform, where the local variable data at runtime is read from the CallFrame and/or the LocalCache.

    Reopened to consider this for backporting to 8.5, although my inclination is to leave the stable branch alone: this is removing unneeded code, not a bugfix.

     
  • Don Porter

    Don Porter - 2008-08-13
    • assigned_to: dgp --> msofer
     
  • miguel sofer

    miguel sofer - 2008-08-13
    • status: open-fixed --> pending-fixed
     
  • SourceForge Robot

    Logged In: YES
    user_id=1312539
    Originator: NO

    This Tracker item was closed automatically by the system. It was
    previously set to a Pending status, and the original submitter
    did not respond within 14 days (the time period specified by
    the administrator of this Tracker).

     
1 2 > >> (Page 1 of 2)
MongoDB Logo MongoDB