OS: Solaris 5.8.
Removing trace on second level of recursive call can
corrupt or substitute wrong number of compiled
variables for first level of recursion. Then, on
return, TclLookupSimpleVar() will have two different
values of number of compiled variables in
procPtr->numCompiledLocals (5)
and
varFramePtr->numCompiledLocals (3)
at tclVar.c, line 818.
Then loop for all variables (line 823) will use bigger
value, procPtr->numCompiledLocals, and it will crash on
accessing varFramePtr->compiledLocals out of bound.
Please use attached testcase for reproducing the bug.
Tcl and C code to crash
Logged In: YES
user_id=80530
looks like it's still there in 8.4.13.
Upping priority to see if it can
get resolved before 8.4.14.
Logged In: YES
user_id=80530
Traces are probably involved,
but the root cause of the
crash appears to be corruption
of a Var struct, so let's bring
the Var expert in on this.
Miguel, the crash happens at
line 828 of tclVar.c where
localVarPtr->name has a bogus
value, even though
localVarPtr->refCount is 19.
Any hints?
Logged In: YES
user_id=80530
ok, as original report already
observed, root cause is that
procPtr->numCompiledLocals
and varFramePtr->numCompiledLocals
do not agree.
Since the for loop at line 825
fundamentally loops over
varFramePtr->compiledLocals, it
would make more sense to use
varFramePtr->numCompiledLocals
to set the loop limit. Simply
making that change appears
to fix this bug.
But leaves several questions
raised. Why do we need to store
the same info in two places?
And once we've done that, just what
corruption is forcing them
to get out of sync?
Logged In: YES
user_id=80530
oops! the loop traverses
data from both structs,
so the two numCompiledLocals
simply must agree.
miguel reports:
miguel the CallFrame's is set from the Proc's at tclProc.c
line 979. Should not be modified, ever
Logged In: YES
user_id=80530
I added a test that the two
numCompiledLocals values were
the same, and a Tcl_Panic()
when they are not. This
let me detect the problem directly
without needing lots of other
things "just so" to force a
segfault.
This let me simplify the demo
script to:
load libtrace.so
proc Console:Eval {cmd} {
set enterx 0
set ret [uplevel #0 $cmd]
if {$enterx} {}
return $ret
}
Console:Eval {
closeTrace
Console:Eval {}
}
The value mismatch
happens because [closeTrace]
calls Tcl_DeleteTrace()
which advances the bytecode
compile epoch (line 5208
of tclBasic.c).
This means the nested
[Console:Eval] call is
a different compile from
the outer one, and the two
compiles are done under
different settings of the
DONT_COMPILE_CMDS_INLINE
flag. The different compile
rules may well explain the
different number of compiledLocals.
The remaining bug here appears
to be that the CallFrame of
the outer call now has a procPtr
field pointing to a Proc corresponding
to the re-compile?
Either that should not happen, or
if it's ok for that to happen, then
the TclLookupSimpleVar needs a rewrite
to not assume the CallFrame and Proc
agree about their compiledLocals lists.
Logged In: YES
user_id=80530
Based on a plan laid out
by miguel, I have a fix
for this in my development
sources. I'll post a patch
for testing after access to
CVS is restored.
Logged In: YES
user_id=80530
Here's the patch against
8.4.13.
Note the patch leaves the
bug unfixed when the routine
getting recompiled is due to
a call to TclProcCompileProc()
from an extension intruding on
internals.
Logged In: YES
user_id=80530
Patch is flawed; examining...
Logged In: YES
user_id=80530
Here's the corrected patch
for 8.4.13.
Logged In: YES
user_id=80530
And the corresponding patch
for 8.5a4
Logged In: YES
user_id=80530
fixed for 8.4.14 and 8.5a5
Logged In: YES
user_id=148712
Originator: NO
Removed the fix in HEAD: this is not needed at least since varReform, where the local variable data at runtime is read from the CallFrame and/or the LocalCache.
Reopened to consider this for backporting to 8.5, although my inclination is to leave the stable branch alone: this is removing unneeded code, not a bugfix.
Logged In: YES
user_id=1312539
Originator: NO
This Tracker item was closed automatically by the system. It was
previously set to a Pending status, and the original submitter
did not respond within 14 days (the time period specified by
the administrator of this Tracker).