I wrote a simple application to gather Tk events and pass them from one interpreter to another via tequila (http://wiki.tcl.tk/1243), which uses traces and socket communication to keep array values in sync between Tcl interpreters.
My application displays a canvas, to which a <Motion> event is bound. Thus as the cursor is moved over the canvas a steady stream of Tk events is generated; the scripts associated with these events are captured and written to an array, which tequila echoes to an interpreter in a separate process.
When I start this application and move the cursor continually over the canvas, sometime between five and sixty seconds later one of the interpreters segfaults and crashes.
Platform is Ubuntu Hardy with ActiveTcl 8.5.5 installed. I duplicated the bug using Pat Thoyts's tclkit builds 8.5.8 (http://www.patthoyts.tk/tclkit/linux-ix86/8.5.8/tclkit%2d8.5.8.gz) and 8.6b1.1 (http://www.patthoyts.tk/tclkit/linux-ix86/8.6%2dbeta/tclkit%2d8.6b1.1.gz).
For ease of duplication I created a starkit containing all relevant code called echoevent.kit (attached). To duplicate:
1. in a terminal console, execute: tclkit echoevent.kit server
This will start a tequila server.
2. in a separate terminal console, execute: tclkit echoevent.kit master
This will start a simple Tk "master" application that displays a canvas. Moving the cursor across this "master" canvas will cause the command associated with the <Motion> event to echo repeatedly to the terminal console.
3. in a third terminal console, execute: tclkit echoevent.kit slave
This will cause an identical "slave" Tk application to appear, except event processing for the "slave" canvas is disabled. Moving the cursor across the "master" canvas will cause the event command to echo to both the "master" console and the "slave" console. The tequila server is passing event information from the master interpreter to the slave interpreter.
4. move the cursor continually over the master canvas to generate a constant stream of <Motion> events to be echoed to the consoles. Either the master or the slave process will crash with a segmentation fault within 60 seconds.
report cross-posted at: http://bugs.activestate.com/show_bug.cgi?id=85927
starkit containing example code
Since this happens on Unix, can you enable core dumps and provide a gdb stack trace of the crash? I'd like to dichotomize between Metakit and the vanilla Tcl core.
As a side note, a .kit is not the easiest transport for some maintainers (me at least), because it requires additional effort to crack the hull open, and reproduce with the core-under-development. I hate efforts ;-)
I've reproduced the problem on OS X 10.6.2 using wish 8.5.7. If I can figure out how, I'll attach the traceback file.
gdb trace under Fedora 12 32bits
After Steve kindly brought all the pieces beside my bed, I could get the attached gdb stack trace under Linux (Fedora 12, 32-bit). Nothing about Metakit there.
Further analysis of the core shows the command
array unset collab .c,queue,42,*
being executed at the time. So it seems to hint at an interaction between patterned unsets and traces. I'd appreciate somebody knowing the app at hand to trim that stuff down to a manageable size... TBC
Homing in.
Just adding [puts] around the unset shows a reentrancy issue.
UNSETCOLLAB>: .c,queue,19,* (size:23 -- matching:10)
UNSETCOLLAB>: .c,queue,20,* (size:22 -- matching:10)
UNSETCOLLAB<
UNSETCOLLAB>: .c,queue,19,* (size:12 -- matching:9)
UNSETCOLLAB<
This in turn can be traced down to Tequila doing [vwait] in its guts. With Tk and a heavy stream of events, that [vwait] can catch a GUI event and call arbitrary code, including some doing an extra [unset] on the same var.
Knowing this, I'll try to build an atomic example.
Still haven't managed to reproduce it, but RTFS turned up a good candidate:
At line 3199 of tclVar.c, handling [array unset var pattern], a loop walks the keys with a stateful iterator:
    for (varPtr2 = VarHashFirstVar(varPtr->value.tablePtr, &search);
            varPtr2 != NULL; varPtr2 = VarHashNextVar(&search)) {
Then, in the body of that loop, when a match is found the unset is performed, possibly firing traces, possibly wrecking the chaining of hash entries.
Some kind of locking, or a more robust iterator, must be used, I think.
Additional insight: the "stateful iterator" Tcl_HashSearch works as a {bucket index, intra-bucket chain pointer} pair. So the case where the aforementioned interaction can cause harm is when the second-level [unset] removes the very next entry of the same bucket.
That's why the crash is so rare. Now to designing a pin-sharp case ;-)
Sigh.
Just produced a bucket of depth three: [array set x {a 1 e 1 i 1}].
(verify with [array stat x])
Then tried this:
trace add variable x unset rem
proc rem args {puts REM:$args; if {![info exists ::again]} {set ::again 1;unset ::x(e)}}
Then [array unset x *] should remove, in order, i,e,a.
*But*, since the trace proc removes e while removing i, the outer loop should get into trouble....
Though it doesn't for some reason.
Just noticed the tricky game of undefining a var instead of coldly killing its hash entry. Maybe this explains it, and my hypothesis was naive. Something more subtle must come into play...
Ha ! Got it now.
In truth the secondary unset _does_ kill the hash entry in that case; it's just that reuse of the freed block doesn't normally cause immediate harm. Hence I added the following 'sanitizing' line to FreeVarEntry:
    if (TclIsVarUndefined(varPtr) && !TclIsVarTraced(varPtr)
            && (VarHashRefCount(varPtr) == 1)) {
+       varPtr->value.objPtr = (void *) 0xdeadbeef;
        ckfree((char *) varPtr);
It is simply a matter of resetting a field to a visibly invalid value just before freeing.
The effect is that as soon as something reuses that freed structure (typically by indirecting through this objPtr field), a SIGSEGV or SIGBUS fires on 0xdeadbeef.
And it does: given this sanitizing patch, the following ultrashort example crashes tclsh:
array set x {e 1 i 1}
trace add variable x unset rem
proc rem args {
    puts REM:$args
    if {![info exists ::already]} {
        set ::already 1
        unset ::x(e)
    }
}
array unset x *
Explanation: a freshly created array starts small, with only 4 hash buckets, so colliding keys are easy to find.
It turns out 'e' and 'i' fall into the same bucket. Since the entries of a bucket are kept in a LIFO linked list, the [array unset x *] iteration handles 'i' first, then 'e'. The rest of the code arranges for 'e' to be removed as a side effect of the unset trace firing on 'i', pulling the rug out from under the iterator. Boom.
Renaming the bug according to this confirmation.
Upping prio. Patch under way.
The attached patch fixes the issue by properly protecting the next-in-current-bucket.
This is done modularly (without touching tclHash.c), thanks to the property that the search object has a NULL nextEntryPtr at each bucket change. The protection consists of upping the refcount of nextEntryPtr and decreasing it again one loop cycle later.
Since the patch includes the "sanitizing" line, a deterministic test case will be provided shortly.
Please review.
That patch doesn't quite work: it has excess-refcount problems when an unset fails midway through. Evil case!
I'll attach the patch I've applied to the HEAD, which addresses the issue (it's closely based on yours and includes a test). Note that it doesn't include any code to invalidate unused memory; the --enable-symbols=mem configuration works fine for that already.
Updated patch
Backported to 8.5. I think it's probably not practical to backport to 8.4, since this fix is thoroughly dependent on Miguel's VarReform work.
Beware: there is now a leak in the non-error case. It is due to only decrementing refcounts without ever calling CleanupVar. I'm not sure why you removed that call from my initial patch.
Proof of leak:
array set x {e 1 i 1}
trace add variable x unset rem
proc rem args {
    puts REM:$args
    if {![info exists ::already]} {
        set ::already 1
        unset ::x(e)
    }
}
array unset x i*
array stat x
=>
number of buckets with 0 entries: 3
number of buckets with 1 entries: 1
array size x
=> 0
What happens is that the secondary unset is deferred because "e" is in the protected state; but since the overall pattern "i*" doesn't match "e", it never gets deleted by the main iteration either.
Since VarHashRefCount(x)-- alone never takes the actual final step, CleanupVar must be called independently of the match outcome; this is exactly what my patch was doing.
Attached the small delta-patch reintroducing CleanupVar and eliminating the leak.
Reintroduce CleanupVar