Menu

#4236 Reflected channel crash on exit

obsolete: 8.6a4
closed-fixed
9
2009-01-22
2008-12-22
Pat Thoyts
No

I've created a tclkit exectutable on windows with a modified vfs::memchan implementation that uses the 8.6 core reflected channel code instead of using tclkit's rechan command to implement memchan-type channels. This tclkit is threaded and uses the tcl core zlib command too. Firing up tkchat.kit with this tclkit results in a crash when exiting.
The stack trace:
00179310 0041a2d1 kit_gui!DeleteReflectedChannelMap+0x171 [..\generic\tclIORChan.c @ 2413]
5c 0017936c 0048cad8 kit_gui!DeleteInterpProc+0x2df [..\generic\tclBasic.c @ 1427]
18 00179384 00419fe1 kit_gui!Tcl_EventuallyFree+0xc2 [..\generic\tclPreserve.c @ 300]
14 00179398 0054b515 kit_gui!Tcl_DeleteInterp+0x5b [..\generic\tclBasic.c @ 1278]
10 001793a8 004626ed kit_gui!DeleteConsoleInterp+0x1a [..\generic\tkConsole.c @ 833]
18 001793c0 00462603 kit_gui!Tcl_FinalizeThread+0x66 [..\generic\tclEvent.c @ 1138]
10 001793d0 0046249d kit_gui!Tcl_Finalize+0xcf [..\generic\tclEvent.c @ 984]
10 001793e0 004be7cd kit_gui!Tcl_Exit+0x5b [..\generic\tclEvent.c @ 829]
In the DeleteReflectedChannelMap function the rcmPtr seems ok but the rcPtr on line 2413 is 0x00007cc which is an invalid pointer value so rcPtr->interp results in the segfault.

The refchan implementation:

if {[package vsatisfies [package provide Tcl] 8.6]} {

proc vfs::zstream {mode ifd clen ilen} {
return [zlib push $mode $ifd]
}
proc vfs::memchan {} {
set fd [chan create {read write} [namespace origin memchan_handler]]
set ::vfs::_memchan_buf($fd) ""
set ::vfs::_memchan_pos($fd) 0
return $fd
}
proc vfs::memchan_handler {cmd chan args} {
upvar 1 ::vfs::_memchan_buf($chan) buf
upvar 1 ::vfs::_memchan_pos($chan) pos
switch -exact -- $cmd {
initialize {
foreach {mode} $args break
return [list initialize finalize watch read write seek]
}
finalize {
unset buf pos
}
seek {
foreach {offset base} $args break
switch -exact -- $base {
current { incr offset $pos }
end { incr offset [string length $buf] }
}
return [set pos $offset]
}
read {
foreach {count} $args break
set r [string range $buf $pos [expr {$pos + $count - 1}]]
incr pos [string length $r]
return $r
}
write {
foreach {data} $args break
set count [string length $data]
if { $pos >= [string length $buf] } {
append buf $data
} else {
set last [expr { $pos + $count - 1 }]
set buf [string replace $buf $pos $last $data]
}
incr pos $count
return $count
}
watch {
foreach {eventspec} $args break
puts stderr "warning: someone wants to watch $chan for $eventspec"
# Not sure what to do here.
}
}
}

Discussion

  • Andreas Kupries

    Andreas Kupries - 2008-12-22

    Pat, just an ack that I have seen your report.

    Please note that I am in the last 20 hours before my flight into vacation, coming back late Jan 4. During the vacation I have no devel environment. So any work would have to be done either now, sqeezed into the holiday preps at the office which I am not sure to mabage, or after I am back.

     
  • Andreas Kupries

    Andreas Kupries - 2008-12-22

    I am having a quick look now ...

    The rcPtr is the channel instance ClientData.
    The loop we are at iterates over all channels in the rcmPtr (map of channels for the thread) and it seems we are getting a channel with a bad clientdata. Maybe already closed? But not removed from the map yet ? We are in an per-interp deletion handler. If something has changed the order finalization is done the function might be run at the wrong time. This is speculation right now.

    Related functions (same file):

    TclChanCreateObjCmd() => line 727+, entering channels into the map.

    ReflectClose() => line 1149+, Remove channel from the map.
    This should actually ensure that our deletion handlers cannot access a closed channel.

    ForwardProc() => line 2745+. A forwarded close operation, i.e. like ReflectClose(). This is used if the reflected channel was moved to a different thread. The closing has to occur in the origin thread, as that keeps all the state (tcl handler etc.)

    DeleteThreadReflectedChannelMap() => line 2480ff. Same as DeleteReflectedChannelMap, deletes the map for the whole thread (DRCM does it only for a specific interp in the thread).

    TclChanPostEventObjCmd() => line 817+. Uses only the per-interp map, not the per-thread map, and only to check that the channel is valid. No modifications of the map.

    These are the places we have to look at to debug this.

     
  • Andreas Kupries

    Andreas Kupries - 2008-12-22

    Right, the clientData itself is set in TclChanCreateObjCmd(), line 575, as part of Tcl_CreateChannel().

    It is only used by the Delete(Thread)ReflectedChannelMap() and TclChanPostEventObjCmd() functions. Which look ok.

    Could this be a memory smash bug by something else the reflected channel falls victim to ?

     
  • Andreas Kupries

    Andreas Kupries - 2008-12-22

    Now looking at the channel implementation.
    - initialize looks ok, although the [list ...] could be made a literal.
    - finalize look ok
    - seek is ok. I was confuised first, thinking that it was not handling mode 'start', it does, just very implicit in the coding.
    - read looks ok, although I am not seeing how it handles reading beyond end of the channel. The 'pos' doesn't seem to be constrained after the read is done, i.e. for a string of 10 chars I can force to 20, or whatever I like.
    - write also seems to mismanage pos when we are beyond the end of the string. It appends, yes, but it also simply increments after, so if the pos was 20 for a 10-char string, we append 5, the new pos will be 25 for a 15-char string.

    - watch ... See the examples I posted (here on SF) ... Ok, adding them here again.

     
  • Andreas Kupries

    Andreas Kupries - 2008-12-22

    Examples for reflected channels

     
  • Andreas Kupries

    Andreas Kupries - 2008-12-22

    Remembering that you said that the new zlib stuff is used as well ... does the crash go away if you use either the old zlib stuff, or no zlib (= uncompressed vfs) at all ?

    With the example channels now, which include my own memchan as well, does the crash happen for these too ?

    I.e. now trying to reduce the amount of code needed to repro the crash, especially trying to see if vfs is necessary to repro.

     
  • Nobody/Anonymous

    I've tried reducing the implementation causing this and so far it seems to require Tk to be in use. It seems like I always have one of these channels open even though there is no channel registered for the file. As it is a gif I looked at the [image create photo] code but it looks like the channel should be closed there to me.
    So far the smallest bug reproducing script I have got is to create a vfs folder with a gif image and the following main.tcl. Then make a starkit and tclkit-gui test.kit, close the window and it crashes. I have used zipkit and metakit and both crash so it is not dependent on the vfs implementation. [rechan] is ok - presumably a Memchan [memchan] would be fine too.

    package require Tk
    package require starkit
    if {[starkit::startup] ne "sourced"} {
    set img [image create photo -file [file join $starkit::topdir tkchat48.gif]]

    wm iconphoto . $img
    tkwait window .
    #exit
    }

    Uncommenting exit seems to resolve the problem - so I think it is an interaction with Tk finalization and Tcl finalization.

    In tkchat we have the whiteboard which is implemented using safetk and a safe interp. When I close that using this tclkit it crashes as well in the same place when destroying this safe interpreter.

     
  • Pat Thoyts

    Pat Thoyts - 2008-12-23

    tcl script demonstrating the crash

     
  • Pat Thoyts

    Pat Thoyts - 2008-12-23

    I've added a single file that demos the crash from wish. This contains its own [memchan] and a vfs so we can create an image. The key seems to be the use of image creation in Tk from such channels via a vfs. Once the image is created the channel seems to be closed -- it is no longer listed in [chan names] however our finalize method was never called.
    Seems like Tk can cause a dodgy channel close leaving junk behind for the interp finalization?
    This script can be run with a 8.6b1 wish with tclvfs available - it doesnt need a tclkit.

     
  • Andreas Kupries

    Andreas Kupries - 2009-01-05

    Thanks for the continued investigation. I will start working on this in the next days, as I ramp myself up after the christmas vacation.

     
  • Nobody/Anonymous

    Confirmed the crash. Linux.
    Using Tcl/Tk 8.6 CVS Head, and TclVFS CVS Head.

    Ok, first results ... Tk calls 'close' or something similar for the image, and 'ReflectClose' is called with 'interp == NULL'. Which it interprets, per the comments in the code as 'I am called from TclFinalizeIOSystem'. Consistent with that interpretation it assumes that there is no Tcl level it can call, so refrains from invoking the "finalize" method.

    Next step, why is interp == NULL.

     
  • Nobody/Anonymous

    #8 0x401f629c in Tcl_Close (interp=0x0, chan=0x80b6160) at /home/andreask/workbench/VvVFS/src/tcl86/generic/tclIO.c:3084
    #9 0x400c33e0 in ImgPhotoConfigureMaster (interp=0x804c0b0, masterPtr=0x80eca40, objc=2, objv=0x804d5ac, flags=0)
    at /home/andreask/workbench/VvVFS/src/tk86/generic/tkImgPhoto.c:1845

    Well, as thought, the NULL pointer originates in Tk. Now to see if that makes sense for Tk itself.

     
  • Andreas Kupries

    Andreas Kupries - 2009-01-21

    Stupid thing. I was logged in. :( Last two comments are mine.

     
  • Andreas Kupries

    Andreas Kupries - 2009-01-21

    And changing all the Tcl_Close (NULL, chan) in ImgPhotoConfigureMaster () fixes the crash.
    Now

     
  • Andreas Kupries

    Andreas Kupries - 2009-01-21

    Ah .. submitted to early.

    While the change does indeed fix our problem it is not clear yet if that is actually the correct thing to do.

    First, the reflections (transform too), currently assume that 'interp == NULL' means 'TclFinalizeIOSystem'. And indeed that is the only place in the core where interp == NULL is used.

    Reading the OpenFileChnl.3 manpage the description of Tcl_Close contains

    If the channel is being closed synchronously and an error occurs during
    closing of the channel and interp is not NULL, an error message is left in
    the interpreter's result.

    This seems to imply that Tcl_Close is allowed to be called with interp == NULL from the outside, i.e. packages, and that this just means 'I do not wish to see error messages'.
    The comments in the code (i.e. tclIO.c, definition of Tcl_Close()), however do not mention that the interp argument is allowed to be NULL.

    If we assume that the manpage has priority then the correct fix is not to change Tk, but the implementation of the reflections, i.e. use a different way of detecting when ReflectClose() is called from the TclFinalizeIOSystem().

    Maybe the channel flags allow us to tell this ?
    Hm. TclInThreadExit() seems to be closest.

     
  • Andreas Kupries

    Andreas Kupries - 2009-01-22
    • status: open --> closed-fixed
     
  • Andreas Kupries

    Andreas Kupries - 2009-01-22

    Fix in the reflection implementations committed to head and 8.5 head.

    Now using TclInThreadExit() for the test. Also modified a place where the interp was accessed directly to pull it out of the reflection channel/transform structure instead. Passes testsuite, and the zcrash.tcl test application attached here.