#4677 Crash in UpdateStringOfList()

obsolete: 8.6b1.1
closed-fixed
9
2010-08-04
2010-07-26
Emiliano
No

When trying to run the code on http://wiki.tcl.tk/26735 with Tcl HEAD
(updated today), using the wish8.6 interpreter, I can crash the program
by closing the wish window with the mouse.

$ gdb wish8.6
GNU gdb 6.8
Copyright (C) 2008 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "i486-slackware-linux"...
(gdb) run memchan_2.tcl
Starting program: /usr/local/bin/wish8.6 memchan_2.tcl
[Thread debugging using libthread_db enabled]
[New Thread 0xb7a25ab0 (LWP 555)]
[New Thread 0xb79eeb90 (LWP 558)]
Writing data into rc0
Reading data from rc1
**Hello World**
Writing data into rc1
Reading data from rc0
**Hello There !!** <=== Close the "." wish window here

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0xb7a25ab0 (LWP 555)]
0xb7eacbed in UpdateStringOfList (listPtr=0x8072ab8)
at /home/emiliano/src/tcl/generic/tclListObj.c:1865
1865 {
(gdb) bt
#0 0xb7eacbed in UpdateStringOfList (listPtr=0x8072ab8)
at /home/emiliano/src/tcl/generic/tclListObj.c:1865

#1 0xb7eb784f in Tcl_GetString (objPtr=0x8072ab8)
at /home/emiliano/src/tcl/generic/tclObj.c:1636
#2 0xb7eb78d1 in Tcl_GetStringFromObj (objPtr=0x8072ab8, lengthPtr=0xbf2930d4)
at /home/emiliano/src/tcl/generic/tclObj.c:1676
#3 0xb7eacc9e in UpdateStringOfList (listPtr=0x8072ab8)
at /home/emiliano/src/tcl/generic/tclListObj.c:1893

#4 0xb7eb784f in Tcl_GetString (objPtr=0x8072ab8)
at /home/emiliano/src/tcl/generic/tclObj.c:1636
#5 0xb7eb78d1 in Tcl_GetStringFromObj (objPtr=0x8072ab8, lengthPtr=0xbf2931c4)
at /home/emiliano/src/tcl/generic/tclObj.c:1676
#6 0xb7eacc9e in UpdateStringOfList (listPtr=0x8072ab8)
at /home/emiliano/src/tcl/generic/tclListObj.c:1893

#7 0xb7eb784f in Tcl_GetString (objPtr=0x8072ab8)
at /home/emiliano/src/tcl/generic/tclObj.c:1636
#8 0xb7eb78d1 in Tcl_GetStringFromObj (objPtr=0x8072ab8, lengthPtr=0xbf2932b4)
at /home/emiliano/src/tcl/generic/tclObj.c:1676
#9 0xb7eacc9e in UpdateStringOfList (listPtr=0x8072ab8)
at /home/emiliano/src/tcl/generic/tclListObj.c:1893

Note that group #1, #2 and #3 repeats in #4, #5 and #6, and then the
backtrace output repeats endlessly with these three lines.

The problem doesn't appear when using the tclsh intepreter (I can interrupt
the program with Ctrl-C, but it finalize without a crash.

The system is Linux 2.6.27.7

Discussion

1 2 > >> (Page 1 of 2)
  • Donal K. Fellows

    Works for me with HEAD. Crashes in 8.5.2. (All on OSX/Intel.)

     
  • Donal K. Fellows

    Works for me with tip of 8.5 branch too.

     
  • Jan Nijtmans

    Jan Nijtmans - 2010-07-27

    I can reproduce this problem on Linux (Ubuntu), both with the 8.5 head and 8.6 head!

     
  • Emiliano

    Emiliano - 2010-07-27

    The problem doesn't happen when Tk is loaded dinamically into the tclsh interpreter via [package require Tk], only using the wish interpreter.
    Tried thread-enabled and thread-disabled builds, both crash.

     
  • Don Porter

    Don Porter - 2010-07-29

    This is an infinite loop created because a tclListType
    value somehow has itself as its own second element!

    A list value is not supposed to be an element of itself,
    and it's clearly an invalid state for a multi-element list
    to claim to me an element of itself. The failure here
    is not in the tclListType machinery, but it whatever code
    corrupted this value into an invalid state.

    Just based on the players involved, I'd point fingers
    in the general direction of how (reflected?) channels
    interact with exit cleanup.

    Another approach might be to `git bisect` to discover
    when this failure first showed up.

     
  • Don Porter

    Don Porter - 2010-07-29

    Issue appears between the 8.5.3 and 8.5.4 releases of Tcl/Tk.

     
  • Don Porter

    Don Porter - 2010-07-29
    • priority: 5 --> 8
    • assigned_to: nijtmans --> dkf
     
  • Don Porter

    Don Porter - 2010-07-29

    Quick scan of the changes file fingers
    this as the most likely suspect:

    2008-07-20 (enhancement)[2008248] dict->list preserve item intreps (pasadyn)

    dkf? Any chance a bug in that change could cause internal rep
    corruption where a list tries to make an element of itself? And then
    any hints what could trigger that bug?

     
  • Don Porter

    Don Porter - 2010-07-29

    That doesn't seem to be it.

    If I disable the optimized dict->list
    conversion code, the bug remains.

     
  • Don Porter

    Don Porter - 2010-07-29
    • assigned_to: dkf --> andreas_kupries
     
  • Don Porter

    Don Porter - 2010-07-29

    The invalid value traces back to the
    Tcl_SetReturnOptions() call in
    UnmarshallErrorResult().

    Since I don't see anything obviously
    wrong there, I suspect the error goes
    back further and a corrupt value is
    getting marshalled in the first place.
    Still tracking...

     
  • Don Porter

    Don Porter - 2010-07-29

    The bug is almost certainly in the
    list surgery done in FixLevelCode().

     
  • Don Porter

    Don Porter - 2010-07-29

    Much worse than that.

    Somehow the same Tcl_Obj is being
    returned to the tclFreeObjList twice in
    a row, which leaves it pointing to itself
    as the next Tcl_Obj memory chunk to
    allocate. Hilarity ensues.

    The code which is making the bad
    push back onto the free list is the

    Tcl_DecrRefCount(maskObj);

    in ReflectWatch(), but it's not yet
    clear whether it's the sinner or the
    victim of even earlier corruption.

     
  • Alexandre Ferrieux

    in InvokeTclMethod, dead channel case, why are argOne and argTwo decr-ref'ed ?

    sinner candidate ?

     
  • Don Porter

    Don Porter - 2010-07-29

    It is not safe to call InputTclMethod() with
    zero ref count arguments, but that's what
    ReflectWatch() does.

    The "design" of this code isn't fully clear
    to me, so I don't know whether the right
    fix is to add a Tcl_IncrRefCount() before
    that call (and check all the similar calls
    in the same source code file), or to revise
    InvokeTclMethod() itself.

     
  • Don Porter

    Don Porter - 2010-07-29

    Anyhow, just to confirm that is the root of the problem,
    I added the Tcl_IncrRefCount() call and the bug goes away.
    Leaving with aku to choose what fix is preferred.

     
  • Don Porter

    Don Porter - 2010-07-29
    • priority: 8 --> 9
     
  • Don Porter

    Don Porter - 2010-07-29

    Come up on the chat that just adding
    a Tcl_IncrRefCount() isn't good enough
    to fix things when the rcPtr->interp == NULL
    case is in effect. Then we still would get
    two Decrs after one Incr and the same brokenness

     
  • Don Porter

    Don Porter - 2010-07-29

    Bottom line is that InvokeTclMethod and its callers
    need a complete refCount management analysis and fixup.

     
  • Don Porter

    Don Porter - 2010-07-29
    • labels: 105656 --> 27. Channel Types
     
  • Andreas Kupries

    Andreas Kupries - 2010-07-30

    dgp - This is just a note that I have seen this thread.
    Thank you for your analysis. I'll do a review as quickly as possible.
    Note: tclIORTrans.c is structurally similar to tclIORChan.c which this bug is apparently about.
    As such it may have the same problem.

     
  • Andreas Kupries

    Andreas Kupries - 2010-07-30

    > It is not safe to call InputTclMethod() with
    > zero ref count arguments, but that's what
    > ReflectWatch() does.

    Hm. Going through this I cannot see how that (maskObj with RC == 0) happens. The maskObj is created by DecodeEventMask() and this function runs Tcl_IncrRefCount immediately on it, in line 1975, causing RC == 1.

    I am looking at revision 1.49 of tclIORChan.c

    That being said, I do see that ReflectInput/Output and SeekWide invoke ITM with newly minted objects which have RC == 0. (Still have to look at Block and (Set/Get)Option).

     
  • Andreas Kupries

    Andreas Kupries - 2010-08-03

    My analysis and notes about RC changes.

     
  • Andreas Kupries

    Andreas Kupries - 2010-08-03

    Based on the attached RefCountFlow notes of mine it seems that calling InvokeTclMethod (ITM) with an RC=0 Tcl_Obj should consistently delete it before returning. It is only for RC=1 that argOne, argTwo can be both deleted and kept, depending on the code path taken.

    3. argOneObj (a) CCOC, modeObj, RC=1. RC-- @2194, deleted.
    RC++ @2225 RC=2,
    RC-- @2296, RC=1, kept.
    (b) RWatch, maskObj, RC=1 s.o.
    (c) RClose, ---------------
    (d) RGOpt, ---------------
    (e) RIn, toReadObj, RC=0 RC-- @2194, deleted.
    RC++ @2225 RC=1,
    RC-- @2296, RC=0, deleted.

    ChanCreateObjCmd()
    modeObj = DecodeEventMask() @1975 RC=1.

    InvokeTclMethod ( modeObj, NULL ) => resObj
    modeObj @593 RC-- /assumes ITM rc neutral to detail1. WRONG.
    *** modeObj - may and may not be deleted in ITM.

    CCOC = ChanCreateObjCmd
    RXXX = ReflectXXX

    We cannot simply remove the code block going from lines 2193 to 2198. While that would fix things for the case RC=1, consistently keeping the arg* Tcl_Obj's, for the case RC=0 it would then start failing, keeping / deleting the Tcl_Obj based on code path.

    Best thing we can do IMHO is to consistently call ITM with arg* Tcl_Obj which are RC=1, plus removing that code block.

    Then the objects are consistently kept, and the caller has to delete, i.e. cleanup, by doing its own DecrRefCount.

    Does this agree with your (dgp) analysis as well ?

     
  • Don Porter

    Don Porter - 2010-08-03

    Committed test iocmd-32.2 which tests for this bug.

     
1 2 > >> (Page 1 of 2)