#2568 core dump "malformed bucket chain in Tcl_DeleteHashEntry"

Milestone: obsolete: 8.5.5
Status: closed-fixed
Priority: 1
Updated: 2010-12-07
Created: 2008-12-15
Private: No

Sometimes I get a core dump with the message "malformed bucket chain in Tcl_DeleteHashEntry" after destroying a toplevel window. The error is not reproducible. Here is one backtrace:

(gdb) where
#0 0x40398741 in kill () from /lib/libc.so.6
#1 0x40bf6771 in pthread_kill () from /lib/libpthread.so.0
#2 0x40bf6a7b in raise () from /lib/libpthread.so.0
#3 0x403984d4 in raise () from /lib/libc.so.6
#4 0x40399a08 in abort () from /lib/libc.so.6
#5 0x400e10f9 in Tcl_PanicVA () from /usr/local/lib/libtcl8.5.so
#6 0x400e1120 in Tcl_Panic () from /usr/local/lib/libtcl8.5.so
#7 0x400b90a1 in Tcl_DeleteHashEntry () from /usr/local/lib/libtcl8.5.so
#8 0x40158de1 in Tk_FreeColor () from /usr/local/lib/libtk8.5.so
#9 0x4014d748 in Tk_Free3DBorder () from /usr/local/lib/libtk8.5.so
#10 0x4014d8fa in Tk_Free3DBorderFromObj () from /usr/local/lib/libtk8.5.so
#11 0x401e3804 in Ttk_ClearCache () from /usr/local/lib/libtk8.5.so
#12 0x401e3a4d in CacheWinEventHandler () from /usr/local/lib/libtk8.5.so
#13 0x4015d772 in Tk_HandleEvent () from /usr/local/lib/libtk8.5.so
#14 0x40178f25 in Tk_DestroyWindow () from /usr/local/lib/libtk8.5.so
#15 0x40178bf6 in Tk_DestroyWindow () from /usr/local/lib/libtk8.5.so
#16 0x40178bf6 in Tk_DestroyWindow () from /usr/local/lib/libtk8.5.so
#17 0x402126f0 in TkWmProtocolEventProc () from /usr/local/lib/libtk8.5.so
#18 0x4015dafc in Tk_HandleEvent () from /usr/local/lib/libtk8.5.so
#19 0x4015dfaf in WindowEventProc () from /usr/local/lib/libtk8.5.so
#20 0x400ddba6 in Tcl_ServiceEvent () from /usr/local/lib/libtcl8.5.so
#21 0x400dde17 in Tcl_DoOneEvent () from /usr/local/lib/libtcl8.5.so
#22 0x4015e3cd in Tk_MainLoop () from /usr/local/lib/libtk8.5.so
#23 0x4016b560 in Tk_MainEx () from /usr/local/lib/libtk8.5.so
#24 0x08075d89 in main (argc=2, argv=0xbffff754) at src/tkscid.cpp:403

NOTE:
I'm not using Tk_* functions except Tk_PhotoGetSize, Tk_PhotoGetImage, Tk_PhotoSetSize, Tk_PhotoPutBlock, Tk_CreatePhotoImageFormat, Tk_Main, Tk_Init.
I'm not using Tcl_* functions except Tcl_AppendResult, Tcl_SetResult, Tcl_WrongNumArgs, Tcl_Get*FromObj, Tcl_Seek, Tcl_Read, Tcl_Eof, Tcl_PkgProvide, Tcl_ResetResult, Tcl_Eval, Tcl_CreateCommand.

Discussion

  • Donal K. Fellows

    • labels: 104343 --> 88. Themed Tk
    • assigned_to: nobody --> jenglish
     
  • Donal K. Fellows

    I see Ttk_ClearCache in that trace, so I'll assign to the Ttk maintainer.

    (I don't know if that's correct; the trace is bereft of detail, and even a detailed one probably wouldn't tell us much, as the crash site is probably divorced from the source of the problem anyway. Deletion callbacks are like that...)

     
  • Gregor Cramer

    Gregor Cramer - 2008-12-16

    I made an observation which may help. As far as I remember, this bug has occurred since I started using an instance of ttk::scale in the involved window. Yesterday I removed this instance and did some tests; the error did not occur. I strongly suspect that ttk::scale is the cause. Today I did some more tests using valgrind. Without ttk::scale everything works fine, but using ttk::scale produces the following critical error messages:

    ==1872== Invalid read of size 4
    ==1872== at 0x1BB29FDE: Ttk_ChangeElementState (ttkLayout.c:1215)
    ==1872== by 0x1BB352B7: ElementStateEventProc (ttkTrack.c:148)
    ==1872== by 0x1BA7D198: Tk_HandleEvent (tkEvent.c:1386)
    ==1872== by 0x1BA7D7D8: WindowEventProc (tkEvent.c:1804)
    ==1872== by 0x1B9EF42F: Tcl_ServiceEvent (tclNotify.c:675)
    ==1872== by 0x1B9EF70C: Tcl_DoOneEvent (tclNotify.c:914)
    ==1872== by 0x1BA7DC2D: Tk_MainLoop (tkEvent.c:2133)
    ==1872== by 0x1BA8DF54: Tk_MainEx (tkMain.c:321)
    ==1872== by 0x8075D88: main (tkscid.cpp:403)
    ==1872== Address 0x1D2362B0 is 8 bytes inside a block of size 36 free'd
    ==1872== at 0x1B906B04: free (vg_replace_malloc.c:152)
    ==1872== by 0x1B95A18B: TclpFree (tclAlloc.c:729)
    ==1872== by 0x1B964DA1: Tcl_Free (tclCkalloc.c:1182)
    ==1872== by 0x1BB28B63: Ttk_FreeLayoutNode (ttkLayout.c:544)
    ==1872== by 0x1BB28B4D: Ttk_FreeLayoutNode (ttkLayout.c:543)
    ==1872== by 0x1BB2962B: Ttk_FreeLayout (ttkLayout.c:854)
    ==1872== by 0x1BB3AF30: UpdateLayout (ttkWidget.c:33)
    ==1872== by 0x1BB3BC5B: TtkCoreConfigure (ttkWidget.c:548)
    ==1872== by 0x1BB2F81F: ScaleConfigure (ttkScale.c:135)
    ==1872== by 0x1BB3BF01: TtkWidgetConfigureCommand (ttkWidget.c:646)
    ==1872== by 0x1BB3B29D: TtkWidgetEnsembleCommand (ttkWidget.c:171)
    ==1872== by 0x1BB3B309: WidgetInstanceObjCmd (ttkWidget.c:190)

     
  • Gregor Cramer

    Gregor Cramer - 2008-12-16

    Oops. In my previous comment I forgot the most important error message from valgrind:

    ==1872== Invalid write of size 4
    ==1872== at 0x1BB29FEA: Ttk_ChangeElementState (ttkLayout.c:1215)
    ==1872== by 0x1BB351FF: ElementStateEventProc (ttkTrack.c:129)
    ==1872== by 0x1BA7D198: Tk_HandleEvent (tkEvent.c:1386)
    ==1872== by 0x1BA7D7D8: WindowEventProc (tkEvent.c:1804)
    ==1872== by 0x1B9EF42F: Tcl_ServiceEvent (tclNotify.c:675)
    ==1872== by 0x1B9EF70C: Tcl_DoOneEvent (tclNotify.c:914)
    ==1872== by 0x1BA7DC2D: Tk_MainLoop (tkEvent.c:2133)
    ==1872== by 0x1BA8DF54: Tk_MainEx (tkMain.c:321)
    ==1872== by 0x8075D88: main (tkscid.cpp:403)

     
  • Joe English

    Joe English - 2008-12-17

    Aha! Thanks, that's something to go on. I know where to look now.

     
  • Joe English

    Joe English - 2008-12-17
    • priority: 5 --> 8
     
  • Joe English

    Joe English - 2008-12-22

    Nope, can't find it.

    Can't replicate, and from code inspection I can't figure out any way that the attached valgrind log could occur unless a <Motion> or <ButtonPress/Release> event were delivered after the widget had been destroyed (which as far as I know Can't Happen).

    Any other clues?

     
  • Gregor Cramer

    Gregor Cramer - 2008-12-30

    Right, the valgrind log occurred after a <Motion> or <ButtonPress/Release>. But this does not happen after the widget has been destroyed; it happens before the widget is destroyed, and it is the cause of the core dump while destroying the widget (a belated effect). "Invalid write of size 4" generally means an overflow while writing into an array or another data structure, and this happens in ttkLayout.c at line 1215 (in case you don't know valgrind, I recommend this excellent tool).

     
  • Alexandre Ferrieux

    What's strange is that the code at ttkLayout.c:1215 is just a pair of function calls, with no direct poking-into-memory.
    Is it possible that valgrind reports with coarse granularity due to inlining? If yes, can you recompile with inlining disabled? If no, what defines the granularity of valgrind's checks?

     
  • Gregor Cramer

    Gregor Cramer - 2009-01-09

    I will verify this. Currently my computer is not working, so it will take a while.

     
  • Gregor Cramer

    Gregor Cramer - 2009-01-10

    My computer is currently not working, so I cannot compile or debug. Despite this, here is my first answer: it's not a matter of inlining. The code at ttkLayout.c:1215 is a direct poking-into-memory: a write access to a member. I strongly suspect that the code is writing into a node which has already been freed. With valgrind it shouldn't be hard to prove this; valgrind is easy to use. Because of a defective mainboard I cannot do this myself.

     
  • Joe English

    Joe English - 2009-02-08

    Found it! This can happen if the -style, -orient, or theme changes (or anything else which causes the widget to recompute the layout) while the widget is active or pressed. This will leave dangling pointers in the ElementStateTracker (ttkTrack.c).

    gcramer, does that situation sound like something that might be happening in your program?

     
  • Joe English

    Joe English - 2009-02-08

    Script to replicate:

    pack [ttk::scale .s]
    bind .s <ButtonPress-2> { %W configure -orient horizontal }
    bind .s <ButtonPress-3> { %W configure -orient vertical }

    Running this under valgrind, then pressing B2 and B3 on the scale is a fairly reliable way to generate valgrind traces identical to the one posted earlier in this ticket.

     
  • Joe English

    Joe English - 2009-02-09

    Fixed in CVS (generic/ttk/ttkTrack.c r1.6).

    (Notes: fix is not strictly correct -- it uses the old layout pointer as a guard value, so recycling may defeat it, but given the way layouts are allocated and freed by widgets this will Almost Never be an issue. Correct fix will take a bit of refactoring.)
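
The guard-value idea described in the note above can be sketched roughly as follows. This is a minimal illustration only, not the code from ttkTrack.c: the `Node`, `Layout`, and `Tracker` types, the two-slot pool, and every function name here are hypothetical stand-ins. The pool replaces real allocation so that a released layout stays addressable and the address reuse ("recycling") becomes deterministic.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration; these are NOT the real Ttk types. */
typedef struct { int state; } Node;
typedef struct { int inUse; Node node; } Layout;

/* A two-slot pool replaces malloc/free so that released layouts stay
 * addressable and pointer comparisons remain well defined in this demo. */
static Layout pool[2];

static Layout *LayoutAlloc(void)
{
    size_t i;
    for (i = 0; i < sizeof pool / sizeof pool[0]; i++) {
        if (!pool[i].inUse) {
            pool[i].inUse = 1;
            pool[i].node.state = 0;
            return &pool[i];   /* lowest free slot, so addresses get recycled */
        }
    }
    return NULL;
}

static void LayoutFree(Layout *layout)
{
    assert(layout != NULL && layout->inUse);
    layout->inUse = 0;
}

typedef struct {
    Layout *guard;     /* layout that was current when the node was recorded */
    Node *activeNode;  /* node tracked as pressed, or NULL */
} Tracker;

/* Button press: remember the pressed node and the layout it belongs to. */
static void TrackerPress(Tracker *t, Layout *current)
{
    t->guard = current;
    t->activeNode = &current->node;
    t->activeNode->state = 1;
}

/* Button release: clear the pressed state only if the recorded layout is
 * still the widget's current one. Returns 1 if the node was written,
 * 0 if the tracker was found stale and was reset instead. */
static int TrackerRelease(Tracker *t, Layout *current)
{
    int wrote = 0;
    if (t->activeNode != NULL && t->guard == current) {
        t->activeNode->state = 0;
        wrote = 1;
    }
    t->guard = NULL;
    t->activeNode = NULL;
    return wrote;
}
```

In this sketch, if a press is recorded against one layout and the widget then switches to a layout at a different address, `TrackerRelease` sees the mismatch and skips the write. If the old layout is released first and the replacement happens to land at the same address, the comparison passes even though the layout is logically a different one, which is the recycling weakness the note mentions. A generation counter stored next to the pointer would be one way to close that gap; that is a suggestion for the sketch, not a description of the eventual refactoring.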

     
  • Joe English

    Joe English - 2009-02-09
    • status: open --> closed-fixed
     
  • Nobody/Anonymous

    > gcramer, does that situation sound like something that might be happening
    > in your program?

    I believe so; theme changes are possible in my program. I have to verify this. Unfortunately my computer is not working yet. It has been repaired, but I have to install a lot, and this will still take a little while. As soon as I have checked this with valgrind I will respond.

    But I think you have found the error. I guess you have spent a lot of time on this.

     
  • Gregor Cramer

    Gregor Cramer - 2009-03-17

    It took a long time to have a running system after my crash, sorry.

    I tried the ttk stuff from CVS and valgrind is not complaining anymore. It works!

     
  • Don Porter

    Don Porter - 2010-08-10

    please backport fix for 8.5.9

     
  • Don Porter

    Don Porter - 2010-08-10
    • priority: 8 --> 9
    • status: closed-fixed --> open-fixed
     
  • Joe English

    Joe English - 2010-12-07

    Fixed in CVS. Closed. Please stop reopening.

     
  • Joe English

    Joe English - 2010-12-07
    • priority: 9 --> 1
    • status: open-fixed --> closed-fixed