#3006 tcl8.4 segfaults running on linux-2.6 with SMP machines

obsolete: 8.4.9
Matthias Klose

Tcl 8.4.7 (and the current 8.4 branch) segfaults when running
on Linux 2.6 kernels on machines with 2 or more processors.

It sometimes takes several runs to reproduce, and gdb
isn't very helpful.

(gdb) run ./www/c_interface.tcl >c_interface.html
Starting program:
./www/c_interface.tcl >c_interface.html
[Thread debugging using libthread_db enabled]
[New Thread 1076113536 (LWP 14342)]
[New Thread 1084509104 (LWP 14343)]

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 1076113536 (LWP 14342)]
0x4000c410 in _dl_fini () from /lib/ld-linux.so.2
(gdb) bt
#0 0x4000c410 in _dl_fini () from /lib/ld-linux.so.2
#1 0x4013490f in exit () from /lib/tls/i686/cmov/libc.so.6
#2 0x40069b44 in Tcl_Exit () from /usr/lib/libtcl8.4.so.0
#3 0x4004a3b5 in Tcl_ExitObjCmd () from
#4 0x40044604 in TclEvalObjvInternal () from
#5 0x400451b4 in Tcl_EvalEx () from
#6 0x4004563b in Tcl_Eval () from /usr/lib/libtcl8.4.so.0
#7 0x4008eafe in Tcl_Main () from /usr/lib/libtcl8.4.so.0
#8 0x08048733 in main (argc=134512916, argv=0x8048114)
at ../unix/tclAppInit.c:90

Disabling TLS (LD_ASSUME_KERNEL=2.4.21) seems to 'fix' it.

The problematic change appears to be this:

* unix/tclUnixNotfy.c (NotifierThreadProc): Accepted Joe
Mistachkin's patch for [Tcl SF Bug 990500], properly closing the
notifier thread when it exits.


If I revert that patch, I can no longer reproduce the
segfault, even with 10,000 iterations.

See https://bugzilla.ubuntu.com/show_bug.cgi?id=4141
for the original bug report.


  • Jeffrey Hobbs

    • assigned_to: andreas_kupries --> mistachkin
    • priority: 5 --> 9
  • Jeffrey Hobbs

    Logged In: YES

    Joe needs to do more testing of his recommended core patches
    for the exit shutdown.

  • Joe Mistachkin

    Logged In: YES

    Can anybody duplicate this bug on some non-Linux *nix
    platform with 2 or more processors?

    We need to figure out if this problem is specific to Linux.

  • Joe Mistachkin

    Logged In: YES

    This bug may be related to bug #1011715.

  • Joe Mistachkin

    • assigned_to: mistachkin --> hobbs
  • Joe Mistachkin

    Logged In: YES

    I do not believe my patch is [directly] causing this issue.

    This appears to be happening at process termination.

    I have a theory on how to fix this problem.

    In "tclUnixNotfy.c", near line 303 we have:


    I suggest we add the following after this code:

    * Avoid possible race condition, physically wait for
    thread to die.

    This should avoid a [potential] race condition between the
    process terminating (due to a premature return from
    Tcl_FinalizeNotifier) and the notifier thread actually
    calling TclpThreadExit.
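
The join-before-exit idea in the comment above can be sketched with plain POSIX threads. This is only an illustration of the technique, not the Tcl patch: the names `worker_proc`, `shutdown_with_join`, `stop_requested` and `worker_finished` are hypothetical stand-ins, and the real notifier code in tclUnixNotfy.c is structured differently.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical stand-ins for the notifier state; not Tcl's own variables. */
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t wake = PTHREAD_COND_INITIALIZER;
static int stop_requested = 0;
static int worker_finished = 0;

/* Background thread: sleeps until it is asked to stop, then returns. */
static void *worker_proc(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&lock);
    while (!stop_requested) {
        pthread_cond_wait(&wake, &lock);
    }
    pthread_mutex_unlock(&lock);
    worker_finished = 1;    /* last action before the thread exits */
    return NULL;
}

/*
 * Start the worker, ask it to stop, and wait until it has really gone.
 * Returns 1 if the worker had finished by the time we returned.
 * Error handling is reduced to assert() to keep the sketch short.
 */
int shutdown_with_join(void)
{
    pthread_t tid;
    int rc;

    stop_requested = 0;
    worker_finished = 0;

    rc = pthread_create(&tid, NULL, worker_proc, NULL);
    assert(rc == 0);

    /* Signalling alone only requests termination ... */
    pthread_mutex_lock(&lock);
    stop_requested = 1;
    pthread_cond_signal(&wake);
    pthread_mutex_unlock(&lock);

    /*
     * ... so physically wait for the thread to die. Without this join,
     * the caller could go on to exit() while the worker is still running.
     */
    rc = pthread_join(tid, NULL);
    assert(rc == 0);

    return worker_finished;
}
```

Calling `shutdown_with_join()` just before the process exits guarantees the worker is gone by the time the C library's exit handlers run; the signal alone gives no such guarantee.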

  • Joe Mistachkin

    Logged In: YES

    The Tcl_CreateThread call for the notifier thread will also
    need the TCL_THREAD_JOINABLE flag.

    Near line 222 of tclUnixNotfy.c:

    if (TclpThreadCreate(&notifierThread, NotifierThreadProc,
    Tcl_Panic("Tcl_InitNotifier: unable to start notifier thread");
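
A thread can only be waited for if it was created joinable, which is why the flag matters. The POSIX analogue is the detach state in the thread attributes; the sketch below requests a joinable thread explicitly and then joins it. It uses raw pthreads rather than the Tcl API, and `noop_proc` and `create_joinable_and_join` are hypothetical names chosen for the example.

```c
#include <assert.h>
#include <pthread.h>

/* Trivial thread body: hands its argument back as the exit value. */
static void *noop_proc(void *arg)
{
    return arg;
}

/*
 * Create a thread that is explicitly joinable, then join it.
 * Returns 0 on success and -1 on any failure.
 */
int create_joinable_and_join(void)
{
    pthread_attr_t attr;
    pthread_t tid;
    int state = -1;
    int token = 42;
    void *result = NULL;

    if (pthread_attr_init(&attr) != 0) {
        return -1;
    }

    /* Joinable is the pthread default, but we request it explicitly here. */
    if (pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_JOINABLE) != 0
            || pthread_attr_getdetachstate(&attr, &state) != 0) {
        pthread_attr_destroy(&attr);
        return -1;
    }
    assert(state == PTHREAD_CREATE_JOINABLE);

    if (pthread_create(&tid, &attr, noop_proc, &token) != 0) {
        pthread_attr_destroy(&attr);
        return -1;
    }
    pthread_attr_destroy(&attr);

    /* Joining a detached thread is undefined; a joinable one is safe. */
    if (pthread_join(tid, &result) != 0) {
        return -1;
    }
    return (result == &token) ? 0 : -1;
}
```

Had the thread been created with `PTHREAD_CREATE_DETACHED`, the `pthread_join` call would not be valid, so a join at shutdown has to be paired with a joinable creation.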

  • Jeffrey Hobbs

    • priority: 9 --> 8
    • assigned_to: hobbs --> vasiljevic
  • Logged In: YES

    I'm not 100% sure this is going to fix the problem,
    but Joe is right about the notifier thread termination.
    It is cleaner to join the exiting notifier thread.

    There is a long-standing problem with proper termination
    of the Tcl shell process with threads enabled. If you just
    exit() the process *while* there are still some threads
    running, you might get a core dump. This applies not only
    to the notifier thread but to threads in general. The
    cause lies in the Tcl library design and has been observed
    and reported several times already (Bug #597575).
    The problem is not easy to solve, but we can certainly close
    this subtle race condition by joining the notifier thread.
    I will add this to both core-8-4-branch and the head branch today,
    and the poster of the bug ticket should try to reproduce the
    problem, if possible, as I do not have an MP Linux box here. I have
    MP Darwin and MP Solaris boxes and have not observed any
    problem there, provided all threads have been terminated before
    exit() is called.
    I will leave this ticket open after applying the changes until we
    know whether it solves the issue.

  • Logged In: YES

    This patch has been incorporated in 8.4.10, but I broke
    the code by compiling with the new gcc 4.0.0 on MacOSX.
    I'd like to ask the original poster to try to reproduce
    the bug again, if possible, against 8.4.11, which will
    soon be released (or by checking out the core-8-4-branch).
    I will keep this bug open for 4 weeks after the 8.4.11
    release; if nobody responds, I will consider it
    solved and close this ticket.

  • Logged In: YES

    Just a general note: if you've got glibc 2.3.2, you should
    set LD_ASSUME_KERNEL to 2.4.19 to work around other
    threading bugs (in glibc) that have been found to impact on
    Tcl. Note that this trick also stabilizes other applications
    that make use of threads (e.g. it's helping keep my Java
    applications from locking up). This advice only applies to
    glibc 2.3.2, but that's disappointingly common still...

  • Logged In: YES

    Not sure if it's the same or a similar problem, but I see
    segfaults when running 'make test' on a SuSE 10.1 x86_64
    machine, compiling against Tcl 8.5a4 from CVS.

    It's an SMP kernel 2.6.19, glibc 2.4, gcc 4.1.0.

    The segfault vanishes when I use --enable-symbols while
    compiling the thread package (2.6.3 from CVS), but happens
    every time when leaving out symbols.

  • Sounds completely outdated. So much has been done since then, in both process termination and thread safety, that the odds are high that it's fixed by now.

    • status: open --> pending-fixed