#3284 exec implementation fails to close() open file descriptors

obsolete: 8.4.11
closed-duplicate
5
2005-10-31
2005-10-31
Peter Bray
No

Tcl/Tk version: 8.4.6 (also present in 8.4.11)
Platform: Sun SPARC Solaris 9

In the child branch of TclpCreateProcess() in
unix/tclUnixPipe.c, after the "pid = fork(); if ( pid
....)" code, the child process sets up the standard file
descriptors, restores signal handling, and executes the
new process.

Unfortunately, the child leaves all existing file
descriptors open! I believe there should be an option
(perhaps the default) to close all open file
descriptors before exec() so that the children (and
grandchildren, ...) don't inherit them. If it is an
option, then "exec" will need an interface to it so
that the script writer can ask for all open file
descriptors to be closed.

Regards,
Peter Bray
Sydney, Australia

PS: In our case, a UI that loads a shared library which
connects to a server over TCP/IP used "exec" to start a
backgrounded (actually daemonised) copy of Firefox to
display the user help.
lsof shows that Firefox inherited the socket
connections and other file descriptors, and when the UI
exited, the Firefox process still held references to
these file descriptors, so the server did not detect
the connection closing. Trouble ensued, and memory
and CPU exited stage left... as the server tried to
buffer data for a client that was not reading from its
socket connection.
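The option requested in the report can be sketched in C as follows, assuming a plain POSIX system; this does not reproduce Tcl's actual TclpCreateProcess() code. After fork(), once the standard descriptors are in place, the child closes every descriptor above 2 before exec(). close_inherited_fds() and child_sees_fd() are hypothetical names, the sweep cap is an assumption to keep the loop cheap, and the probe calls _exit() where a real spawner would call exec().

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/resource.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define FD_SWEEP_CAP 65536L /* assumed cap so the sweep stays cheap */

/* Hypothetical helper: close every descriptor above stderr.
 * Meant to run in the child between fork() and exec(). Closing a
 * number that is not open just fails with EBADF, which is ignored.
 * Descriptors numbered at or above the cap would be missed. */
static void close_inherited_fds(void)
{
    struct rlimit rl;
    long max_fd = FD_SWEEP_CAP;

    if (getrlimit(RLIMIT_NOFILE, &rl) == 0 && rl.rlim_cur != RLIM_INFINITY
        && rl.rlim_cur < (rlim_t) FD_SWEEP_CAP)
        max_fd = (long) rl.rlim_cur;

    for (long fd = STDERR_FILENO + 1; fd < max_fd; fd++)
        (void) close((int) fd);
}

/* Hypothetical probe: fork, optionally sweep, and report whether fd is
 * still open in the child (1) or not (0); -1 on error.
 * A real spawner would call exec() where this probe calls _exit(). */
static int child_sees_fd(int fd, int close_first)
{
    int status;
    pid_t pid = fork();

    if (pid < 0)
        return -1;
    if (pid == 0) {
        if (close_first)
            close_inherited_fds();
        _exit(fcntl(fd, F_GETFD) != -1 ? 1 : 0);
    }
    if (waitpid(pid, &status, 0) != pid || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

Closing in the child also covers descriptors that a loaded shared library opened on its own, as in the PS, at the cost of one close() call per possible descriptor number.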

Discussion

  • Peter Bray

    Peter Bray - 2005-10-31
    • status: open --> open-duplicate
     
  • Peter Bray

    Peter Bray - 2005-10-31
    • status: open-duplicate --> closed-duplicate
     
  • Nobody/Anonymous

    Logged In: NO

    We are currently having some weird problems when
    using exec on Solaris. The resolution says "Duplicate". Does
    this mean that the bug was submitted before? If so, how can
    I find that bug ID and any extra information? We are using
    8.4.11, and I wonder whether upgrading to the latest version
    will solve the problem, or whether the problem is not in Tcl
    at all.

    In general terms: ours is a multi-threaded application that
    heavily uses TCP/IP connections, and sometimes after "exec"
    is called from the Tcl part, a connection stops working or
    the program crashes. At first glance it looks like
    memory/stack corruption, but that is not clear yet.

     
  • Donal K. Fellows

    Logged In: YES
    user_id=79902

    It's not clear what tracker item this is a duplicate of.

    Also, it's not clear why the fcntl(fd,F_SETFD,FD_CLOEXEC);
    calls that Tcl makes (and has made for ages; the calls in
    the socket code date back at least as far as our CVS
    archive goes!) are not making things behave correctly.

    Is this a Solaris 9 bug? Everything seems to work for me
    under Linux. Are you using the channels as stdin/out/err for
    the subprocesses (that's the only time Tcl turns off the
    CLOEXEC flag)? Or do you believe that FDs are being
    leaked? Is it possible to have a short script that
    reproduces the fault (preferably using only tclsh instances,
    so that we can instrument everything)?
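The FD_CLOEXEC behaviour referred to above can be checked with a small C sketch, assuming a POSIX system with /bin/sh. set_cloexec() and fd_survives_exec() are hypothetical helpers, not Tcl API; the shell is used only as a probe, because its <&N redirection fails when descriptor N was not inherited across exec().

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Mark fd close-on-exec, keeping any other descriptor flags
 * (a read-modify-write form of fcntl(fd, F_SETFD, FD_CLOEXEC)). */
static int set_cloexec(int fd)
{
    int flags = fcntl(fd, F_GETFD);

    if (flags == -1)
        return -1;
    return fcntl(fd, F_SETFD, flags | FD_CLOEXEC);
}

/* Fork, exec /bin/sh, and let the shell try to redirect from fd.
 * Returns 1 if fd survived the exec, 0 if not, -1 on error.
 * Assumes fd is a small number that /bin/sh accepts in "<&N". */
static int fd_survives_exec(int fd)
{
    char cmd[32];
    int status;
    pid_t pid;

    snprintf(cmd, sizeof cmd, ": <&%d", fd);
    pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        /* Silence the shell's "bad file descriptor" message. The probed
         * fd is still open here, so /dev/null cannot land on its number. */
        int devnull = open("/dev/null", O_WRONLY);

        if (devnull > STDERR_FILENO) {
            dup2(devnull, STDERR_FILENO);
            close(devnull);
        }
        execl("/bin/sh", "sh", "-c", cmd, (char *) NULL);
        _exit(127); /* exec itself failed */
    }
    if (waitpid(pid, &status, 0) != pid || !WIFEXITED(status))
        return -1;
    if (WEXITSTATUS(status) == 127)
        return -1;
    return WEXITSTATUS(status) == 0 ? 1 : 0;
}
```

A descriptor created by pipe() without the flag is visible to the exec()ed shell; after set_cloexec() on it, the same probe reports that it is gone.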

     
  • Peter Bray

    Peter Bray - 2006-01-09

    Logged In: YES
    user_id=426773

    OK, I think I know why the confusion exists. The problem
    arises (in my case) because a third-party shared library
    opens the FDs (not Tcl), and because Tcl does not implement
    fork()/exec(), just 'exec', there is nowhere I can insert
    code to close all the FDs the shared library open()ed. So
    they are inherited by the child process, which will most
    likely ignore them but, importantly, keeps them open, which
    leads to all sorts of issues as time progresses.

    Regards,
    Peter Bray

    PS: In our case the shared library is provided by a vendor,
    not written locally (i.e. no source code), so we can't add
    close-on-exec flags to the socket-opening calls.
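When the descriptors come from a library that cannot be changed, one possible workaround (an assumption, not something proposed in this thread) is a small C extension that, once the library has connected, sweeps the descriptors above 2 and sets FD_CLOEXEC on each open one, so that a later fork()/exec(), including the one behind "exec", drops them. mark_all_cloexec() and the sweep cap are hypothetical.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/resource.h>
#include <unistd.h>

#define FD_SWEEP_CAP 65536L /* assumed cap so the sweep stays cheap */

/* Hypothetical helper: set FD_CLOEXEC on every open descriptor above
 * stderr. Returns how many descriptors were changed. */
static int mark_all_cloexec(void)
{
    struct rlimit rl;
    long max_fd = FD_SWEEP_CAP;
    int updated = 0;

    if (getrlimit(RLIMIT_NOFILE, &rl) == 0 && rl.rlim_cur != RLIM_INFINITY
        && rl.rlim_cur < (rlim_t) FD_SWEEP_CAP)
        max_fd = (long) rl.rlim_cur;

    for (long fd = STDERR_FILENO + 1; fd < max_fd; fd++) {
        int flags = fcntl((int) fd, F_GETFD);

        if (flags == -1 || (flags & FD_CLOEXEC))
            continue; /* not open, or already close-on-exec */
        if (fcntl((int) fd, F_SETFD, flags | FD_CLOEXEC) == 0)
            updated++;
    }
    return updated;
}
```

Because the sweep marks every descriptor, it would also stop a child from inheriting any descriptor the application deliberately meant to pass on, and a descriptor opened by another thread during or after the sweep is not covered.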

     
  • Donal K. Fellows

    Logged In: YES
    user_id=79902

    Tcl does fork/exec; the code is all in tclUnixPipe.c in the
    function TclpCreateProcess(), and that's the only spot in the
    core that does that sequence. Moreover, I've run subprograms
    after all sorts of file-descriptor creation (both [open] and
    [socket]) and no FD was ever transferred to the child (except
    for the FDs that were supposed to be stdin/out/err).

    If instead you're talking about the naked exec that you find
    in an extension like TclX, that's a completely different
    thing (and nothing to do with the core!)
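The pattern described in that last comment, in which the descriptors Tcl creates are close-on-exec and only the child's stdin/stdout/stderr are passed on, can be sketched in C as follows. This is a simplified illustration under POSIX assumptions, not the actual TclpCreateProcess() code, and run_capture() is a hypothetical name. It relies on dup2() clearing FD_CLOEXEC on the new descriptor, so the pipe reaches the child only as descriptor 1.

```c
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run "sh -c cmd" with its stdout on a pipe and
 * copy up to len-1 bytes of output into buf (NUL-terminated).
 * Returns the number of bytes read, or -1 on error. */
static ssize_t run_capture(const char *cmd, char *buf, size_t len)
{
    int p[2], status;
    ssize_t total = 0, n;
    pid_t pid;

    if (len == 0 || pipe(p) != 0)
        return -1;
    /* Both pipe ends are close-on-exec, like the descriptors Tcl creates. */
    fcntl(p[0], F_SETFD, FD_CLOEXEC);
    fcntl(p[1], F_SETFD, FD_CLOEXEC);

    pid = fork();
    if (pid < 0) {
        close(p[0]);
        close(p[1]);
        return -1;
    }
    if (pid == 0) {
        /* The copy made by dup2() has FD_CLOEXEC cleared, so the child
         * keeps the pipe as stdout; p[0] and p[1] close at exec. */
        if (dup2(p[1], STDOUT_FILENO) == -1)
            _exit(127);
        execl("/bin/sh", "sh", "-c", cmd, (char *) NULL);
        _exit(127); /* exec itself failed */
    }

    close(p[1]); /* parent keeps only the read end */
    while (total < (ssize_t) len - 1
           && (n = read(p[0], buf + total, len - 1 - (size_t) total)) > 0)
        total += n;
    buf[total] = '\0';
    close(p[0]);
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    return total;
}
```

For example, run_capture("echo hi", buf, sizeof buf) should return 3 with buf holding "hi\n".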