#1641 zero-length write considered harmful

obsolete: 8.3.3
closed-fixed
7
2002-02-04
2001-09-27
Dale Talcott
No

Tcl 8.3.3, OS Platform AIX 4.3.3

There is an unfortunate interaction between tcl and
Unix systems that implement the Single Unix
specification. In particular, the Single Unix
specification for writing to/reading from a STREAM
device requires that a zero-length write on one
side show up as a zero-length read on the other.
That is, a zero-length write gets interpreted as EOF
when read.

http://www.opengroup.org/onlinepubs/007908799/
xsh/write.html
http://www.opengroup.org/onlinepubs/007908799/
xsh/read.html

This is a problem for tcl because when tcl flushes
its channel buffers, if there is nothing in the buffer,
tcl ends up performing a zero-length write. If the
channel happens to be connected to a STREAMS
file, this appears as EOF to the other side of the
STREAM.

This causes grief for several programs, including
script, sshd, and expect. It has the interesting
effect that, if you are logged on via SSH and run the
tcl test suite, you get logged off! during io.test.

Fortunately, there are not that many OS's that
implement pseudo-tty's on STREAMS, but IBM's
AIX 4.3.3 is one.

One can argue that the spec is wrong for changing
what used to be a harmless action (writing
nothing) into something that now has a meaning.
However, this is probably a losing battle. I suggest
that tcl be modified to avoid zero-length writes.

The following patch changes the Unix
FileOutputProc to avoid zero-length writes:

--- unix/DRT/tclUnixChan.c Tue Apr 3 17:54:39
2001
+++ unix/tclUnixChan.c Thu Sep 27 14:04:58 2001
@@ -423,6 +423,8 @@
int written;

*errorCodePtr = 0;
+ if (toWrite == 0)
+ return 0; /* avoid zero-length write
*/
written = write(fsPtr->fd, buf, (size_t) toWrite);
if (written > -1) {
return written;

This has been tested only to the extent that the tcl
tests now run on AIX without logging me off.

Discussion

1 2 > >> (Page 1 of 2)
  • Vince Darley
    Vince Darley
    2001-09-27

    • labels: 104242 --> 25. Channel System
    • assigned_to: vincentdarley --> andreas_kupries
     
  • Vince Darley
    Vince Darley
    2001-09-27

    Logged In: YES
    user_id=32170

    This is really a channels problem, and not in my area of
    understanding or expertise. Assigned to Andreas.

     
  • Logged In: YES
    user_id=75003

    I believe that this is something the drivers should not
    need to be aware about. In other words, I believe that this
    should be fixed in the generic part of the I/O system.

    I'll try this and report back.

     
    • priority: 5 --> 6
     
  • Don Porter
    Don Porter
    2001-09-27

    Logged In: YES
    user_id=80530

    Ah. So that explains it! And here I was
    blaming a bug in SSH.

    I've also been bitten by this, trying to
    work with Tcl through SSH on an AIX machine.

    I defer to AK's judgment on the solution.

     
    • priority: 6 --> 7
     
  • Logged In: YES
    user_id=75003

    Don, after doing a simple patch to avoid 0-write's I get
    segfaults on the AS linux box too. I will see later if the
    same happens on sparky and my system at home.

    Stacktrace:

    0 0x40095e8b in chunk_alloc (ar_ptr=0x4012ad40, nb=32) at
    malloc.c:2875
    #1 0x400955ae in __libc_malloc (bytes=21) at malloc.c:2696
    #2 0x80f50c9 in TclpAlloc (nbytes=21)
    at ../tcl/unix/../generic/tclAlloc.c:672
    #3 0x80686dc in Tcl_Alloc (size=21)
    at ../tcl/unix/../generic/tclCkalloc.c:983
    #4 0x80a33ec in AllocStringEntry (tablePtr=0x8160958,
    keyPtr=0x8158460)
    at ../tcl/unix/../generic/tclHash.c:927
    #5 0x80a2d28 in Tcl_CreateHashEntry (tablePtr=0x8160958,
    key=0x8158460 "http", newPtr=0xbfffaf4c)
    at ../tcl/unix/../generic/tclHash.c:450
    #6 0x80d7dde in TclSetElementOfIndexedArray
    (interp=0x810fa78,
    localIndex=3, elemPtr=0x818e848, newValuePtr=0x819ca98,
    flags=512)
    at ../tcl/unix/../generic/tclVar.c:1746
    #7 0x808f68a in TclExecuteByteCode (interp=0x810fa78,
    codePtr=0x816cde8)
    at ../tcl/unix/../generic/tclExecute.c:1251
    #8 0x806479a in Tcl_EvalObjEx (interp=0x810fa78,
    objPtr=0x81952c8,
    flags=0) at ../tcl/unix/../generic/tclBasic.c:2951
    #9 0x80cb139 in TclObjInterpProc (clientData=0x8163eb0,
    interp=0x810fa78,
    objc=2, objv=0x8112080)
    at ../tcl/unix/../generic/tclProc.c:1075
    #10 0x808d968 in TclExecuteByteCode (interp=0x810fa78,
    codePtr=0x81990a8)
    at ../tcl/unix/../generic/tclExecute.c:869
    #11 0x806479a in Tcl_EvalObjEx (interp=0x810fa78,
    objPtr=0x81a2f08,
    flags=131072) at ../tcl/unix/../generic/tclBasic.c:2951
    #12 0x80ad988 in TclChannelEventScriptInvoker
    (clientData=0x81a96c0,
    mask=2) at ../tcl/unix/../generic/tclIO.c:7084
    #13 0x80ad319 in Tcl_NotifyChannel (channel=0x819a3e8,
    mask=2)
    at ../tcl/unix/../generic/tclIO.c:6620
    #14 0x80e4452 in FileHandlerEventProc (evPtr=0x815abd0,
    flags=-3)
    at ../tcl/unix/../unix/tclUnixNotfy.c:624

     
  • Logged In: YES
    user_id=75003

    The seg.fault goes away with the fix for bug
    [#465494] "I/O segfaults in test suite".

    IOW, it was the same bug, as I suspected.

     
  • Logged In: YES
    user_id=75003

    Enclosed is the patch I made.

    Jeff pointed out that there are possibly subtle problems
    with the patch as is: When writing to an OS channel an
    error like EPIPE can be returned.

    The specification and the manpage are not clear if this can
    happen for a zero-length write too. If that is so the patch
    changes the tcl level semantics of something like [puts -
    nonewline $chan ""]. Before the patch this command can
    fail, afterwards it won't (assuming a valid $chan).

    This is something any patch to avoid zero-length writes can
    run into.

    Opinions ?

     
  • Jeffrey Hobbs
    Jeffrey Hobbs
    2001-09-28

    Logged In: YES
    user_id=72656

    The enclosed patch is definitely not correct. If you look
    at the TcpOutputProc in tclUnixChan.c, you'll see that it
    is possible for errorCode to be set even before a write is
    attempted. Also, errorCode will be used when written == 0,
    but it was never set. Thus, at least errorCode must also
    be set, but perhaps we are still overlooking possible valid
    side-effects (like acceptable errors returned with write 0).

     
  • Don Porter
    Don Porter
    2001-09-28

    Logged In: YES
    user_id=80530

    I sometimes use [catch {puts -nonewline $socket ""}]
    as a way to test that a socket connection is valid
    without actually writing to it. It sounds like this
    patch will turn that command into a no-op?

     
  • Logged In: YES
    user_id=75003

    with regard to my own patch the handling of errorCode is
    this: In FlushChannel the errorCode is initialized to zero,
    but not in Tcl_WriteRaw. In both cases it won't be used
    for 'written == 0'.

    Question: Is it possible to query the OS if a there is a
    STREAM behind a file descriptor ? If yes avoidance of zero-
    length writes could be restricted to this specific
    situation.

     
  • Logged In: YES
    user_id=75003

    Regarding Don's question: Correct, any patch to avoid zero-
    length writes (either on all channels or possibly streams-
    based ones = pipes, sockets) will change the given
    statement into a full no-op.

     
  • Don Porter
    Don Porter
    2002-01-31

    Logged In: YES
    user_id=80530

    Can we tackle this problem now at the VFS level?
    That could prevent zero-length writes to disk
    that seem to be the problem, while preserving
    the zero-length write testing to sockets.

     
  • Logged In: YES
    user_id=75003

    How could the VFS help in this ?

     
  • Don Porter
    Don Porter
    2002-01-31

    Logged In: YES
    user_id=80530

    Now that I look at it more carefully, I guess it can't.
    The intersection of VFS and channels is only
    Tcl_FSOpenFileChannel(), and that has no control
    on zero-length writes. I guess the responsibility
    has to be pushed out to the channel drivers themselves.

     
  • Logged In: YES
    user_id=75003

    So you are willing to change
    [catch {puts -nonewline $chan ""}]
    into a true no-op, provided that chan points to a socket o
    pipe ?

     
  • Don Porter
    Don Porter
    2002-01-31

    Logged In: YES
    user_id=80530

    The opposite, I think.

    The {puts -nonewline $sock ""} is actually useful
    for sockets for testing whether they're connected.

    For other types of channels, I think I don't care
    whether it's a no-op.

    If I had a reliable, cross-platform, and reasonably
    quick (do not suggest [fconfigure $sock -peername])
    alternative way of testing socket connectedness for
    writing, I might not mind that changing either.

     
  • Logged In: YES
    user_id=75003

    But isn't it the sockets which are STREAM based and thus
    have trouble with the zero-length write ? And thus have to
    test and make it a no-op ?

     
  • Don Porter
    Don Porter
    2002-01-31

    Logged In: YES
    user_id=80530

    Let me put it this way. If I add the line

    if (toWrite == 0) return 0;

    to the routine FileOutputProc() in unix/tclUnixChan.c,
    then my issues on AIX systems are solved, yet I don't
    think I've lost existing behavior on sockets. (correct?)

     
  • Logged In: YES
    user_id=75003

    I am not sure. Can a file be based on a STREAM ?
    If so, then yes, this should solve the problem.

    Actually this could be what the submitter proposed
    long ago.

     
  • Logged In: NO

    If you want to be conservative with the fix, you
    could use

    if (toWrite == 0 && isatty(fsPtr->fd)) return 0;

    or similar to avoid _only_ the effect of logging out
    the user. (Sockets are not ttys.)

    I just lost five hours of my life on this braindamage and
    want them back.

     
  • Logged In: YES
    user_id=75003

    Committed to head.

     
    • status: open --> closed-fixed
     
  • Logged In: YES
    user_id=75003

    Deleting the attached patch because that is not the patch
    which was comitted.

     
1 2 > >> (Page 1 of 2)