#3271 sockets lose data on Windows

Group: obsolete: 8.4.11
Status: open
Priority: 5
Updated: 2006-01-10
Created: 2005-10-18
Creator: Don Porter
Private: No

The attached script demonstrates the problem.
Start it in one shell window:

tclsh sdtest.tcl server

to start a server running.

Start it in a second window:

tclsh sdtest.tcl client

to start a client of that server.

The server shell window should
print the sequence of messages
it receives from the client, counting
down from message 10 to message 1.

This works fine on Linux and
Solaris. On Windows, the output
never reaches message 1; it stops
about 4 or 5 messages short of
the full data. I did the Windows
testing with the ActiveTcl 8.4.11.2
tclsh.

With a minor change to the client part
of the demo script, explicitly making
the socket blocking *before* the final
call to [flush], the bug is worked around
and all data passes through on Windows.

This should not be required. Sockets
should not lose data on any platform.
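The client-side workaround can be sketched as follows. This is not the attached sdtest.tcl, which is not reproduced here; the in-process server, the use of port 0 on 127.0.0.1, and the message text are assumptions made only so the sketch runs in a single tclsh. It applies both client-side changes reported in the discussion: switching back to blocking mode before the final [flush], and closing the socket explicitly rather than relying on the implicit close at [exit]. Per the later reports in this ticket, this is a workaround and may not be sufficient on Windows under every scheduling situation.

```tcl
# Hypothetical stand-in for sdtest.tcl: server and client in one interpreter.
set received {}

# Server side: accept the connection and read lines without blocking.
proc accept {chan addr port} {
    fconfigure $chan -blocking 0
    fileevent $chan readable [list readLines $chan]
}

# Read every complete line currently available; finish on EOF.
proc readLines {chan} {
    while {[gets $chan line] >= 0} {
        lappend ::received $line
    }
    if {[eof $chan]} {
        close $chan
        set ::done 1
    }
}

# Assumption: port 0 lets the OS choose a free port on the loopback address.
set server [socket -server accept -myaddr 127.0.0.1 0]
set port [lindex [fconfigure $server -sockname] 2]

# Client side: non-blocking writes, counting down from 10 to 1.
set sock [socket 127.0.0.1 $port]
fconfigure $sock -blocking 0 -buffering full
for {set i 10} {$i >= 1} {incr i -1} {
    puts $sock "message $i"
}

# Workaround: return to blocking mode before the final flush,
# then close explicitly instead of relying on the close during exit.
fconfigure $sock -blocking 1
flush $sock
close $sock

# Run the event loop until the server side has seen EOF.
vwait ::done
close $server

# Report how many lines the server side received.
puts "received [llength $received] messages"
```

Only the two client-side lines before [close] differ from a plain non-blocking client; the server side is left non-blocking, consistent with the observation in the discussion that server configuration does not seem to matter.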

Discussion

  • Don Porter

    Don Porter - 2005-10-18
     
  • Don Porter

    Don Porter - 2005-10-18
    • assigned_to: andreas_kupries --> davygrvy
     
  • Don Porter

    Don Porter - 2005-10-18

    Logged In: YES
    user_id=80530

    Speculation: this may be related
    to 947693?

     
  • Don Porter

    Don Porter - 2005-10-18

    Speculation appears to be false.

    ActiveTcl 8.4.7 has the same problem,
    and that's before the 847693 changes
    happened.

     
  • Don Porter

    Don Porter - 2005-10-18

    Same problem present in the Oct. 2002
    ActiveTcl 8.4.0 release.

     
  • David Gravereaux

    Logged In: YES
    user_id=7549

    I do not have any development tools to work on this today.
    Reassigning to someone else.

     
  • David Gravereaux

    • assigned_to: davygrvy --> andreas_kupries
     
  • Don Porter

    Don Porter - 2005-10-18

    Same problem in the oldest ActiveTcl
    I found, 8.3.3 from April 2001.

    It looks like flushing non-blocking sockets
    on Windows has been broken for
    a long, long time.

     
  • David Gravereaux

    There is an odd situation in the generic layer: if the
    read() operations triggered by a given [gets] call consume
    EOF into the generic layer, it becomes the responsibility
    of the channel driver to keep firing readable events on the
    channel until it is closed. IMO, once EOF has been read into
    the generic layer, and given that layer's knowledge of EOF,
    shouldn't the channel driver's job regarding notification
    be done? And shouldn't it be the generic layer's
    responsibility to fire readable events instead? Honestly,
    this is quite inefficient: the channel driver will never
    receive any more system notifications for that socket, and
    has to manufacture them just for this situation.

    I'm not sure if this relates, though.

     
  • Don Porter

    Don Porter - 2006-01-10

    Looking at this again, it appears
    that what is required on the client
    side to avoid data loss is *both*
    an [fconfigure -blocking 1] and
    an explicit [close].

    If the client side is left non-blocking
    data is lost. If the [close] command
    is not explicitly done, then the implicit
    close that should happen during [exit]
    loses data too.

    Note that only changes on the client
    side of the connection make a difference.
    Configuring the server side doesn't seem to
    play a role at all, which suggests to me
    the problem is not on the read side of
    things.

     
  • Don Porter

    Don Porter - 2006-01-10
    • summary: non-blocking sockets lose data on Windows --> sockets lose data on Windows
     
  • Don Porter

    Don Porter - 2006-01-10

    Did some more testing, and
    even in the case where the
    socket is never made non-blocking,
    data can still be lost if the
    client side does not perform
    an explicit [close].

    Revised summary to reflect that
    non-blocking is not essential to
    demo of the bug.

    So why would an explicit [close]
    differ from the Tcl_Close()
    that ought to be implicit
    in finalization?

     
  • Don Porter

    Don Porter - 2006-01-20

    New reports are indicating that
    even forcing a [close] on the
    client side is not enough.

    If the "nice" levels of the two
    processes are such that the client
    gets more cycles than the server,
    then it is reported that we still
    see data loss.

    Perhaps a second bug, server side
    this time?