#2440 [fcopy] callback sometimes fails to trigger

obsolete: 8.5.2
closed-fixed
9
2008-04-04
2003-07-30
No

Assigned to dgp for review, as it might have something
to do with [Bug 218203]).

What I see is: let $s be a socket, $f a normal file,
$nBytes a "large" number (8kB or above seems to trigger
the error with good probability), both channels with
"-translation binary". Then the trouble is that
fcopy $s $f -size $nBytes -command myCallback
often fails to notice that $nBytes *were* already
written to $f, and hence never calls myCallback.

It might be an off-by-one heisenbug: as I said, it
happens often but not always - and it seems never to
happen if one more byte than promised comes through the
socket.

I attached here a script that produces the error with
good probability; uncommenting line 72 does the "one
more byte" trick.

The script attached below produces the following
results on RH7.2 (relevant??):
[mig@mini test2]$ /CVS/tcl_SF_clean/unix/tclsh test.tcl
/CVS/tcl_SF_clean/unix/tclsh - 8.5a0
client: done sending
server: copying 16384 bytes to tempFile
client: won't wait anymore ...
server: [fcopy] wanted 16384, and got 16384
server: [fcopy] doesn't tell us we are finished, exit
anyway

whereas the correct output (uncommenting line 72) is
[mig@mini test2]$ /CVS/tcl_SF_clean/unix/tclsh test.tcl
/CVS/tcl_SF_clean/unix/tclsh - 8.5a0
server: copying 16384 bytes to tempFile
client: done sending
server: copied 16384 bytes to tempFile.
server: all done

Discussion

  • miguel sofer

    miguel sofer - 2003-07-30
     
  • miguel sofer

    miguel sofer - 2003-07-30
    • assigned_to: nobody --> dgp
     
  • miguel sofer

    miguel sofer - 2003-07-31

    Logged In: YES
    user_id=148712

    Addenda:
    * failure reproduced in win2k
    * the [fcopy] callback does occur after the client closes
    the socket
    * different -buffering settigs do not seem to change behaviour

     
  • miguel sofer

    miguel sofer - 2003-07-31

    Logged In: YES
    user_id=148712

    More:
    * can't repro when the slave is a pipe.
    * bug also present for client and server on different machines

     
  • Donal K. Fellows

    • labels: --> 24. Channel Commands
    • milestone: --> 284128
     
  • Don Porter

    Don Porter - 2003-11-13
    • assigned_to: dgp --> msofer
     
  • Don Porter

    Don Porter - 2003-11-13

    Logged In: YES
    user_id=80530

    The test script works for me
    for both the 8.4 branch and HEAD,
    on both Solaris 9, and Linux/Alpha.

     
  • miguel sofer

    miguel sofer - 2004-03-26
    • assigned_to: msofer --> andreas_kupries
     
  • miguel sofer

    miguel sofer - 2004-03-26

    Logged In: YES
    user_id=148712

    Bug still there, newer linux here :(

     
  • miguel sofer

    miguel sofer - 2004-04-01

    Logged In: YES
    user_id=148712

    Is this a real bug, or mainly EBKAC? If it is a real bug, it
    needs attention - if it is EBKAC, I need a spanking ...

    The bug I (think that I) see is: [fcopy] is failing to call
    its callback after copying the required number of bytes.

    The test script lacks comments, portability ('puts stderr'),
    and nicety (writes a tempFile in wd, leaves it there),
    sorry. What it does is:

    1. opens a server socket; on connection 4 bytes are read
    (number N of following bytes), then N bytes are fcopied to
    tempFile, calling the doneCopy callback when done. doneCopy
    writes a log to stderr, end then causes the app to exit. The
    app exits also after receiving the second message through
    the pipe (used to signal a failure).

    2. opens (via pipe) a second tclsh that connects to the
    server and sends a binary (length+message). After message
    end, the client waits for 10 seconds and prints an exit
    message - if it still alive, which should not happen as the
    pipe should have been closed before that, when the main
    process exits.

    Observed behaviour: fcopy's callback is not called after the
    10 seconds, the client prints its exit message.

    Should have observed: fcopy's callback is called, no client
    exit message seen.

    Weirdness: if the client sends an extra byte down the
    socket, all's well.

     
  • Jeffrey Hobbs

    Jeffrey Hobbs - 2008-03-29

    Logged In: YES
    user_id=72656
    Originator: NO

    Works for me with 8.5.1 and 8.4.18.

     
  • Jeffrey Hobbs

    Jeffrey Hobbs - 2008-03-29
    • status: open --> pending-works-for-me
     
  • Jeffrey Hobbs

    Jeffrey Hobbs - 2008-03-29
    • milestone: 284128 --> obsolete: 8.5.2
    • priority: 5 --> 6
    • status: pending-works-for-me --> open
     
  • Jeffrey Hobbs

    Jeffrey Hobbs - 2008-03-29

    Logged In: YES
    user_id=72656
    Originator: NO

    Via Cleverly (non-default value key to trigger error):

    Consistently reproducible for me for values of 21, 23, 24, 25 & 26 on
    OS X 10.5.2 (intel) w/Tcl 8.5.0. 22 always seems to work. I haven't
    tried >= 27.

    With 8.5.2rc1 on Linux 2.6.18 (sparc64) it seems to be consistently
    reproducible for values of 14, 16, 17, 18, 19 & 20. 15 always seems
    to work. I haven't tried >= 21 on this platform.

    With 8.5.2rc1 on OpenBSD 4.2 (sparc64) it seems to be consistently
    reproducible for values of 11 & 12. 13-18 always seem to work. 19 &
    20 sometimes worked, sometimes didn't. (For example, out of twenty
    consecutive runs passing 20 as an argument 14 failed, 6 succeeded.) I
    haven't tried >= 21 on this platform.

     
  • Alexandre Ferrieux

    Logged In: YES
    user_id=496139
    Originator: NO

    Found and fixed. This was a day-1 bug...

    Bug summary: doing an async [fcopy -size $size], if the input stalls (not EOF) after exactly $size bytes, the callback is not called.

    The problem is in the CopyData function, near the end of the
    while(toRead) loop, in the if (cmdPtr) block. Apply the following
    patch:

    - if (cmdPtr) {
    + if (cmdPtr &&(csPtr->toRead != 0)) {
    /*
    * The first time we enter this code, there won't be a channel
    * handler established yet, so do it here.
    */

    Explanation: this test is here to "break out" of the entire routine
    (not just the while loop) with a return TCL_OK, in async mode, after
    making sure we have a fileevent to wake us up later. The problem is
    that at this spot the remaining number of bytes to read due to [-size]
    has already been updated, and when it reaches zero, we should only
    break of the while loop in order to terminate properly (and call the
    callback).

    Hence the fix: just add "&&(csPtr->toRead != 0)", so that it will fall
    thru to the beginning of the loop, which does test toRead.

    We can also understand why it was "rare" and "random": this is because
    people often use "-size" only as a progress indicator, so the pump is
    often "re-primed" by additional data on the channel coming after the
    given -size bytes. But what about the last chunk ? I believe people
    often use a channel with a proper EOF (not a never-ending pipe like in
    Miguel's ad hoc example), which completely hides the issue since it
    fires the readable fileevent !

    Below is a small script showing the bug with probability one: the script blocks forever.
    Patched version completes normally after reading 3 lines from the child process.
    Test suite OK.

    -Alex

    set pi [open "|[info nameofexecutable] 2>@ stderr" r+]
    fconfigure $pi -translation binary -buffering line
    fconfigure stdout -translation binary -buffering line

    puts $pi {
    fconfigure stdout -translation binary -buffering line
    puts stderr Waiting...
    after 1000
    foreach x {a b c} {
    puts stderr Looping...
    puts $x
    after 500
    }
    puts stderr Now-sleeping-forever
    while {[gets stdin line]>=0} {}
    exit
    }

    proc done args {puts stderr DONE;exit}

    fcopy $pi stdout -size 6 -command done

    vwait forever

     
  • Donal K. Fellows

    • priority: 6 --> 9
     
  • Andreas Kupries

    Andreas Kupries - 2008-04-02

    Logged In: YES
    user_id=75003
    Originator: NO

    Attaching Alexandre's final test script. Committed the fix to Tcl 8.4 branch, Tcl 8.5 branch, and head. Still have to convert the test script into a test case.
    File Added: FC.tcl

     
  • Andreas Kupries

    Andreas Kupries - 2008-04-02

    Alexandre's test script

     
  • Andreas Kupries

    Andreas Kupries - 2008-04-02
    • status: open --> open-fixed
     
  • Alexandre Ferrieux

    Logged In: YES
    user_id=496139
    Originator: NO

    Andreas, shouldn't this one be closed ? I'm not pressing for this (maybe you have other tests in mind), it's just for consistency with 1932639 which is much younger and already closed.

     
  • Andreas Kupries

    Andreas Kupries - 2008-04-04

    Logged In: YES
    user_id=75003
    Originator: NO

    In principle yes. I left it open as reminder to convert your test script to actual test case. I can close it if you want.

     
  • Alexandre Ferrieux

    Logged In: YES
    user_id=496139
    Originator: NO

    Oops, sorry, I forgot. No hurry :-)

     
  • Andreas Kupries

    Andreas Kupries - 2008-04-04

    Logged In: YES
    user_id=75003
    Originator: NO

    Created the test case and committed to 8.4 branch, 8.5 branch, and head.
    Closing this bug. Finally.

     
  • Andreas Kupries

    Andreas Kupries - 2008-04-04
    • status: open-fixed --> closed-fixed
     

Get latest updates about Open Source Projects, Conferences and News.

Sign up for the SourceForge newsletter:

JavaScript is required for this form.





No, thanks