Learn how easy it is to sync an existing GitHub or Google Code repo to a SourceForge project! See Demo


#4228 chanio.test file error

obsolete: 8.6a4
Don Porter

Sometime after test chan-io-73.1
passes, an error aborts the evaluation
of chanio.test:

---- chan-io-73.1 start
++++ chan-io-73.1 PASSED
Test file error: error flushing "file5": broken pipe
while executing
"chan flush $f"

The error message is mysterious
since it blames a [chan flush]
command, and there are no visible
[chan flush] commands in the part
of chanio.test that follows test
chan-io-73.1. Perhaps its in the
body of some proc, or in some trace
handler, but in those cases I'd expect
a longer stack trace revealing that
context as well.

Based on the timing only, I suspect
the TIP 332 commit is to blame.


1 2 > >> (Page 1 of 2)
  • Don Porter
    Don Porter

    the error appears to arise
    not during evaluation of
    chanio.test, but in the
    command [close $pipeFd] at
    line 2767 of tcltest.tcl
    in the tcltest package.

    The error message is still broken.

    The fact that only chanio.test
    suffers, indicates there's some
    interaction with it going on.

    Attempts to learn more with the
    -singleproc 1 option led to more
    confusion. There's some nastiness
    here where tricky testing of things
    like std channels and their flush
    patterns (?) are getting messed
    with while the testing harness
    needs them functioning reliably.

  • Don Porter
    Don Porter

    some indications the
    original [chan flush]
    error happens in test
    chan-io-29.31. Don't know
    why it's escaping to stderr
    at all, let alone at unpredictable

    Even weirder:
    $ make test TESTFLAGS="-file chanio.test -singleproc 1 -verbose bpste -skip chan-io-29.31"
    ---- chan-io-29.35 start
    could not find interpreter "y"
    while executing
    "interp delete y"
    ++++ chan-io-29.35 PASSED

    Some partial stack trace just
    thrown out there on some visible
    channel, but with no test failure,
    nor any test file error reported.

    I can't get a handle on things, but
    indications are there's some significant
    troubles in the recent commit.

  • FWIW, disabling test chan-io-28.6 (half-close) makes the problem disappear.
    Just committed this change to avoid polluting others' lives.
    But I'm not pulling it under the rug :-)
    Investigation ongoing...

  • Strace investigation shows that the original "broken pipe" occurs in chanio-29.27.
    Strangely, the error mentions "file5" while the EPIPE occurs on descriptor 4:

    9927 10:34:27.076443 write(4, "output\n", 7) = -1 EPIPE (Broken pipe)
    9927 10:34:27.076472 --- SIGPIPE (Broken pipe) @ 0 (0) ---
    9927 10:34:27.077022 write(1, "++++ chan-io-29.27 PASSED\n", 26) = 26

    So I suspect incorrect cleanup in the channel system during half-close.
    For this I'd like Andreas' help since he wrote the generic layer of #332.

  • Homing in on the 6-legged animal:

    set f [open "|echo" r+]
    gets $f
    after 1000
    puts $f yay
    close $f w
    ==> should flush and report EPIPE
    ==> but doesn't complain
    close $f
    ==> barfs "bad file number"

    Shouldn't be long now :)

  • OK confirmed, it's the same bug as mentioned in the patch entry, re the crash on [close w] of an unflushed channel: currently the Flush is ill-positioned within the tearoff sequence, occurring too late.
    Will try to refactor alone in the absence of Andreas, but help welcome.

  • Just committed code doing a specific flush in the half-close-w pathway (because the normal flush is inside UnregisterChannel which must not be called for the "first half").
    Andreas can you verify I'm doing it right, because I don't grok the subtleties of flagging blocks READY nor Tcl_Flush vs FlushChannel etc.

    This doesn't solve the bug obtained when re-enabling chanio-28.6 though.

    My investigation so far seems to indicate we are witnessing a "ghost error status" in a stale channel-related structure.
    By this I mean, the _message_ is from an innocuous, previously caught error (a [chan flush] in a previous test), while possibly the TCL_ERROR status is fresh and is the core of the bug.

    The error goes to the child tcltest's stderr because it occurs within a vwait (hence a bgerror or default C-level background error handler must be just spitting to stderr).
    The vwait is: vwait [namespace which -variable counter] (near the end of chanio-29.31).

  • New data point: I have played a bit with chanio.test, re-enabling 28.6 and skipping 28.7..29.30.
    In 29.31 where the offending vwait is, I added:

    proc myhan args {exec echo "BGERROR:$args >>> $::errorInfo <<<" > /dev/tty}
    interp bgerror {} [namespace which myhan]

    this has the nice property of spitting out synchronously (which I find pretty hard to do in tcltest):

    ---- chan-io-29.31 start
    BGERROR:{} {-code 0 -level 1} >>> can't read "ConstraintInitializer(asyncPipeChan)": no such element in array
    while executing
    "return $ConstraintInitializer($constraint)"
    (procedure "ConstraintInitializer" line 5)
    invoked from within
    "ConstraintInitializer $n2" <<<
    ++++ chan-io-29.31 PASSED

    I notice an empty direct error message and a ghost errorInfo...FWIW

  • Don Porter
    Don Porter

    While this won't solve the underlying
    bug, if tcltest is getting confused
    due to writes to stderr from background
    error reporting, then setting an
    [interp bgerror] handler to avoid that
    during those tests should at least be
    an improvement while the real bug is

  • Don Porter
    Don Porter

    You should not expect the $::errorInfo
    seen in an [interp bgerror] handler to
    be in sync. [interp bgerror] handlers
    are supposed to pull errorinfo data
    out of the second dict argument.

    The weird thing is the indication that
    the handler is being called when there's
    no exception for it to report. Arguments
    indicate a TCL_OK return code and an
    empty result.

  • Don Porter
    Don Porter

    dropping prio just a tiny bit
    since this no longer prevents
    a(n apparent) clean test suite run.

    Bugs should still get fixed
    of course, but if it's hard
    enough to take a while fixing,
    I don't think we need to
    block 8.6b1 (which no one expects
    to be bug free) waiting for
    the fix.

  • Don Porter
    Don Porter

    • priority: 9 --> 8
    • priority: 8 --> 9
  • Put the script from 'Homing in on ...' into a script hc1.tcl.
    Running it with the rewrite as explained in the mail ...

    andreask@gila:~/workbench/HalfClose/build/tcl86> ./tclsh hc1.tcl
    broken pipe
    while executing
    "close $f w"
    (file "hc1.tcl" line 5)

    Which is the expected result. Next now to test the chan-io-28.* cases.

  • Ok, fixed a missing initialization and a place where a flag name was not properly modified, and the two chan-io-28.[67] test cases pass as well.

    What I still have is

    ++++ chan-io-73.1 PASSED
    Test file error: can't read "ConstraintInitializer(asyncPipeChan)": no such element in array
    while executing
    "return $ConstraintInitializer($constraint)"
    (procedure "ConstraintInitializer" line 5)
    invoked from within
    "ConstraintInitializer $n2"
    error flushing "file5": broken pipe
    while executing
    "chan flush $f"

    Running with -singleproc 1 this vanishes ... Checking. Yes, this is written to stderr by something ...

    Ok, we have _two_ places writing to stderr.

    One, during chan-io-28.7 we get
    Test file error: can't read "ConstraintInitializer(asyncPipeChan)": no such element in array
    while executing
    "return $ConstraintInitializer($constraint)"
    (procedure "ConstraintInitializer" line 5)
    invoked from within
    "ConstraintInitializer $n2"

    And two,
    error flushing "file5": broken pipe
    while executing
    "chan flush $f"
    comes from io-29.31

    I ran the chanio file with -singleproc 1 -verbose bpste, then captured stdout and stderr in one file to get the proper order.

    So, even with the rewrite this ghost thing is still present. Digging on ...

    • priority: 9 --> 8
  • The first output to stderr is something which points into tcltest ...
    Note: Using tclsh as shell in test cases is not good, it uses whatever it finds in the PATH.
    Changed to [interpreter], which is a command to get the current executable.

  • Don Porter
    Don Porter

    Commit what you have and I will
    look into the background error
    troubles. I suspect you are
    expressing concern about normal
    background handling behavior.

  • FWIW, the background reporting is not mysterious; [interp bgerror] actually keeps it under control instead of confusing us by going to stderr and being delayed to the end of the test. So in my bg handler (in 29.31) I used the [testpanic] trick you gave me on the chat, and got a core with an explorable stack frame. Andreas/Don maybe you can do the same and understand what exactly is the cause of this ghost background error ?

  • Don Porter
    Don Porter

    background error handling is
    done asynchronously! This may
    be confusing or unexpected, but
    it is not a bug. I have not
    taken time to confirm, but I suspect
    all this talk of "ghost errors"
    is normal functioning of an
    asynchronous reporting mechanism.

  • Instrumenting the test case ...

    \---- chan-io-28.7 start
    can't read "ConstraintInitializer\(asyncPipeChan\)": no such element in array
    while executing
    "return $ConstraintInitializer\($constraint\)"
    \(procedure "ConstraintInitializer" line 5\)
    invoked from within
    "ConstraintInitializer $n2"
    ++++ chan-io-28.7 PASSED

    This is something written during the vwait event loop.

    Ah. The error message is by 'chan-io-28.3', which is a skipped test. And is was skipped because of missing constraint asyncPipeChan. And the 28.7 test case somehow writes this to stderr ... Hrrm.

    Note: Just running test 28.[367] generates the output. Running [67] generates a different output about a missing interpreter "z". Running 7 alone is ok.

    Note: Looking at 28.[67] I see that the 'after 1000 ...' events are not canceled!! This can (well, with murphy, will) cause breakage in any other test coming later and using vwait on the same variable. Fixed that. Doesn't fix the isssue at hand however.

  • Oi. I should read SF more often. I missed all the discussion regarding bgerror and such.
    Ok. I will undo my instrumentation, leave in the after cancel's I added and then
    commit the fixes.

  • Committed my changes.

  • Thanks. First off, chanio-28.7 still fails on Windows (mingw): nothing is received back from the server.

  • Ok, this is now going into places (Windows, Windows Pipes) I have no real knowledge off. I could try tomorrow, when I am back in the office, at a Win system, however no promises there. tclguy and davygrvy are likely better suited.

1 2 > >> (Page 1 of 2)