#4461 Busyloop in vwait in innocent HTTP code

obsolete: 8.5.7
closed-fixed
9
2009-11-12
2009-11-10
No

As reported in http://groups.google.com/group/comp.lang.tcl/tree/browse_frm/thread/6deca54954e61cfd/94ab3376153994d0?rnum=1&_done=%2Fgroup%2Fcomp.lang.tcl%2Fbrowse_frm%2Fthread%2F6deca54954e61cfd%2F432b7b5576b7ec5c%3F#doc_19398a625d24ac4d

In 8.5 HEAD:

set file_id [open myfile.html w]
http::geturl http://www.yahoo.com -channel $file_id \
-blocksize 4096 -timeout 60000
close $file_id

This code, in addition to only downloading an incomplete file, busyloops during the whole timeout.
It seems the 0-millisecond timer at line 7978 of TclIO.c fires repeatedly.

Discussion

1 2 > >> (Page 1 of 2)
  • Pat Thoyts

    Pat Thoyts - 2009-11-10

    I can confirm this on 8.5 branch. It important that the channel not be binary. When I used [open output wb] the download succeeded. Adding -timeout and -command options do not affect the fault but do allow the job to be timedout which should help in testing.
    ie: http::geturl $url -channel $f -timeout 10000 -command [list close $f]

     
  • Alexandre Ferrieux

    Turns out the tight loop involves CopyData, the worker from [fcopy].
    Once established, it is entered with csPtr->toRead == -4 !

     
  • Alexandre Ferrieux

    In TclIO.c +8710: I see that in the binary->non-binary case, we correctly call DoWriteChars, but we then use the returned number of bytes (which is 4165 here for an initial 4096 buffer). Then when we decrement the total count to read by this value, we get a negative result:

    if (outBinary || sameEncoding) {
    sizeb = DoWrite(outStatePtr->topChanPtr, buffer, sizeb);
    } else {
    sizeb = DoWriteChars(outStatePtr->topChanPtr, buffer, sizeb);
    }

    if (inBinary || sameEncoding) {
    /*
    * Both read and write counted bytes.
    */

    size = sizeb;

     
  • Andreas Kupries

    Andreas Kupries - 2009-11-10

    Why is DoWriteChars() returning a number which is larger than the size of the buffer it got ?

     
  • Andreas Kupries

    Andreas Kupries - 2009-11-10

    Hm ... non-binary case only ? Could this be some byte vs char misinterpretation somewhere ?

     
  • Alexandre Ferrieux

    The number of bytes is probably larger because of CRLF translation (this happens only on Windows, and the output file is opened in unadorned w mode, hence transl is crlf).
    What I can't grok is why we are taking the number of written bytes to decrement the number of remaining bytes to read. That doesn't make sense. Fix seems trivial but I'd like to understand...

     
  • Andreas Kupries

    Andreas Kupries - 2009-11-11

    Ok, here my notes after reading CopyData and recursing into the related functions, i.e. Do*.

    DoReadChars
    Request: 'toRead' - Number of characters.
    Result: Number of characters read
    Storage: A Tcl_Obj
    Does EOL translation (ReadBytes, ReadChars branches)
    Does encoding conversion -> Result is UTF8, ReadChars branch

    DoRead
    Request: 'toRead' - Number of _bytes_
    Result: Number of bytes read - (Comment says chars, is in error)
    Storage: char* array (bytes).
    NOTE! No encoding conversion applied.
    Does EOL translation (GetInput + CopyAndTranslateBuffer)

    DoWriteChars
    Request: #bytes to write
    Result: #bytes written
    Storage: char* array, utf8 encoded characters
    Does encoding conversion, does EOL translation
    (WriteChars (E, EOL), WriteBytes (EOL)).

    DoWrite
    Request: #bytes to write.
    Result: #bytes written
    Storage: char* array
    Does EOL translation. No encodings applied.

    Use in CopyData seems to be byte-oriented all the way, making trouble
    for DoReadChars, DoWriteChars.

    I readily see places where #chars are compared to #bytes. Gah. And
    then there is the possibility of EOL translations changing sizes
    unpredictably.

    I believe, whatever we have written should have no bearing on the
    number of things 'toRead' left ... We are to copy N chars/bytes, and
    it does not matter how many chars/bytes get written to the destination
    for K chars/bytes of the N for the counting down of how much is left.

    ~~~~

    We should maybe try to

    typedef IntChars int;
    typedef IntBytes int;

    and see if the compiler warns if we assign variables of different Int*
    types to each other. If fear it will not warn.

    Or change the varnames to reflect exactly what type of things they
    count.

    Or maybe split CopyData into four routines, one for each cominbation
    of in/outBinary//sameEncoding. Possibly easier, factor the read and
    write sections of CopyData into separate functions, and then split
    each of these into two, based on the encoding/binary settings. Then we
    could write each function to count properly, without lots of
    conditionals inside to switch between the possible variants.

    This part of tclIO feels a bit like one of the stables of augias(sp?).

     
  • Don Porter

    Don Porter - 2009-11-11

    Has the investigation so far led to a test case
    that does not require the http package and external
    servers? Some quirky use of [fcopy] perhaps?

     
  • Andreas Kupries

    Andreas Kupries - 2009-11-11

    While I made no testcase ... From the description I remember on the chat the data is coming with EOL markers LF, and is stored to a file using EOL markers CR+LF. This means that during write the 4096 byte _expand_ to 4165 IIRC, then the use of the DoWrite* results to count how much is left to read goes wonky ... That should be something we can make a testcase for. I do not remember if we encodings of input and output also played a role in forcing the wonky code-path. Even if yes, that is also something we should be able to emulate in a test case.

     
  • Andreas Kupries

    Andreas Kupries - 2009-11-11

    more notes, copied from the chat.

    [12:33] dgp aku?
    [12:34] aku ?
    [12:34] dgp For a channel with ChannelState field "encoding == NULL"
    [12:34] dgp that's meant to be interpreted as "-encoding binary" ?
    [12:34] aku AFAIK yes
    [12:35] dgp I think the binary transfer rules of CopyData are broken then too.
    [12:36] dgp If the out channel is -encoding binary
    [12:36] dgp and the in channel is not.
    [12:37] dgp then the in channel reads with DoReadChars putting the data into the internal Tcl encoding in the process.
    [12:37] dgp CopyData then just copies these bytes out
    [12:37] dgp and that's not what -encoding binary is supposed to mean, AIUI
    [12:38] aku ... Looking ... Ok, yes ... The DoWriteChars has this special recoding thing going on, and DoWrite doesn't ...
    [12:39] dgp I'm making guesses about what DoWrite() does I suppose.
    [12:39] aku (DoWriteChars line 3357 (8.5-branch))
    [12:39] aku No, I looked into DoWrite, and does EOL translation only, see my notes in the bug report
    [12:41] aku so, it seems we have to forget in/outBinary, and use only sameEncoding to switch between the Do* functions.
    [12:41] aku And the Do*Chars then internally handle -encoding binary
    [12:41] aku if needed
    [12:42] dgp the other wrinkle is that [fcopy] docs say the -size argument is the number of bytes to copy
    [12:43] dgp If that number of bytes leaves us in the middle of a multi-byte character, my interpretation is the partial character does not get copied.
    [12:44] aku ew ... I believe it may be easier to treat this as doc bug, bytes only if -encoding binary on input, otherwise chars in the encoding.
    [12:44] Out of a cloud of smoke, emiliano appears!
    [12:45] dgp chars are usually more sensible things to count at the script level.
    [12:45] kostix is away (Automatically away due to idle)
    [12:45] dgp and for -encoding binary channels chars are bytes
    [12:45] aku Oh ... I just read this ... If only the output channel is set to encoding "binary" the system
    will write the internal UTF-8 representation of the incoming characters
    [12:45] dgp oh
    [12:46] aku IOW, fcopy seems to have a special interpretation for -encoding binary on the output, which the implementation is not honoring
    [12:46] dgp no that's what the implementationj is doing.
    [12:46] dgp Just bizarre that a -encoding binary configuration on a channel means something different to a [puts] than it does to an [fcopy]
    [12:47] aku Sure ? There is this GetByteArrayFromObj going on in DoWriteChars
    [12:47] aku And for the input as well it seems, have you fcopy.n open ?
    [12:47] dgp when outBinary is true, DoWrite() is called
    [12:47] aku Right! Thay may now explain why that is done
    [12:48] dgp some archeology in order
    [12:48] aku There is also ... If only the input channel is set to encoding "binary" the system will assume that the
    incoming bytes are valid UTF-8 characters
    [12:48] dgp Did the docs come first, or just record the botch rather than fix it.
    [12:48] aku That explain the DoRead on input inBinary
    [12:48] dgp Cause this is just too weird
    [12:48] aku docs vs code - timeline ordering - I do not remember
    [12:49] aku Quite
    [12:49] dgp I'll explore a bit more
    [12:49] aku Thanks
    [12:49] aku Shall I copy this discussion and add to the bug rpeort, as additional notes ?
    [12:49] dgp ok
    [12:50] dgp [fcopy] is now [chan copy] ?
    [12:50] aku IIRC yes
    [12:50] dgp and [chan] is 8.5 or 8.6?
    [12:50] aku 8.5 already
    [12:50] dgp One option (not sure I like it yet)....
    [12:51] dgp ...is to just leave [fcopy] untouched.
    [12:51] dgp declare it irretrievably confused.
    [12:51] dgp and start over and get [chan copy] sensibly correct.
    [12:52] aku I have to admit, I do not like that either. OTOH it may be our best bet going forward, leaving the fcopy compatible, and fixing the bug is to change fcopy for chan copy in newer code

     
  • Andreas Kupries

    Andreas Kupries - 2009-11-11

    [12:59] dgp somebody named kupries documented this on 2001-05-19
    [13:00] dgp for Tcl 8.4 ;/
    [13:00] aku :}
    [13:01] dgp 209210
    [13:02] aku Uh, private artifact ? I can't see it.
    [13:02] dgp what bug is that?
    [13:02] aku I am logged in
    [13:03] aku Hm, could that be a scripts bug number ?
    [13:03] aku scripts -> scriptics
    [13:04] dgp don't think so; they were 4 digits
    [13:04] aku Hm ... Not an AS bug number either. (although I already worked for them at the time, just remote from germany)
    [13:06] dgp 219210
    [13:06] aku Ah, yeah, thtat looks like it
    [13:07] aku It seems the special treatment is because I wanted to keep backward compat somehow, for binary encodings
    [13:08] aku and then fumbled things ...
    [13:08] aku ;/
    [13:09] * dgp called away

     
  • Andreas Kupries

    Andreas Kupries - 2009-11-11

    [13:25] dgp back for just a sec
    [13:26] dgp in light of the surprising but documented (!) meaning of -encoding binary in the context of [fcopy] ...
    [13:27] dgp ...appears a possible resolution of 2895565 is some flavor of "Not A Bug"
    [13:27] dgp at least not a bug in [fcopy]
    [13:28] aku Like 'you can't mix in/out translations/encodings in that way' ?
    [13:28] dgp For ugly historical reasons "-encoding binary" doesn't mean what you think it does.
    [13:29] dgp revise your use of [fcopy]
    [13:29] aku Note: The bug as reported appears for -encoding != binary
    [13:30] aku Well, at least out channel != binary. I can't say if the in channel (socket of http) is bianry or not.
    [13:30] dgp ok, probably multiple bugs
    [13:30] dgp have to come back to it; afk again

     
  • Don Porter

    Don Porter - 2009-11-11

    tests io-52.10 and chan-io-52.10 test
    for [chan copy]'s peculiar reaction to
    -encoding binary channels.

     
  • Alexandre Ferrieux

    • assigned_to: ferrieux --> andreas_kupries
     
  • Alexandre Ferrieux

    Very simple repro, portable and without http:

    set f [open aa w]
    fconfigure $f -translation binary
    puts $f [string repeat \n 20]
    close $f
    set f [open aa r]
    fconfigure $f -translation binary
    set g [open bb w]
    fconfigure $g -translation crlf -encoding utf-8
    fcopy $f $g -size 4 -command done
    vwait forever
    puts DONE

     
  • Andreas Kupries

    Andreas Kupries - 2009-11-12

    This patches fixes the issue, looks a bit hackish

     
  • Andreas Kupries

    Andreas Kupries - 2009-11-12

    This is my test case, input is the yahoo page in a file

     
  • Andreas Kupries

    Andreas Kupries - 2009-11-12

    This is the input to trial.tcl

     
  • Andreas Kupries

    Andreas Kupries - 2009-11-12

    Attached a smaller testcase, using just fcopy, and the input file I used. And a patch which apparently fixes the issue, although currently not fully understood why.

    More investigation tomorrow.

     
  • Don Porter

    Don Porter - 2009-11-12

    test io-53.11 committed to 8-5-branch for
    testing this bug.

    Confirmed the test fails, then applying
    fcopy-fix.patch makes it succeed.

     
  • Andreas Kupries

    Andreas Kupries - 2009-11-12

    This is a simpler patch fixing it as well (passes io-53.11)

     
  • Andreas Kupries

    Andreas Kupries - 2009-11-12
    • status: open --> open-fixed
     
  • Andreas Kupries

    Andreas Kupries - 2009-11-12

    The simpler patch passes the whole testsuite, not only io-53.11.
    Committed to 8.5 branch.
    Keeping the entry open until the ports to 8.4 and 8.6 are done.

     
  • Andreas Kupries

    Andreas Kupries - 2009-11-12

    Confirmed the existence of this bug on the 8.6 branch, and that the same patch fixes it.
    Fix committed to HEAD.

     
1 2 > >> (Page 1 of 2)

Get latest updates about Open Source Projects, Conferences and News.

Sign up for the SourceForge newsletter:





No, thanks