Learn how easy it is to sync an existing GitHub or Google Code repo to a SourceForge project! See Demo


#4487 http::geturl abuses vwait on async call

current: 8.5.14
Jan Nijtmans
Alaoui Youness

As discussed on IRC with some of you, aMSN keeps crashing because of "too many nested evaluations", and it seems to be caused by a nested event loop by calling update/vwait/tkwait. I found out what the cause is and it seems to be coming from the http package.
I'll just copy/paste what I said on IRC, and I'll attach a test tcl file to reproduce the problem :

<KaKaRoTo> it basically simulates someone who tries to do 100 geturls asynchronously, and they're all being done from a timer where the geturl gets called after a simple execution stack of maybe 5 functions...
<KaKaRoTo> with msn I have to do a SOAP (http) request for every contact, so if you have 200 contacts, I'd do 200 geturls..
<KaKaRoTo> I definitely want the geturl to be async! that's why I do -timeout to the geturl, but doing so makes http do a vwait, which causes the above error
<KaKaRoTo> (ps launch the script with 'wish tst.tcl', if you just do tclsh, it will end because there's no mainloop)

<ijchain> <kbk> KaKaRoTo - Don't Do That. Use the -command option to http::geturl so that it doesn't block
<KaKaRoTo> kbk: I use -command AND -timeout
<KaKaRoTo> if -timeout is not specified then http will not create the socket with -async
<KaKaRoTo> I was expecting my -command to get called if a connection error happens, or if the timeout was reached and the http request was not completed
<KaKaRoTo> I don't understand why -timeout would make it async AND make it vwait...
<ijchain> <kbk> Looking at the code --
<ijchain> <kbk> the only 'vwait' is in 'http_wait':
<ijchain> <kbk> and the only call to http_wait is gated:
<ijchain> <kbk> if {! [info exists state(-command)]} {
<ijchain> <kbk> http_wait $token
<ijchain> <kbk> }
<KaKaRoTo> kbk: there's another one
<KaKaRoTo> there are two calls to 'wait' in the geturl function
<KaKaRoTo> # Wait for the connection to complete
<KaKaRoTo> if {$state(-timeout) > 0} {
<KaKaRoTo> fileevent $s writable [list http::Connect $token]
<KaKaRoTo> http::wait $token
<ijchain> <kbk> ugh - I'm looking at the wrong http.tcl, sec...
<KaKaRoTo> kbk: np
<KaKaRoTo> we use http 2.4.4, because we had problems with earlier versions, so people upgrade their system, amsn stops working etc.. so we bundle 2.4.4 and do a package require -exact 2.4.4
<KaKaRoTo> but I checked the version in CVS and it has the same code
<ijchain> <kbk> KaKaRoTo - Log a bug. This can't be right.

So, as explained, in my opinion, the http::geturl should use -async on the socket if -command is specified, it should also call the callback command in case of a connection error instead of vwait-ing in geturl to return an error.

I hope this gets fixed soon... aMSN is basically completely broken for everyone because of this.



1 2 > >> (Page 1 of 2)
  • Alaoui Youness
    Alaoui Youness

    test code to reproduce bug

  • Alaoui Youness
    Alaoui Youness

    I fixed this bug in the latest http 2.7.5 (from core-8-5 branch in CVS).
    You can get the code here : http://amsn.svn.sf.net/viewvc/amsn/trunk/amsn/utils/http/http.tcl?revision=11885
    You can see my changes if you diff it against CVS : http://tcl.cvs.sourceforge.net/viewvc/\*checkout*/tcl/tcl/library/http/http.tcl?revision=

    In theory, no more vwait is used anymore when you do the call to geturl with
    a -command option.
    However, the test code previously attached to this bug is still having the same issue, but I have no idea why unfortunately... :(
    I hope this gets merged upstream.

  • Alaoui Youness
    Alaoui Youness

    Alright, I found the bug and fixed it in SVN revision 11889 : http://amsn.svn.sf.net/viewvc/amsn/trunk/amsn/utils/http/http.tcl?revision=11885
    The reason why my fix didn't work is because I had a typo :
    if {[info exists $state(-command)] }
    instead of
    if {[info exists state(-command)] }

    p.s: I just realized I have some whitespace issues in my code, maybe it's because I did a 'untabify' on the tab, so you may want to do a 'ignore whitespace changes' diff. Sorry.

  • Alaoui Youness
    Alaoui Youness

    patch to fix http 2.7.5

  • Alaoui Youness
    Alaoui Youness

    typo in my previous comment, the url should have had revision=11889... anyways, I did a diff (ignoring whitespace changes) and I added it to this bug.
    Thanks for checking it out!

    • priority: 5 --> 9
  • Hum, looks like this got completely drowned, never committed to trunk, right ?
    As of today, in both 8.[56], there is no way to do a fully async http request (including an async connect): the code insists on doing [socket -async] *only* in presence of -timeout, and -timeout forces a vwait.
    Why that nonsense ? (Youness's version gets it right)

  • We should consider what impact coroutines would have on this in 8.6

    • milestone: --> 2361739
  • kjnash

    I had problem with the same symptoms, so I'll add my comments to this tracker, though I am not sure if my fix will work for the original problem.

    My problem was with EOF detection when a server closes a keep-alive connection.

    If http::geturl is called with -keepalive 1 and a -command that enters the event loop (e.g for an authorization dialog), and the HTTP response includes a "Content-Length" header, then a (slow) race condition ensues between the server's closing of the connection, and the user's response to the dialog. If the connection is closed first, the unpatched code fails at line 1126 with repeated eof fileevents, because the command http::Eof tries repeatedly to run the user's -command, and never returns. This can be reported as "too many nested evaluations".

    While it might not be good style to allow the -command script to enter the event loop, http(n) does not exclude this possibility.

    There's more than one way to fix the bug. Two possible patches are attached. EIther will fix the bug; both fixes can be applied.

    Patches against core-8-5-branch checkin 69687a01db, dated 2012-10-24 11:28:33.

  • kjnash

    Sorry folks, I couldn't add the files without creating a new bug report, which is 3581754. The two patches are there.

  • Jan Nijtmans
    Jan Nijtmans

    • assigned_to: patthoyts --> dkf
  • Jan Nijtmans
    Jan Nijtmans

    kakaroto's patch committed to branch "bug-2911139" (based on core-8-5-branch). Looks good to me.

    Donal, do you agree?

  • Causes one test failure (http-4.14) but I don't know if that's significant yet. Or maybe it is even a good thing! I'll have to analyze further. Failure is as below (and is the only one):

    ==== http-4.14 http::Event FAILED
    ==== Contents of test case:

    set token [http::geturl $badurl/?timeout=10 -timeout 10000 -command \#]
    if {$token eq ""} {
    error "bogus return from http::geturl"
    http::wait $token
    http::status $token
    # error code varies among platforms.

    ---- Test completed normally; Return code was: 0
    ---- Return code should have been one of: 1
    ==== http-4.14 FAILED

  • Alaoui Youness
    Alaoui Youness

    Hey, dkf! Glad to see this bug taken care of! :)
    It looks like that test case expects an error to happen with the badurl, but since there's a -command, the socket gets created asynchronously, which means the error will happen in the command callback, not during the socket creation.
    So I think that this is a good thing indeed and working as expected. You'd just have to confirm that the error happens asynchronously.

  • Jan Nijtmans
    Jan Nijtmans

    I think that kakaroto is right, shouldn't the testcase just check whether the timeout occurred?

    Suggested test-case fix committed to the branch. Agreed? I'm OK with this
    change being merged to core-8-5-branch and up, but I'm not the expert here....

  • I'm still getting a failure, but now it's getting 'error' rather than 'timeout' as the result string.

    Generally, I think that fixing the test is (probably) the correct approach, as the test is looking for something that is intended to be changed. Once we've got the test working, merging through to the 8.5 branch, trunk and novem (though there's no hurry on novem) seems reasonable.

  • Alaoui Youness
    Alaoui Youness

    Unless I misunderstood the test, isn't error what you're expecting?
    If the bad url is a badly formatted URI, then you should expect an error right away.
    If the bad url is a valid URI but with an invalid/unreachable server, then you expect it to error out in the callback, and not for it to timeout. It would only timeout if the server was real and dropping packets, so you don't get a "host unreachable" ICMP or "connection refused" error. Otherwise, it will give you an error directly.

  • At the moment, I'm just reporting a test failure when I do 'make test' :-)

  • Jan Nijtmans
    Jan Nijtmans

    >If the bad url is a valid URI but with an invalid/unreachable server, then
    >you expect it to error out in the callback, and not for it to timeout
    Well, "timeout" is what I see. If that's wrong, maybe the error
    handling is still not 100% correct. I'm not sure yet.

  • Jan Nijtmans
    Jan Nijtmans

    It might also be that the test outcome is timing-dependant: What
    if the [http::wait $token] awakens twice, once when the connection
    is created (with invalid socket), once when the connection times
    out. Then it just depends on timing whether the last [http::status]
    returns "error" or "timeout". Shouldn't the error-situation cancel
    the time-out then?

  • I'm not sure; I'm out on a business trip this week so I don't know how clean my network configuration is. OTOH, if I do:

    make test TESTFLAGS="-file h*.test -match http-4.14"

    This runs just the relevant test, and it fails very quickly; it doesn't look like a timeout to me. If I add:

    puts [set ${token}(error)]

    To the test immediately before the [http::status $token], I get this output as well:

    {connect failed connection refused} {can't read "state(sock)": no such element in array
    while executing
    "fileevent $state(sock) writable {}"} NONE

    I have no idea at all how we're getting into that state!

  • Jan Nijtmans
    Jan Nijtmans

    >I have no idea at all how we're getting into that state!
    I think I have now. The
    can't read "state(sock)":...
    comes from line 271, just a left-over from the $errorInfo
    at the previous http::reset.

    But the "connect failed connection refused" is a
    better string to test for, so we know exactly what
    happend and not some random other error. Done
    now in the "bug-2911139" branch.

    I don't get the "timeout" any more, which I got at the
    office. At home this test-case passes now, both on
    win32 as on Linux. I'm satisfied, everything looks OK
    to me now. Agreed?

  • Works for me now. :-)

  • Committed to 8.5 branch; not yet looked into porting to trunk.

1 2 > >> (Page 1 of 2)