Hi,
As discussed on IRC with some of you, aMSN keeps crashing because of "too many nested evaluations", and it seems to be caused by a nested event loop by calling update/vwait/tkwait. I found out what the cause is and it seems to be coming from the http package.
I'll just copy/paste what I said on IRC, and I'll attach a test tcl file to reproduce the problem :
==========================================================
<KaKaRoTo> it basically simulates someone who tries to do 100 geturls asynchronously, and they're all being done from a timer where the geturl gets called after a simple execution stack of maybe 5 functions...
<KaKaRoTo> with msn I have to do a SOAP (http) request for every contact, so if you have 200 contacts, I'd do 200 geturls..
<KaKaRoTo> I definitely want the geturl to be async! that's why I do -timeout to the geturl, but doing so makes http do a vwait, which causes the above error
<KaKaRoTo> (ps launch the script with 'wish tst.tcl', if you just do tclsh, it will end because there's no mainloop)
[...]
<ijchain> <kbk> KaKaRoTo - Don't Do That. Use the -command option to http::geturl so that it doesn't block
<KaKaRoTo> kbk: I use -command AND -timeout
<KaKaRoTo> if -timeout is not specified then http will not create the socket with -async
<KaKaRoTo> I was expecting my -command to get called if a connection error happens, or if the timeout was reached and the http request was not completed
<KaKaRoTo> I don't understand why -timeout would make it async AND make it vwait...
<ijchain> <kbk> Looking at the code --
<ijchain> <kbk> the only 'vwait' is in 'http_wait':
<ijchain> <kbk> and the only call to http_wait is gated:
<ijchain> <kbk> if {! [info exists state(-command)]} {
<ijchain> <kbk> http_wait $token
<ijchain> <kbk> }
<KaKaRoTo> kbk: there's another one
<KaKaRoTo> there are two calls to 'wait' in the geturl function
<KaKaRoTo> # Wait for the connection to complete
<KaKaRoTo> if {$state(-timeout) > 0} {
<KaKaRoTo> fileevent $s writable [list http::Connect $token]
<KaKaRoTo> http::wait $token
<ijchain> <kbk> ugh - I'm looking at the wrong http.tcl, sec...
<KaKaRoTo> kbk: np
<KaKaRoTo> we use http 2.4.4, because we had problems with earlier versions, so people upgrade their system, amsn stops working etc.. so we bundle 2.4.4 and do a package require -exact 2.4.4
<KaKaRoTo> but I checked the version in CVS and it has the same code
<ijchain> <kbk> KaKaRoTo - Log a bug. This can't be right.
==========================================================
So, as explained, in my opinion, the http::geturl should use -async on the socket if -command is specified, it should also call the callback command in case of a connection error instead of vwait-ing in geturl to return an error.
I hope this gets fixed soon... aMSN is basically completely broken for everyone because of this.
Thanks,
KaKaRoTo
test code to reproduce bug
Hi,
I fixed this bug in the latest http 2.7.5 (from core-8-5 branch in CVS).
You can get the code here : http://amsn.svn.sf.net/viewvc/amsn/trunk/amsn/utils/http/http.tcl?revision=11885
You can see my changes if you diff it against CVS 1.67.2.9 : http://tcl.cvs.sourceforge.net/viewvc/\*checkout*/tcl/tcl/library/http/http.tcl?revision=1.67.2.9&pathrev=core-8-5-branch
In theory, no more vwait is used anymore when you do the call to geturl with
a -command option.
However, the test code previously attached to this bug is still having the same issue, but I have no idea why unfortunately... :(
I hope this gets merged upstream.
Thanks.
Alright, I found the bug and fixed it in SVN revision 11889 : http://amsn.svn.sf.net/viewvc/amsn/trunk/amsn/utils/http/http.tcl?revision=11885
The reason why my fix didn't work is because I had a typo :
if {[info exists $state(-command)] }
instead of
if {[info exists state(-command)] }
p.s: I just realized I have some whitespace issues in my code, maybe it's because I did a 'untabify' on the tab, so you may want to do a 'ignore whitespace changes' diff. Sorry.
patch to fix http 2.7.5
typo in my previous comment, the url should have had revision=11889... anyways, I did a diff (ignoring whitespace changes) and I added it to this bug.
Thanks for checking it out!
Hum, looks like this got completely drowned, never committed to trunk, right ?
As of today, in both 8.[56], there is no way to do a fully async http request (including an async connect): the code insists on doing [socket -async] *only* in presence of -timeout, and -timeout forces a vwait.
Why that nonsense ? (Youness's version gets it right)
We should consider what impact coroutines would have on this in 8.6
I had problem with the same symptoms, so I'll add my comments to this tracker, though I am not sure if my fix will work for the original problem.
My problem was with EOF detection when a server closes a keep-alive connection.
If http::geturl is called with -keepalive 1 and a -command that enters the event loop (e.g for an authorization dialog), and the HTTP response includes a "Content-Length" header, then a (slow) race condition ensues between the server's closing of the connection, and the user's response to the dialog. If the connection is closed first, the unpatched code fails at line 1126 with repeated eof fileevents, because the command http::Eof tries repeatedly to run the user's -command, and never returns. This can be reported as "too many nested evaluations".
While it might not be good style to allow the -command script to enter the event loop, http(n) does not exclude this possibility.
There's more than one way to fix the bug. Two possible patches are attached. EIther will fix the bug; both fixes can be applied.
Patches against core-8-5-branch checkin 69687a01db, dated 2012-10-24 11:28:33.
Sorry folks, I couldn't add the files without creating a new bug report, which is 3581754. The two patches are there.
kakaroto's patch committed to branch "bug-2911139" (based on core-8-5-branch). Looks good to me.
Donal, do you agree?
Causes one test failure (http-4.14) but I don't know if that's significant yet. Or maybe it is even a good thing! I'll have to analyze further. Failure is as below (and is the only one):
==== http-4.14 http::Event FAILED
==== Contents of test case:
set token [http::geturl $badurl/?timeout=10 -timeout 10000 -command \#]
if {$token eq ""} {
error "bogus return from http::geturl"
}
http::wait $token
http::status $token
# error code varies among platforms.
---- Test completed normally; Return code was: 0
---- Return code should have been one of: 1
==== http-4.14 FAILED
Hey, dkf! Glad to see this bug taken care of! :)
It looks like that test case expects an error to happen with the badurl, but since there's a -command, the socket gets created asynchronously, which means the error will happen in the command callback, not during the socket creation.
So I think that this is a good thing indeed and working as expected. You'd just have to confirm that the error happens asynchronously.
Thanks!
I think that kakaroto is right, shouldn't the testcase just check whether the timeout occurred?
Suggested test-case fix committed to the branch. Agreed? I'm OK with this
change being merged to core-8-5-branch and up, but I'm not the expert here....
I'm still getting a failure, but now it's getting 'error' rather than 'timeout' as the result string.
Generally, I think that fixing the test is (probably) the correct approach, as the test is looking for something that is intended to be changed. Once we've got the test working, merging through to the 8.5 branch, trunk and novem (though there's no hurry on novem) seems reasonable.
Unless I misunderstood the test, isn't error what you're expecting?
If the bad url is a badly formatted URI, then you should expect an error right away.
If the bad url is a valid URI but with an invalid/unreachable server, then you expect it to error out in the callback, and not for it to timeout. It would only timeout if the server was real and dropping packets, so you don't get a "host unreachable" ICMP or "connection refused" error. Otherwise, it will give you an error directly.
At the moment, I'm just reporting a test failure when I do 'make test' :-)
>If the bad url is a valid URI but with an invalid/unreachable server, then
>you expect it to error out in the callback, and not for it to timeout
Well, "timeout" is what I see. If that's wrong, maybe the error
handling is still not 100% correct. I'm not sure yet.
It might also be that the test outcome is timing-dependant: What
if the [http::wait $token] awakens twice, once when the connection
is created (with invalid socket), once when the connection times
out. Then it just depends on timing whether the last [http::status]
returns "error" or "timeout". Shouldn't the error-situation cancel
the time-out then?
I'm not sure; I'm out on a business trip this week so I don't know how clean my network configuration is. OTOH, if I do:
make test TESTFLAGS="-file h*.test -match http-4.14"
This runs just the relevant test, and it fails very quickly; it doesn't look like a timeout to me. If I add:
puts [set ${token}(error)]
To the test immediately before the [http::status $token], I get this output as well:
{connect failed connection refused} {can't read "state(sock)": no such element in array
while executing
"fileevent $state(sock) writable {}"} NONE
I have no idea at all how we're getting into that state!
>I have no idea at all how we're getting into that state!
I think I have now. The
can't read "state(sock)":...
comes from line 271, just a left-over from the $errorInfo
at the previous http::reset.
But the "connect failed connection refused" is a
better string to test for, so we know exactly what
happend and not some random other error. Done
now in the "bug-2911139" branch.
I don't get the "timeout" any more, which I got at the
office. At home this test-case passes now, both on
win32 as on Linux. I'm satisfied, everything looks OK
to me now. Agreed?
Works for me now. :-)
Committed to 8.5 branch; not yet looked into porting to trunk.