From: SourceForge.net <no...@so...> - 2013-05-27 10:59:27
|
Bugs item #2911139, was opened at 2009-12-08 18:33 Message generated for change (Comment added) made by stwo You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=110894&aid=2911139&group_id=10894 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: 29. http Package Group: current: 8.5.14 Status: Closed Resolution: Fixed Priority: 5 Private: No Submitted By: Alaoui Youness (kakaroto) Assigned to: Jan Nijtmans (nijtmans) Summary: http::geturl abuses vwait on async call Initial Comment: Hi, As discussed on IRC with some of you, aMSN keeps crashing because of "too many nested evaluations", and it seems to be caused by a nested event loop by calling update/vwait/tkwait. I found out what the cause is and it seems to be coming from the http package. I'll just copy/paste what I said on IRC, and I'll attach a test tcl file to reproduce the problem : ========================================================== <KaKaRoTo> it basically simulates someone who tries to do 100 geturls asynchronously, and they're all being done from a timer where the geturl gets called after a simple execution stack of maybe 5 functions... <KaKaRoTo> with msn I have to do a SOAP (http) request for every contact, so if you have 200 contacts, I'd do 200 geturls.. <KaKaRoTo> I definitely want the geturl to be async! that's why I do -timeout to the geturl, but doing so makes http do a vwait, which causes the above error <KaKaRoTo> (ps launch the script with 'wish tst.tcl', if you just do tclsh, it will end because there's no mainloop) [...] <ijchain> <kbk> KaKaRoTo - Don't Do That. Use the -command option to http::geturl so that it doesn't block <KaKaRoTo> kbk: I use -command AND -timeout <KaKaRoTo> if -timeout is not specified then http will not create the socket with -async <KaKaRoTo> I was expecting my -command to get called if a connection error happens, or if the timeout was reached and the http request was not completed <KaKaRoTo> I don't understand why -timeout would make it async AND make it vwait... <ijchain> <kbk> Looking at the code -- <ijchain> <kbk> the only 'vwait' is in 'http_wait': <ijchain> <kbk> and the only call to http_wait is gated: <ijchain> <kbk> if {! [info exists state(-command)]} { <ijchain> <kbk> http_wait $token <ijchain> <kbk> } <KaKaRoTo> kbk: there's another one <KaKaRoTo> there are two calls to 'wait' in the geturl function <KaKaRoTo> # Wait for the connection to complete <KaKaRoTo> if {$state(-timeout) > 0} { <KaKaRoTo> fileevent $s writable [list http::Connect $token] <KaKaRoTo> http::wait $token <ijchain> <kbk> ugh - I'm looking at the wrong http.tcl, sec... <KaKaRoTo> kbk: np <KaKaRoTo> we use http 2.4.4, because we had problems with earlier versions, so people upgrade their system, amsn stops working etc.. so we bundle 2.4.4 and do a package require -exact 2.4.4 <KaKaRoTo> but I checked the version in CVS and it has the same code <ijchain> <kbk> KaKaRoTo - Log a bug. This can't be right. ========================================================== So, as explained, in my opinion, the http::geturl should use -async on the socket if -command is specified, it should also call the callback command in case of a connection error instead of vwait-ing in geturl to return an error. I hope this gets fixed soon... aMSN is basically completely broken for everyone because of this. Thanks, KaKaRoTo ---------------------------------------------------------------------- >Comment By: Stuart Cassoff (stwo) Date: 2013-05-27 03:59 Message: If 8.5.14 as released doesn't suffer (you'd say) then why do I get this error? Should I file a new bug? ---------------------------------------------------------------------- Comment By: Jan Nijtmans (nijtmans) Date: 2013-05-27 03:02 Message: Many thanks, Alex, for taking care of this! I fully agree with your remarks. ---------------------------------------------------------------------- Comment By: Alexandre Ferrieux (ferrieux) Date: 2013-05-25 09:46 Message: That commit [a34d23182d] was identified as bogus and fixed by Jan himself the same day [1bbf0da818] , and then merged back into 8.5 mainline by dkf the day after: [5562d8bc15] . So, 8.5.14 as released doesn't suffer, I'd say ;) ---------------------------------------------------------------------- Comment By: Stuart Cassoff (stwo) Date: 2013-05-25 01:31 Message: 8.5.14/OpenBSD [a34d23182d] ==== http-4.14 http::Event FAILED ==== Contents of test case: set token [http::geturl $badurl/?timeout=10 -timeout 10000 -command \#] if {$token eq ""} { error "bogus return from http::geturl" } http::wait $token lindex [http::error $token] 0 ---- Test generated error; Return code was: 1 ---- Return code should have been one of: 0 2 ---- errorInfo: couldn't open socket: connection refused while executing "http::geturl $badurl/?timeout=10 -timeout 10000 -command \#" ("uplevel" body line 2) invoked from within "uplevel 1 $script" ---- errorCode: NONE ==== http-4.14 FAILED ---------------------------------------------------------------------- Comment By: Donal K. Fellows (dkf) Date: 2013-01-23 07:13 Message: Thanks for committing to trunk, Jan; I got diverted by a meeting… ---------------------------------------------------------------------- Comment By: Donal K. Fellows (dkf) Date: 2013-01-23 01:06 Message: Committed to 8.5 branch; not yet looked into porting to trunk. ---------------------------------------------------------------------- Comment By: Donal K. Fellows (dkf) Date: 2013-01-22 22:45 Message: Works for me now. :-) ---------------------------------------------------------------------- Comment By: Jan Nijtmans (nijtmans) Date: 2013-01-22 15:01 Message: >I have no idea at all how we're getting into that state! I think I have now. The can't read "state(sock)":... comes from line 271, just a left-over from the $errorInfo at the previous http::reset. But the "connect failed connection refused" is a better string to test for, so we know exactly what happend and not some random other error. Done now in the "bug-2911139" branch. I don't get the "timeout" any more, which I got at the office. At home this test-case passes now, both on win32 as on Linux. I'm satisfied, everything looks OK to me now. Agreed? ---------------------------------------------------------------------- Comment By: Donal K. Fellows (dkf) Date: 2013-01-22 09:19 Message: I'm not sure; I'm out on a business trip this week so I don't know how clean my network configuration is. OTOH, if I do: make test TESTFLAGS="-file h*.test -match http-4.14" This runs just the relevant test, and it fails very quickly; it doesn't look like a timeout to me. If I add: puts [set ${token}(error)] To the test immediately before the [http::status $token], I get this output as well: {connect failed connection refused} {can't read "state(sock)": no such element in array while executing "fileevent $state(sock) writable {}"} NONE I have no idea at all how we're getting into that state! ---------------------------------------------------------------------- Comment By: Jan Nijtmans (nijtmans) Date: 2013-01-22 08:07 Message: It might also be that the test outcome is timing-dependant: What if the [http::wait $token] awakens twice, once when the connection is created (with invalid socket), once when the connection times out. Then it just depends on timing whether the last [http::status] returns "error" or "timeout". Shouldn't the error-situation cancel the time-out then? ---------------------------------------------------------------------- Comment By: Jan Nijtmans (nijtmans) Date: 2013-01-22 08:00 Message: >If the bad url is a valid URI but with an invalid/unreachable server, then >you expect it to error out in the callback, and not for it to timeout Well, "timeout" is what I see. If that's wrong, maybe the error handling is still not 100% correct. I'm not sure yet. ---------------------------------------------------------------------- Comment By: Donal K. Fellows (dkf) Date: 2013-01-22 01:48 Message: At the moment, I'm just reporting a test failure when I do 'make test' :-) ---------------------------------------------------------------------- Comment By: Alaoui Youness (kakaroto) Date: 2013-01-22 01:07 Message: Unless I misunderstood the test, isn't error what you're expecting? If the bad url is a badly formatted URI, then you should expect an error right away. If the bad url is a valid URI but with an invalid/unreachable server, then you expect it to error out in the callback, and not for it to timeout. It would only timeout if the server was real and dropping packets, so you don't get a "host unreachable" ICMP or "connection refused" error. Otherwise, it will give you an error directly. ---------------------------------------------------------------------- Comment By: Donal K. Fellows (dkf) Date: 2013-01-22 00:42 Message: I'm still getting a failure, but now it's getting 'error' rather than 'timeout' as the result string. Generally, I think that fixing the test is (probably) the correct approach, as the test is looking for something that is intended to be changed. Once we've got the test working, merging through to the 8.5 branch, trunk and novem (though there's no hurry on novem) seems reasonable. ---------------------------------------------------------------------- Comment By: Jan Nijtmans (nijtmans) Date: 2013-01-22 00:19 Message: I think that kakaroto is right, shouldn't the testcase just check whether the timeout occurred? Suggested test-case fix committed to the branch. Agreed? I'm OK with this change being merged to core-8-5-branch and up, but I'm not the expert here.... ---------------------------------------------------------------------- Comment By: Alaoui Youness (kakaroto) Date: 2013-01-21 17:02 Message: Hey, dkf! Glad to see this bug taken care of! :) It looks like that test case expects an error to happen with the badurl, but since there's a -command, the socket gets created asynchronously, which means the error will happen in the command callback, not during the socket creation. So I think that this is a good thing indeed and working as expected. You'd just have to confirm that the error happens asynchronously. Thanks! ---------------------------------------------------------------------- Comment By: Donal K. Fellows (dkf) Date: 2013-01-19 14:24 Message: Causes one test failure (http-4.14) but I don't know if that's significant yet. Or maybe it is even a good thing! I'll have to analyze further. Failure is as below (and is the only one): ==== http-4.14 http::Event FAILED ==== Contents of test case: set token [http::geturl $badurl/?timeout=10 -timeout 10000 -command \#] if {$token eq ""} { error "bogus return from http::geturl" } http::wait $token http::status $token # error code varies among platforms. ---- Test completed normally; Return code was: 0 ---- Return code should have been one of: 1 ==== http-4.14 FAILED ---------------------------------------------------------------------- Comment By: Jan Nijtmans (nijtmans) Date: 2013-01-17 05:29 Message: kakaroto's patch committed to branch "bug-2911139" (based on core-8-5-branch). Looks good to me. Donal, do you agree? ---------------------------------------------------------------------- Comment By: kjnash (kjnash) Date: 2012-10-29 20:05 Message: Sorry folks, I couldn't add the files without creating a new bug report, which is 3581754. The two patches are there. ---------------------------------------------------------------------- Comment By: kjnash (kjnash) Date: 2012-10-29 19:43 Message: I had problem with the same symptoms, so I'll add my comments to this tracker, though I am not sure if my fix will work for the original problem. My problem was with EOF detection when a server closes a keep-alive connection. If http::geturl is called with -keepalive 1 and a -command that enters the event loop (e.g for an authorization dialog), and the HTTP response includes a "Content-Length" header, then a (slow) race condition ensues between the server's closing of the connection, and the user's response to the dialog. If the connection is closed first, the unpatched code fails at line 1126 with repeated eof fileevents, because the command http::Eof tries repeatedly to run the user's -command, and never returns. This can be reported as "too many nested evaluations". While it might not be good style to allow the -command script to enter the event loop, http(n) does not exclude this possibility. There's more than one way to fix the bug. Two possible patches are attached. EIther will fix the bug; both fixes can be applied. Patches against core-8-5-branch checkin 69687a01db, dated 2012-10-24 11:28:33. ---------------------------------------------------------------------- Comment By: Donal K. Fellows (dkf) Date: 2012-05-26 13:03 Message: We should consider what impact coroutines would have on this in 8.6 ---------------------------------------------------------------------- Comment By: Alexandre Ferrieux (ferrieux) Date: 2011-09-05 00:17 Message: Hum, looks like this got completely drowned, never committed to trunk, right ? As of today, in both 8.[56], there is no way to do a fully async http request (including an async connect): the code insists on doing [socket -async] *only* in presence of -timeout, and -timeout forces a vwait. Why that nonsense ? (Youness's version gets it right) ---------------------------------------------------------------------- Comment By: Alaoui Youness (kakaroto) Date: 2010-01-11 15:13 Message: typo in my previous comment, the url should have had revision=11889... anyways, I did a diff (ignoring whitespace changes) and I added it to this bug. Thanks for checking it out! ---------------------------------------------------------------------- Comment By: Alaoui Youness (kakaroto) Date: 2010-01-10 18:15 Message: Alright, I found the bug and fixed it in SVN revision 11889 : http://amsn.svn.sf.net/viewvc/amsn/trunk/amsn/utils/http/http.tcl?revision=11885 The reason why my fix didn't work is because I had a typo : if {[info exists $state(-command)] } instead of if {[info exists state(-command)] } p.s: I just realized I have some whitespace issues in my code, maybe it's because I did a 'untabify' on the tab, so you may want to do a 'ignore whitespace changes' diff. Sorry. ---------------------------------------------------------------------- Comment By: Alaoui Youness (kakaroto) Date: 2010-01-10 17:50 Message: Hi, I fixed this bug in the latest http 2.7.5 (from core-8-5 branch in CVS). You can get the code here : http://amsn.svn.sf.net/viewvc/amsn/trunk/amsn/utils/http/http.tcl?revision=11885 You can see my changes if you diff it against CVS 1.67.2.9 : http://tcl.cvs.sourceforge.net/viewvc/*checkout*/tcl/tcl/library/http/http.tcl?revision=1.67.2.9&pathrev=core-8-5-branch In theory, no more vwait is used anymore when you do the call to geturl with a -command option. However, the test code previously attached to this bug is still having the same issue, but I have no idea why unfortunately... :( I hope this gets merged upstream. Thanks. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=110894&aid=2911139&group_id=10894 |