From: SourceForge.net <no...@so...> - 2011-06-27 22:23:13
Bugs item #3325339, was opened at 2011-06-23 22:19
Message generated for change (Comment added) made by ferrieux
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=110894&aid=3325339&group_id=10894

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.

Category: 27. Channel Types
Group: development: 8.6b1.1
Status: Open
Resolution: None
Priority: 9
Private: No
Submitted By: Don Porter (dgp)
Assigned to: Reinhard Max (rmax)
Summary: socket-14.2 fails

Initial Comment:
I think this failure comes from the merge of the rmax-ipv6-branch to trunk.

==== socket-14.2 [socket -async] fileevent connection refused FAILED
==== Contents of test case:

    set client [socket -async localhost [randport]]
    fileevent $client writable {set x [fconfigure $client -error]}
    set after [after 1000 {set x timeout}]
    vwait x
    if {$x eq "timeout"} {
        append x ": [fconfigure $client -error]"
    }
    set x

---- Result was:
timeout: connection refused
---- Result should have been (exact matching):
connection refused
==== socket-14.2 FAILED

----------------------------------------------------------------------

>Comment By: Alexandre Ferrieux (ferrieux)
Date: 2011-06-28 00:23

Message:
After a bit of explanations on the chat, I agree that this may be useful for the IPv6 transition, as more and more servers will get DNS records for both their v4 and v6 addresses.

Now I have found the critical OS behavior that makes the bug appear or disappear: though a first select(writable) does end up calling TcpAsyncCallback (when the outcome of the asynchronous connection is known), at this spot the code calls getsockopt(SOL_SOCKET, SO_ERROR). On some OSes, like Fedora 12, it turns out that this has a side-effect: the OS no longer thinks the socket is writable when the outcome is "connection error".
This can be seen very clearly through strace: the second select() doesn't wake up on the writable bit when the getsockopt is present, and does when it is not.

By removing the resort to the cached getsockopt (->status field), everything works. Reinhard, can you explain why you added it ?

----------------------------------------------------------------------

Comment By: Alexandre Ferrieux (ferrieux)
Date: 2011-06-26 00:32

Message:
[00:19] ferrieux c'mon, this is prio-9, a very recent change, and dgp said he would like to push 8.6b2 out ...
[00:26] ferrieux anybody aware of the discussion regarding the ability for [connect -async] to try multiple addresses (from DNS) in sequence ?
[00:26] * ferrieux can't find a TIP
[00:27] ferrieux to me, sounds like the kind of black magic that doesn't fit in the core. The app will want to have a word to say on the order of the sequence, for example.
[00:31] ferrieux okay, wrong timezone, but for the record: I am for a revert of 8eefe5a06f

----------------------------------------------------------------------

Comment By: Alexandre Ferrieux (ferrieux)
Date: 2011-06-26 00:15

Message:
OK, quick dive :)

As it turns out, one single select-callback can be hooked on a given fd at any given time (with Tcl_CreateFileHandler). As a result, the new TcpAsyncCallback and the generic TclChannelEventScriptInvoker (which supports user fileevents) are mutually exclusive. In other words, in no way can internal file handlers not based on TclChannelEventScriptInvoker coexist with user fileevents.

As a consequence, there are two options:

(1) use a scripted callback (a true fileevent) to the same effect (with possible introspection by the curious)

(2) revert this change, which has not been backed by a TIP anyway

----------------------------------------------------------------------

Comment By: Alexandre Ferrieux (ferrieux)
Date: 2011-06-26 00:04

Message:
Oh, my conclusion from strace was hasty: the fileevent doesn't fire.
Here, source-diving is much more productive. The core of the bug is that with the new code, an internal writable fileevent is set up (TcpAsyncCallback), and when this one is called, it does Tcl_DeleteFileHandler(), which wipes out any user-provided fileevents.

Clearly this is new and related to the new callback, made necessary for multiple-address iterations. I'll dive a bit more to grok how one is supposed to remove fileevents one by one. Just specifying the fd sounds too coarse.

----------------------------------------------------------------------

Comment By: Alexandre Ferrieux (ferrieux)
Date: 2011-06-25 19:12

Message:
FWIW, I do reproduce in 8.6, threaded and unthreaded, on my Fedora 12, but not in 8.5, threaded or unthreaded.

Also, a bit of strace shows that in the failing case, the fileevent does fire, but doesn't unlock the vwait.

----------------------------------------------------------------------

Comment By: Alexandre Ferrieux (ferrieux)
Date: 2011-06-24 23:02

Message:
Reinhard, can you provide an strace -f -tt -T of the failing 8.5 and 8.6 on your machine ?

----------------------------------------------------------------------

Comment By: Don Porter (dgp)
Date: 2011-06-24 20:03

Message:
CentOS release 5.6 (Final)

$ uname -a
Linux localhost.localdomain 2.6.18-238.12.1.el5 #1 SMP Tue May 31 13:23:01 EDT 2011 i686 i686 i386 GNU/Linux

----------------------------------------------------------------------

Comment By: Reinhard Max (rmax)
Date: 2011-06-23 23:11

Message:
On what OS does socket-14.2 fail?

The http-4.14 failure is almost certainly unrelated to the ipv6 merge, because it also fails for me on 8.5.
----------------------------------------------------------------------

Comment By: Don Porter (dgp)
Date: 2011-06-23 22:30

Message:
Also failing:

==== http-4.14 http::Event FAILED
==== Contents of test case:

    set token [http::geturl $badurl/?timeout=10 -timeout 10000 -command \#]
    if {$token eq ""} {
        error "bogus return from http::geturl"
    }
    http::wait $token
    http::status $token
    # error code varies among platforms.

---- Test completed normally; Return code was: 0
---- Return code should have been one of: 1
==== http-4.14 FAILED

----------------------------------------------------------------------
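Option (1) from the 2011-06-26 00:15 comment, a scripted callback built on a true fileevent, could look roughly like the sketch below. It is only an illustration of the idea discussed in the thread, not the code that was merged or proposed: the procedure names tryConnect and connectReady, the callback signature, and the "no addresses given" message are all hypothetical. It assumes Tcl 8.5 or later (for lassign and {*}), and it relies on the writable fileevent firing once the outcome of the asynchronous connect is known, which is exactly the behavior socket-14.2 tests.

```tcl
# Hypothetical helper (not part of Tcl): try each address in turn using only
# script-level fileevents, so no internal handler competes for the channel.
# The caller chooses the address order. $done is a command prefix that is
# called with two arguments: the connected socket (or "") and the last error.
proc tryConnect {addrs port done {lastErr "no addresses given"}} {
    if {[llength $addrs] == 0} {
        # Nothing left to try: report the most recent failure.
        {*}$done "" $lastErr
        return
    }
    # Take the first address; lassign returns the remaining ones.
    set rest [lassign $addrs addr]
    if {[catch {socket -async $addr $port} sock]} {
        # Immediate failure (for example an unusable address): move on.
        tryConnect $rest $port $done $sock
        return
    }
    fileevent $sock writable [list connectReady $sock $rest $port $done]
}

# Hypothetical callback: runs when the outcome of the connect is known.
proc connectReady {sock rest port done} {
    # Remove our own handler before doing anything else.
    fileevent $sock writable {}
    set err [fconfigure $sock -error]
    if {$err eq ""} {
        {*}$done $sock ""
    } else {
        close $sock
        tryConnect $rest $port $done $err
    }
}

# Usage (commented out; the addresses and port are placeholders):
#   tryConnect {::1 127.0.0.1} 8080 {apply {{sock err} {
#       set ::result [expr {$sock ne "" ? "connected" : "failed: $err"}]
#   }}}
#   vwait ::result
```

Because everything here goes through fileevent, the handler stays visible to script-level introspection, and the application keeps control over the order in which addresses are tried, which is the concern raised in the 00:27 chat line.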