#3506 socket name corruption

Milestone: obsolete: 8.4.14
Status: open
Priority: 5
Updated: 2007-09-21
Created: 2006-09-10
No

I was getting weird errors while experimenting with Pat's
dns package using the code below. What it does is
essentially open a tcp socket asynchronously to the dns port (53),
receive the reply, and invoke a callback while its socket is
still open. In the callback I open another socket asynchronously, but
its writable handler is never invoked if I do dns::cleanup on
the dns token (which closes the dns socket).
It turned out that the second socket created has the same name as
the already open one!!!
The debug output looks essentially like this:

---> cb ::dns::1
...
::dns::1(sock) = sock12
...
socket=sock12

Two different open sockets with identical names!!!
This happens consistently on MacOS 10.2.8 with Tcl 8.4.9
(sorry for not having a more recent one).
I have also tested on WinXP using 8.4.12 (I guess), where it happened
once and I could not reproduce it afterwards.

Ideas? Mats

package require dns
proc cb {token} {
    global s
    puts "---> cb $token"
    parray $token
    set s [socket -async google.com 80]
    fconfigure $s -blocking 0
    fileevent $s writable writable
    puts "\t socket=$s"
    # Try with and without cleanup.
    dns::cleanup $token
}

proc writable {args} {
    global s
    puts "+++> writable s=$s"
    close $s
}

dns::resolve google.com -command cb -protocol tcp
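
To make the collision visible without reading the parray output by eye, a small check could compare the new channel name with the one stored in the token. This is only a sketch: sameChannelName and ::demoToken are hypothetical names, and it assumes the token names a global array with a "sock" element, as the debug output above suggests.

```tcl
# Hypothetical helper (not part of the dns package): report whether a newly
# created channel has the same name as the socket recorded in a dns token.
# Assumes the token names a global array with a "sock" element.
proc sameChannelName {token newSock} {
    upvar #0 $token state
    if {![info exists state(sock)]} {
        return 0
    }
    return [string equal $state(sock) $newSock]
}

# Stand-in token array for illustration; a real one comes from dns::resolve.
array set ::demoToken {sock sock12}
puts [sameChannelName ::demoToken sock12]
puts [sameChannelName ::demoToken sock13]
```

Inside cb, the check would have to run before dns::cleanup $token, for example by printing a warning when [sameChannelName $token $s] is true.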

Discussion

  • Mats Bengtsson

    Mats Bengtsson - 2006-09-10
    • priority: 5 --> 7
     
  • Donal K. Fellows

    Logged In: YES
    user_id=79902

    Arguably, the udp package in use (tcludp?) is at fault for
    naming its sockets according to the pattern sock%d.

    On the other hand, duplicating the name like that perhaps
    should lead to the other socket with that name being closed.

     
  • Donal K. Fellows

    • milestone: --> 450901
     
  • Mats Bengtsson

    Mats Bengtsson - 2006-09-11

    Logged In: YES
    user_id=108900

    But I'm using tcp:
    dns::resolve google.com -command cb -protocol tcp
    Your thought was also mine for a while.

    Mats

     
  • Donal K. Fellows

    Logged In: YES
    user_id=79902

    The Tcl core uses the socket's FD number to disambiguate
    channel names. Since those are unique while each socket is
    open (enforced by the OS) the underlying network package
    must be using something else. Since I don't know the code
    well, I don't know what's at fault. What package is the dns
    package using for its low-level socket management?
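
The naming rule described here can be observed with a short script. It is a sketch only: discard and openListener are hypothetical helpers, and the listeners bind to 127.0.0.1 on a port chosen by the OS. Two sockets that are open at the same time should always get different names; whether a name is reused after a close depends on the platform and the Tcl version, so the script prints the names rather than assuming a result.

```tcl
# Accept callback for the listeners below; it only closes the connection.
proc discard {chan addr port} {
    close $chan
}

# Hypothetical helper: listen on a port chosen by the OS (port 0).
proc openListener {} {
    return [socket -server discard -myaddr 127.0.0.1 0]
}

set first  [openListener]
set second [openListener]
puts "open together: $first $second"

# Once the first socket is closed, its descriptor is free again, so a later
# socket may (or may not) be given the same channel name.
close $first
set third [openListener]
puts "opened after closing $first: $third"

close $second
close $third
```

If the third name equals the first, that is ordinary reuse after a close. The report above is different: both sockets were still open when the names matched.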

     
  • Pat Thoyts

    Pat Thoyts - 2006-09-12

    Logged In: YES
    user_id=202636

    I cannot see any reason why the tcludp package should
    produce name collisions, as it creates socket names the same
    way that Tcl does, namely by appending the file descriptor
    number to 'sock': sprintf(channelName, "sock%d",
    statePtr->sock);
    I cannot get this problem to occur on Windows XP using Tcl
    8.4.13 or Tcl 8.5a5, with or without udp-1.0.8. I'm attaching
    two test scripts: one uses only tcp sockets (try it against a
    web server); the other uses the udp TIME service and needs a host
    listening for the TIME protocol on port 37, tcp and udp.
    Perhaps MacOSX will fail with these?

     
  • Pat Thoyts

    Pat Thoyts - 2006-09-12

    udp test (port 37)

     
  • Pat Thoyts

    Pat Thoyts - 2006-09-12

    tcp only test (use with a web server port 80)

     
  • Mats Bengtsson

    Mats Bengtsson - 2006-09-12

    Logged In: YES
    user_id=108900

    The problem seems unrelated to the udp package, since the test case is
    configured for tcp. The dns package then opens a standard tcp socket
    as follows:
    set s [socket $state(-nameserver) $state(-port)]
    fconfigure $s -blocking 0 -translation binary -buffering none

    Pat: using your bug1555698-tcp.tcl I just get a lot of stack traces (after
    changing ripon to $host in CreateSocket):
    [Mats-Bengtssons-dator:~/Desktop] matben% tclsh bug1555698-tcp.tcl
    google.com 80
    created sock5 49817
    can't get sockname: connection reset by peer
    while executing
    "fconfigure $s -sockname"

    On WinXP I saw the problem only once and could never reproduce
    it.
    MacOSX uses the standard unix socket code, so I don't believe there is
    anything special about that, unless it is the system socket stack.
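
The stack traces above come from fconfigure $s -sockname raising an error once the peer has reset the connection. A test script could guard that query along these lines. This is a sketch, not the code of bug1555698-tcp.tcl: safeSockname and discard are hypothetical names, and a local listener stands in for the web server.

```tcl
# Hypothetical helper: return the -sockname list, or an empty list if the
# query fails (for example because the peer already reset the connection).
proc safeSockname {chan} {
    if {[catch {fconfigure $chan -sockname} result]} {
        puts stderr "sockname unavailable for $chan: $result"
        return {}
    }
    return $result
}

# Local listener so the example needs no outside network access.
proc discard {chan addr port} {
    close $chan
}
set server [socket -server discard -myaddr 127.0.0.1 0]
set port   [lindex [fconfigure $server -sockname] 2]

set client [socket 127.0.0.1 $port]
set name   [safeSockname $client]
puts "created $client [lindex $name 2]"

close $client
close $server
```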

     
  • Mats Bengtsson

    Mats Bengtsson - 2006-09-12

    Logged In: YES
    user_id=108900

    I just got Daniel's updated cvs to build on 10.2.8, and I can
    confirm that the problem persists in 8.4.14
    (core-8-4-branch 20060912).

     
  • Mats Bengtsson

    Mats Bengtsson - 2006-09-12
    • milestone: 450901 --> obsolete: 8.4.14
     
  • Pat Thoyts

    Pat Thoyts - 2006-10-01
    • priority: 7 --> 9
     
  • Donal K. Fellows

    • priority: 9 --> 5