This may be more than one problem. For now, I am
filing it all here.
First, the code:
Server
----
proc bgerror {args} {
    puts "bgerror args: $args"
    puts "bgerror open channels [file channels]"
}
proc CopyDone {in out bytes {error ""}} {
    if { $error != "" } {
        puts "Error: $error"
    }
    close $in
    close $out
    puts "$bytes bytes, open channels [file channels]"
}
proc Cntr {sock args} {
    set fl [open A_BIG_FILE r]
    # close $sock
    # fconfigure $sock -buffersize 10
    fcopy $fl $sock -command [list CopyDone $fl $sock]
}
socket -server Cntr 8070
vwait forever
----
"Client"
----
set sk [socket localhost 8070]
fconfigure $sk -buffersize 10
set line [read $sk 5]
puts $line
close $sk
----
It's worth trying other methods of interrupting
communication with the server. Programs such as telnet
and netcat work for this.
The easiest problem to observe shows up under Linux.
Uncomment the 'close $sock' line in the server, and
connect to the server with the 'client'. I see
something like this, on multiple connects:
@ashland [~/tmp] $ strace -o foobar tclsh ./fcopy.tcl
bgerror args: {can not find channel named "sock4"}
bgerror open channels stdin file5 stdout stderr sock3
bgerror args: {can not find channel named "sock4"}
bgerror open channels stdin file5 stdout file6 stderr sock3
bgerror args: {can not find channel named "sock4"}
bgerror open channels stdin file5 stdout file6 stderr file7 sock3
bgerror args: {can not find channel named "sock4"}
bgerror open channels file8 stdin file5 stdout file6 stderr file7 sock3
bgerror args: {can not find channel named "sock4"}
bgerror open channels file8 stdin file9 file5 stdout file6 stderr file7 sock3
bgerror args: {can not find channel named "sock4"}
bgerror open channels file8 stdin file10 file9 file5 stdout file6 stderr file7 sock3
The strace shows that the fd's are not getting closed.
On Windows (XP in my case), comment out the close
statement in the server again, and run the 'client'
multiple times. In some cases, fcopy calls the callback
with an error, but other times it does not. Over time,
it can be seen that the channels are leaking.
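To make the leak easier to see than by reading raw [file channels] output, a small helper along these lines could be added to the server. This is only a sketch: LeakedChannels and ReportLeaks are hypothetical names, not part of the script above, and the reporting interval is arbitrary.

```tcl
# Hypothetical helper (not part of the original script): list the channels
# that exist now but were absent from an earlier [file channels] snapshot.
proc LeakedChannels {baseline} {
    set leaked {}
    foreach chan [file channels] {
        if {[lsearch -exact $baseline $chan] < 0} {
            lappend leaked $chan
        }
    }
    return [lsort $leaked]
}

# Hypothetical periodic report: print the leak list every intervalMs
# milliseconds. Requires a running event loop (e.g. vwait forever).
proc ReportLeaks {baseline intervalMs} {
    puts "leaked channels: [LeakedChannels $baseline]"
    after $intervalMs [list ReportLeaks $baseline $intervalMs]
}
```

In the server, the snapshot would be taken after the 'socket -server' call so that the listening socket is part of the baseline, for example 'ReportLeaks [file channels] 5000' just before 'vwait forever'. Channels belonging to copies that are still in progress will also be listed, so only entries that never go away indicate a leak.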
I don't know what the correct behavior should be in all
cases. The man page does say that:
    You are not allowed to do other I/O operations with
    inchan or outchan during a background fcopy. If either
    inchan or outchan get closed while the copy is in
    progress, the current copy is stopped and the command
    callback is not made. If inchan is closed, then all
    data already queued for outchan is written out.
It doesn't specify what happens other than that,
though. I suppose we could just declare that 'bad
things happen', and leave it at that, but that doesn't
feel very satisfactory to me. I expect Tcl to be more
robust than that.
Logged In: YES
user_id=240
That was stupid... the fcopy is indeed returning an error in
case #1, as dgp was kind enough to point out.
The Windows problem persists, though.
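For case #1, the file opened in Cntr is never closed because fcopy raises its error before the callback is ever registered. A minimal sketch of one way to guard against that follows, assuming the error is raised synchronously by fcopy as the bgerror output suggests. SafeClose and ServeFile are hypothetical names introduced for illustration, and ServeFile takes the file path as a parameter instead of hard-coding A_BIG_FILE.

```tcl
# Hypothetical helper: close a channel, ignoring the error raised
# if it has already been closed.
proc SafeClose {chan} {
    catch {close $chan}
}

proc CopyDone {in out bytes {error ""}} {
    if {$error ne ""} {
        puts "Error: $error"
    }
    SafeClose $in
    SafeClose $out
}

# Hypothetical variant of Cntr that takes the file path as a parameter.
# Returns 1 if the background copy was started, 0 if fcopy refused.
proc ServeFile {path sock} {
    set fl [open $path r]
    if {[catch {fcopy $fl $sock -command [list CopyDone $fl $sock]} msg]} {
        # fcopy failed up front (for example, $sock was already closed),
        # so CopyDone will never run; release both channels here.
        puts "fcopy failed: $msg"
        SafeClose $fl
        SafeClose $sock
        return 0
    }
    return 1
}
```

The accept callback would then reduce to 'proc Cntr {sock args} { ServeFile A_BIG_FILE $sock }'. This only covers the case where fcopy itself reports the error; it does nothing for the Windows case, where the callback is sometimes never invoked.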
Logged In: YES
user_id=240
Data point: running the client script from another machine
doesn't get it to leak, or at least I wasn't able to produce
that result. It only seems to be reproducible from localhost,
and even there the leak shows up only occasionally, not on
every run of the client script.
Logged In: YES
user_id=240
The previous data point is not correct. Albeit very rarely,
I can get it to exhibit the bug from a remote connection,
although it's much more common from localhost.
Logged In: YES
user_id=75003
I confirm this for my NT machine and a not-quite-head build of Tcl.
What I find is that sometimes the callback is not called at all.
Not called without an error, but not called at all. Thus the channel
is not closed and leaks. I wonder if that is related to the various
reports about fcopy-based servers hanging on systems, i.e.
without the callback, protocols like ftp are not driven to
completion, causing a client to wait forever.
... OK, also confirmed for head of the 8.4 branch.
Logged In: YES
user_id=75003
The problem was two-fold. In tclWinSock.c SocketEventProc
generated only readable events when a socket close was
detected. And in TcpWatchProc the OS was instructed to not
look for FD_CLOSE when asking for writable events. This has
been fixed. Committed to both head and core-8-4-branch.