OriginalBugID: 3409 Bug
Version: 8.2.1
SubmitDate: '1999-11-05'
LastModified: '2000-02-10'
Severity: CRIT
Status: Assigned
Submitter: techsupp
ChangedBy: hobbs
OS: Windows NT
OSVersion: 4
FixedDate: '2000-10-25'
ClosedDate: '2000-10-25'
Name:
Ulrich Lauther
CustomShell:
compiled a static lib; Tcl/Tk library scripts are embedded in a C-file and
ObservedBehavior:
I set up an event handler in a server GUI application which should react to
incoming messages on a socket, concurrently with GUI handling. The code looks
similar to this:
int handle = accept(my_socket,0,0);
Tcl_Channel channel = Tcl_MakeTcpClientChannel((ClientData) handle);
Tcl_RegisterChannel(my_interpreter,channel);
Tcl_CreateChannelHandler(channel, TK_READABLE, file_callback, (void*) this);
Whenever input arrives at the socket, the function file_callback()
should be called.
This works fine under Linux and used to work in the past (< 8.1 ?), but
does not reliably work under Windows. It DOES work if I add printf()'s
for each arriving message, so I suspect a timing problem.
DesiredBehavior:
See above.
http://www.deja.com/=dnc/viewthread.xp?AN=540260481
It's not certain that this is failing quite as mentioned,
and may be due to TCP delay.
-- 11/10/1999 hobbs
Another reports that this crept into Tcl between 8.0.4 and 8.2:
Michael Kirkham wrote:
...
> occurring in the older version of our software. This leads me to
> believe that the problem was introduced between Tcl 8.0.4 and 8.2.1.
>
> The dropped events problem is a little bit intermittent, so I am not
> 100% certain, but I have not myself been able to observe the dropped
> events problem with any version of our software built with Tcl 8.0.4
> but have quite often with versions built with Tcl 8.2.2 (including a
> version previously built with Tcl 8.0.4).
-- 12/30/1999 hobbs
I can't see quite why this is failing. Is it possible to try using Tcl's own code (Tcl_OpenTcpServer) to handle sockets instead? That runs code that should work on all supported platforms...
Logged In: YES
user_id=178287
I've recently been investigating this problem again hoping
to find a fix for the problem and in doing so have found
the following information that may be useful:
* It happens most often on Windows 95/98. It seems to be
much more difficult to reproduce on Windows NT, according
to my colleague with an NT machine, but I can reproduce it
with no problem on 95/98.
* If the socket is closed and reopened, with a new event
handler created, communication resumes for a time (in my
case this is UDP so there's no connection loss doing so).
* Simply using Tcl_DeleteChannelHandler and then calling
Tcl_CreateChannelHandler again does NOT cause communication
to resume.
* After an event is dropped, the socket remains in the
readable state (as indicated by a select() call directly on
the socket). It seems that as long as the socket remains in
this state no further "readable" events will occur. Once
the socket is read so that it is no longer in the readable
state, however, new incoming data will usually trigger the
event handler and will continue to do so until the next time
an event is dropped.
* The behavior seems to be affected by whether or not the
application has focus. In my case, I've got a small DLL
written specifically to reproduce the problem. Two copies
of wish are run, loading the DLL in each, and one acts as a
client and the other a server sending UDP packets back and
forth between each other as fast as possible. The instance
acting as the server, which sends a packet back only when a
packet is received, usually runs normally as long as it has
the focus. But as soon as the client gets focus then
events start getting dropped left and right on Win 95/98.
* Tcl 8.0.5 does not exhibit the problem but all versions
from 8.2.1 through 8.4a2 do. 8.1.x - 8.2 may also exhibit
the problem but as of yet I haven't confirmed due to a
crashing problem with 8.1 and this DLL.
Logged In: YES
user_id=178287
I believe I finally managed to track down this elusive bug.
Background: Tcl uses WSAAsyncSelect() (winsock API) to tell
winsock to call a particular function (SocketProc()) when
an event such as incoming data occurs. This function
basically checks what sort of event occurred and sets some
flags to be checked later in Tcl's idle/event handling loop.
Eventually this loop calls another function
(SocketEventProc()) that verifies that the condition for the
event (well, FD_READ events, at least) is still met before
signalling
back to Tcl (via Tcl_NotifyChannel()) to trigger the
function we specified (via Tcl_CreateChannelHandler()) as
our own handler for the event.
Now, in various places the Tcl socket drivers disable the
WSAAsyncSelect() handler before doing some work and then
re-enable it. (Re-enabling also causes any existing
conditions to re-generate events, i.e., the handler
specified to WSAAsyncSelect() is called.) One such place
where the WSAAsyncSelect() handler is temporarily disabled
is just before the verification step mentioned above (which
involves a call to the regular non-event-driven select()
function).
However, the code was such that it's only RE-enabled after
the select() call when the socket no longer has data to be
read (select() returns 0). If there's data on the socket
(because, perhaps, multiple packets came in very quickly,
before the channel handler could read the first and clear
the event), then the WSAAsyncSelect() handler is apparently
not re-enabled.
I suspect, though I haven't verified, that in 8.0.5 and
earlier the problem didn't occur for one of a few possible
reasons:
1. The particular codepath that left the event handler
disabled was basically never called (i.e., at this point
there was never data left to be read and select() would
always return 0).
2. -Or- the WSAAsyncSelect() handler was being re-enabled
later on in the event handling loop but isn't in later
versions.
3. -Or- 8.0.5 and earlier called WSAAsyncSelect() directly
at this point, rather than calling SendMessage() to trigger
a later call to WSAAsyncSelect() to re-enable the handler,
which may cause a race condition of sorts. (Though in this
case, the patch below would probably just be lucky that it
works for me).
At any rate, here's the change that worked for me to fix
this problem (in "diff -c" format; will also upload to
patches section). The little program I wrote to recreate
and debug this problem screams along happily without any
apparent dropping of events. Just one line that's inside an
else {} block that apparently shouldn't be (the
SendMessage() call, which triggers a later call to
WSAAsyncSelect(), is moved so it is called regardless of
the select() return value):
<pre>
*** ./orig/tcl8.2.3/win/tclWinSock.c Sun Aug 1 15:09:29 1999
--- ./tcl8.2.3/win/tclWinSock.c Thu Mar 22 16:44:48 2001
***************
*** 853,862 ****
if ((*winSock.select)(0, &readFds, NULL, NULL, &timeout) != 0) {
mask |= TCL_READABLE;
} else {
- SendMessage(tsdPtr->hwnd, SOCKET_SELECT,
- (WPARAM) SELECT, (LPARAM) infoPtr);
infoPtr->readyEvents &= ~(FD_READ);
}
}
if (events & (FD_WRITE | FD_CONNECT)) {
mask |= TCL_WRITABLE;
--- 853,862 ----
if ((*winSock.select)(0, &readFds, NULL, NULL, &timeout) != 0) {
mask |= TCL_READABLE;
} else {
infoPtr->readyEvents &= ~(FD_READ);
}
+ SendMessage(tsdPtr->hwnd, SOCKET_SELECT,
+ (WPARAM) SELECT, (LPARAM) infoPtr);
}
if (events & (FD_WRITE | FD_CONNECT)) {
mask |= TCL_WRITABLE;
</pre>
Logged In: YES
user_id=72656
See fix in patch 410674.