Tcl / Read-Only Bugs / #3987 exec with pipes and fileevents leaks ~2Mb everytime

Mark Gammon - 2008-05-06

priority: 5 --> 8
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Alexandre Ferrieux - 2008-05-06

Logged In: YES
user_id=496139
Originator: NO

Just to be sure.. You're using TclX, right ?
(That's funny, I was precisely considering integrating TclX's standalone pipes into the core (TIP304))
Will try to reproduce; please give TclX version used.

If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

David Gravereaux - 2008-05-06

Logged In: YES
user_id=7549
Originator: NO

Mark, are you actually waiting waiting for EOF in the readable callback? [close] should happen there, not in the top after churning the event loop. No wonder you're hitting that situation.

If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

David Gravereaux - 2008-05-06

Logged In: YES
user_id=7549
Originator: NO

proc callback {x out_r} {
set line[gets $out_r]
...
if {[eof $out_r]} {
close $out_r
set ::done 1
}
}

Do it that way and use [vwait $done] instead of [update] in the top

If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Alexandre Ferrieux - 2008-05-07

Logged In: YES
user_id=496139
Originator: NO

David, while I agree with you that [update] is dangerous here in that potentially it will only superficially skim the event queue, I don't understand the 2M leak yet. Closing from within a callback or not doesn't change much from the notifier's PoV...

Moreover, in this specific case, the EOF will never happen since the other side of the $out pipe stays open in the parent.
To fix that I'd suggest closing $out_w and $in_r immediately after the [exec &].

Continuing investigations...

If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Mark Gammon - 2008-05-07

Logged In: YES
user_id=2080734
Originator: YES

David, my callback wasn't testing for EOF as a sign of the child's death and closing the pipe then. When I add that and change to a vwait on the death then the script hangs. The callback read chars from the child but never sees a EOF. From the debugger it looks like the Tcl event loop is spinning round and checking for events on the pipes, the pipe says its readable but the read thread is hung in a call to ReadFile.

I found this article from MS about how to do this child spawning with pipes in C++:

http://support.microsoft.com/kb/190351

Looking at this I think it explains why ReadFile is hanging. It says the parent process MUST close the unused child ends of the pipes after it has spawned the child. If not then the child can't close the pipe properly when it dies, and hence the pipe stays up and ReadFile gets hung on it. I added this to my script and bingo!, it all works perfectly now. Here's the current version of my script:

...........................................................
proc callback {out_r} {
global isdead
echo "read [string length [read $out_r]] chars" ; flush stdout
# MUST test for EOF condition and close pipe NOW
if {[eof $out_r]} {
echo "child has died!" ; flush stdout
close $out_r
set isdead 1
}
}

pipe in_r in_w
pipe out_r out_w

fconfigure $out_r -blocking 0 -buffering none

set pid [exec ls <@$in_r >@$out_w &]

# MUST close unused channels to child else child can't close pipe when it dies causing read to hang
close $in_r
close $out_w

fileevent $out_r readable "callback $out_r"

# wait for "ls" to die
vwait isdead

close $in_w

...........................................................

I've instrumented tclWinPipe to tell me how the pipe read/write threads are stopped. Previously is was using the leaky TerminateThread call on the child reader threads, but now the threads are stopping properly in response to the event signal. Great!

There is one thing in the MS article that I'm not doing. For some reason it duplicates the child read and write pipes and closes the original channels it got from CreatePipe. Not sure why its doing this. I tried reproducing this with the TclX "dup" command and after hacking it to allow me to duplicate a pipe (odd restriction?) I got my script doing exactly the same as the MS C++ code. However I found that my script worked just fine WITHOUT doing this extra duplication, so I've left it out.

Alexandre: thanks for your help. Just seen your last comment. Looks like you are on to the same solution, i.e. closing the unused pipes after the exec. I think I've confirmed that this does cure the problem. BTW, we are using TclX8.2.0.

I don't think I'm out of the woods yet though as when I replace the child process "ls.exe" with my real child process, then I'm still seeing the child reader threads killed by TerminateThread. My real child process does more than "ls" in that it sits and reads from stdin and sends responses down it's stdout and stderr. I'll see if I can pin down why this isn't behaving as cleanly as the simple "ls" example.

Thanks a lot to you both for your excellent help!

If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Mark Gammon - 2008-05-07

Logged In: YES
user_id=2080734
Originator: YES

I took a closer look at how I'd implemented the above fix in my full app. and I must of missed something crucial as I've re-implemented it again and now it works perfectly. This looks like a good work-around to the TerminateThread leak so it maybe the tclWinPipe code is OK, it just needs more careful tcl scripting.
Thanks.

If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Alexandre Ferrieux - 2008-05-07

Logged In: YES
user_id=496139
Originator: NO

Mark,

While I'm very happy for you since you can proceed, there's one thing I don't quite grasp here.
You mentioned instrumenting tclWinPipe, which to my knowledge implements the core Tcl "command pipelines", that is, the channel type of [open |cmd r+]. This type uses complicated machinery including ancillary threads and Windows Events to implement fileevents, because up to Win9x/Me it was the only way.

Now your offending script instead uses TclX's [pipe] command, which yields two channels of type "file", which in Windows encapsulates only the vanilla file handle of Win32. As it turns out, those vanilla handles became waitable with WinNT (Thanks to Jeff and David for digging out these historical facts), so now TclX's [pipe] are fileevent-ready, which was not the case with pre-NT Windows (of course unix has been OK all the time ;-).

Given this dichotomy, I don't understand where tclWinPipe, its extra threads, and their brutal termination, come into play... Any idea ?

If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Mark Gammon - 2008-05-07

Logged In: YES
user_id=2080734
Originator: YES

I think this is how it works with TclX pipes.
The "pipe" command calls the Windows CreatePipe to create the pipe and get the two handles to it.
It then passes these to Tcl_MakeFileChannel to get them wrapped up as a normal tcl channel.
This in turn calls the Windows GetFileType which tells it that the handles are actually pipes
and not plain file handles. After this they get treated like pipes created by the "open" command,
and get processed by the tclWinPipe code.
Not 100% sure about the above but that's what I think it is doing.

If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Alexandre Ferrieux - 2008-05-07

Logged In: YES
user_id=496139
Originator: NO

Looking at the current source (TclX 8.4.0), I do see that GetFileType, but only by a specific set of TclX commands (approximate names): dup, bind, fstat, fsize, ftruncate, select, flock, funlock, cloexec.

So, at least in the source I have access to, I don't see TclX retrofitting its raw pipes into core Tcl's instrumented ones. (As additional evidence, the instrumenting threads are only created from within TclpCreateCommandChannel, which is really not able to wrap a simple preexisting handle).

This is getting interesting ! Investigation continued...

is called within the ChannelToHandle function, which does recognize pipes to produce a type identifier TCLX_WIN_PIPE, but this value is largely ignored by the only caller of interest here, TclXOSGetSelectFnum (

If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

David Gravereaux - 2008-05-07

Logged In: YES
user_id=7549
Originator: NO

I have too much to say, not sure if I'll say it all in this one post.

First, why does [close] exhibit the problem at all with pipes? Easy, ReadFile() is not interruptible. The purpose of the reader thread and the ReadFile() call inside it is to act as the alert mechanism. But if pipe handles are waitable objects, then what's the purpose of that? Again, easy.. under Win9x, pipe handles are _not_ waitable objects and that mess is needed to make the alert mechanism. But the question really is, why was it forced to close before it was done? That direction is not supported and can't easily be without kill.

Second, what's wrong with [open], it does the same thing as [exec ... &] with the pipe creation?

Brain lapse.. I lost my train of thought.

If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

David Gravereaux - 2008-05-07

Logged In: YES
user_id=7549
Originator: NO

Also, of course the unused duped side should be closed. That's what is supposed to be done. Examples of doing it right are shown in the core as well as that silly nmakehlp.c in the /win directory.

Third, given that win9x support can be finally dropped and that logic mess with the reader thread with the non-interuptable ReadFile() can go away, anonymous pipes can use all the named pipe functions and behaviors and are full waitable objects.

If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

David Gravereaux - 2008-05-07

Logged In: YES
user_id=7549
Originator: NO

Mark, you have scripted what the core already does with [open]. If there isn't a special purpose I'd recommend plain [open "|ls" r+].

If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Mark Gammon - 2008-05-07

Logged In: YES
user_id=2080734
Originator: YES

David,
It's been a long time since we coded it up to use "exec" and pipes, but I think the reason why we couldn't use "open" was that we needed to read from both stdout AND stderr separately. Am I right in thinking that the r+ arg to "open" returns a bi-directional channel that when written goes to stdin of the child and when read it gets stdout from the child. There is no way to get access to the stderr output from the child?

If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

David Gravereaux - 2008-05-07

Logged In: YES
user_id=7549
Originator: NO

Gotcha. Yes, [open] mixes somewhat. At the end of the process, you can get stderr though, but readables are on stdout. You just can't intermix the reception of the two while the child is running. There's an example on the wiki: http://wiki.tcl.tk/3846 , see [gotpipereadable] for getting stderr.

There's also another technique using BLT's bgexec command.

If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Alexandre Ferrieux - 2008-05-08

Logged In: YES
user_id=496139
Originator: NO

Mark, please don't go to bgexec right now, we're just on the verge of integrating [pipe] (well, ti will be [chan pipe] I guess) in the core ;-)

As to my prior interrogations, I have found the missing piece in the puzzle: Yes, Tclx creates raw pipes, but No, it doesn't leave them unadorned to the core ! Indeed, what I have just realized, is that Tcl_MakeFileChannel does introspect the type, and in the pipe case calls TclpCreateCommandChannel with an empty pid list ! This explains everything, including the resort to the bad TerminateThread when the reader thread is blocked in ReadFile etc...

David, do you agree with the following conclusion:
- TclX's [pipe] fully interoperates with the (now obsolete) machinery in tclWinPipe.c
- Hence it can be ported to the core (TIP304) with no change and no worries
- It does have the old TerminateThread issue.
- This could be solved in post-NT era by direct wait on handles
- But this last move is orthogonal to TIP 304 and can wait a bit.

If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Alexandre Ferrieux - 2008-05-08

Logged In: YES
user_id=496139
Originator: NO

(Don't know if this will reach you, SF is awfully unresponsive for me)
I've sent this also to tcl-core.
---
I think I can now produce a global analysis of the pipe-leak issue.

(1) This is a core issue, Tclx is a red herring here, it is only one
of many ways of exhibiting the problem;

(2) A core-only way is:

# this leaks 2M per run in Windows
# assuming "block.tcl" contains just "after 5000"
set pr [open "|tclsh block.tcl" r];after 500;fconfigure $pr
-blocking 0;read $pr 1;close $pr

(3) The crux of the problem is the (now possiby obsolete)
PipeReaderThread being blocked on a ReadFile at [close]-time, implying
the use of the thermonuclear TerminateThread, which is known to have a
4k kernel leak in Windows.

(4) But in today's Windows (XPSP2), it leaks 2M, not 4k !!!

I don't know at which point in time the x500 amplification of the leak occured.
Notice that the situation producing it is rather unusual (it assumes a
long-running child which we close prematurely from the output side,
and only on Windows where those tools (like vmstat) are not common).

Now there are IMO two parallel paths to follow simultaneously:

(P1) investigate to understand the large leak

(P2) (longer term) modernize the core to use waitable handles
(assuming they are really !), thus avoiding the need of ancillary
threads.

I would love to hear comments on this. I feel a bit shaky on Windows
to undertake (P2); and possibly a person who is literate in Windows
patch history migh home in on (P1) much fasterthan I would...

Thanks in advance,

-Alex

If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Alexandre Ferrieux - 2008-05-08

Logged In: YES
user_id=496139
Originator: NO

Gosh, just noticed the MS documentation says that the leak is the initial stack of the thread !
This might explain the variability of the size, since default stack size is largely in the hand of the OS.
A quick workaround would be to minimize the stack size of PipeReaderThread, who doesn't need 2M...

I also tried DisconnectNamedPipe(), but the poor guy blocks until all pending IO complete, so back to square one :-(

If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

David Gravereaux - 2008-05-08

Logged In: YES
user_id=7549
Originator: NO

Yes, the stack gets blasted, and it's gone for good, never to reused 'till a reboot

If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

David Gravereaux - 2008-05-08

Logged In: YES
user_id=7549
Originator: NO

> I also tried DisconnectNamedPipe(), but the poor guy blocks
> until all pending IO complete, so back to square one :-(

CancelIO() ? That might do it. IIRR, it only cancels I/O that was made in the thread it was called from. :/

If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Alexandre Ferrieux - 2008-05-29

Logged In: YES
user_id=496139
Originator: NO

Will be trying the first of a two-step strategy shortly:

(1) Since pipes are now waitable in all living Windows, replace the blocking ReadFile in PipeReaderThread with a WaitForMultipleObjects of both the pipe and the event object. This way, the thread will spend all its life in a state where it can be gracefully asked to leave by signalling the event, hence removing the need for TerminateThread.

(2) Overhaul the pipe event source so that the pipe's handle is directly monitored by the notifier's central WaitForMultipleObjects.
(For this one I would love to have David's help ;-)

If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

David Gravereaux - 2008-05-30

Logged In: YES
user_id=7549
Originator: NO

re #2, you got it.

Years ago I committed a patch that added the wait flag to the main Tcl_WaitForEvent's WaitForMultiple.. (as well a check on the proper return code) so that thread could be used to service asynchronous procedure calls during the wait. Time for us to collect on that and make use of it ;)

I don't much free-time these days now that I'm a dad, but I'll try my best to assist with the heavy lifting.

In the meantime, it might be fun (or not) to read through my source for the IOCPSOCK extension <http://sf.net/projects/iocpsock/> that makes use of overlapped-I/O in a channel driver. The big meat is in the iocpsock_lolevel.c file. The notification method isn't event objects, though. That's the trick.. Event based will take some work.

If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

exec with pipes and fileevents leaks ~2Mb everytime

The Tool Command Language implementation

Group

Searches

Help

#3987 exec with pipes and fileevents leaks ~2Mb everytime

Discussion