From: SourceForge.net <no...@so...> - 2010-03-06 00:41:28
Bugs item #2936225, was opened at 2010-01-21 12:44
Message generated for change (Comment added) made by ferrieux
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=110894&aid=2936225&group_id=10894

Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update.

Category: 24. Channel Commands
Group: None
Status: Open
Resolution: None
Priority: 8
Private: No
Submitted By: Colin McCormack (coldstore)
Assigned to: Alexandre Ferrieux (ferrieux)
Summary: [chan copy] overruns slow receiver

Initial Comment:
If (for example) a file is fed to a socket using [chan copy], where input from the file is always available and output is only sometimes available, [chan copy] seems to buffer input in memory without regard for the disparity between input and output. The effect of this is that, for a very large file, all memory can be consumed by buffered content, and Tcl can fail on memory allocation.

Perhaps [chan copy] should take the output channel's buffer into account, and not try to fill more than that buffer specifies. This won't solve the memory exhaustion problem, but might be a sound way to indicate the expected performance of the output chan.

----------------------------------------------------------------------

>Comment By: Alexandre Ferrieux (ferrieux)
Date: 2010-03-06 01:41

Message:
Attached the file for posterity.

----------------------------------------------------------------------

Comment By: Colin McCormack (coldstore)
Date: 2010-03-06 01:26

Message:
http://code.google.com/p/wub/source/browse/trunk/Wub/Chan.tcl, and specifically the IChan object in r2290 of this file, turned out to be involved in exhibiting this bug.
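To make the refchan suspicion concrete, here is a minimal, hypothetical write-only reflected channel that forwards everything to an underlying channel. It is not Wub's IChan (that code is at the URL above), and the names passThrough and wrapWriter are illustrative only. It shows one way a wrapper can hide congestion, in the spirit of possibility (b) listed further down in this thread: the write handler forwards with puts, and when the wrapped channel is non-blocking, puts never blocks and Tcl queues whatever the OS cannot accept yet, so every chunk is reported as fully written. The sketch assumes Tcl 8.6 (for [file tempfile]).

```tcl
# Hypothetical pass-through wrapper (NOT Wub's IChan): a write-only
# reflected channel that forwards every chunk to the channel $under.
proc passThrough {under cmd chan args} {
    switch -- $cmd {
        initialize {
            # initialize, finalize and watch are mandatory; write is needed
            # because the channel is created in write mode.
            return {initialize finalize watch write}
        }
        finalize {
            close $under
        }
        watch {
            # A complete wrapper would forward writable interest to $under;
            # this sketch ignores events.
        }
        write {
            set data [lindex $args 0]
            # If $under is non-blocking, puts never blocks: Tcl queues what
            # the OS cannot take yet, so the full length is always reported
            # as written and no back-pressure reaches the writer.
            puts -nonewline $under $data
            return [string length $data]
        }
    }
}

proc wrapWriter {under} {
    return [chan create {write} [list passThrough $under]]
}

# Usage: push a string through the wrapper into a temporary file.
set tmp [file tempfile tmpPath]
chan configure $tmp -translation binary
set w [wrapWriter $tmp]
chan configure $w -translation binary
puts -nonewline $w "hello, refchan"
close $w    ;# flushes through the write handler, then finalize closes $tmp

set f [open $tmpPath r]
chan configure $f -translation binary
set result [read $f]
close $f
file delete $tmpPath
puts $result
```

A wrapper that is meant to push back has to stop accepting data while the wrapped channel is congested (for instance by watching [chan pending output] on it) instead of always returning the full length.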
----------------------------------------------------------------------

Comment By: Alexandre Ferrieux (ferrieux)
Date: 2010-03-04 17:35

Message:
Yes, but if you do the little extra work needed for the in vitro test I'm requesting, I (and probably Andreas) will have excellent material to chew on, instead of reasoning in abstracto. Single-stepping through gdb beats RTFS... Until then, you're alone :}

----------------------------------------------------------------------

Comment By: Colin McCormack (coldstore)
Date: 2010-03-04 17:07

Message:
Ferrieux, now that the problem has been isolated to some code in a refchan, it should be easier to reason about the cause. If it transpires that the refchan itself is faulty, then we have shown that the core is not the cause and can close the bug. Failing that, we can hunt for it in the refchan core (by producing a test case).

----------------------------------------------------------------------

Comment By: Alexandre Ferrieux (ferrieux)
Date: 2010-03-04 16:15

Message:
Now that we know this, can you try building an in vitro case with (a transportable version of) this wrapper on top of a simple fcopy reading from /dev/null and writing to a blocked pipe ([chan pipe] with nobody reading the read side)?

----------------------------------------------------------------------

Comment By: Colin McCormack (coldstore)
Date: 2010-03-04 15:53

Message:
The experiment in the wild (removing a refchan wrapper from the [fcopy] data path) stopped the allocation exhaustion. That leaves three possibilities:
(a) the particular refchan implementation in question accumulates data proportional to the data flowing through it;
(b) the refchan does not signal congestion to its caller properly;
(c) there is something wrong with using refchans for [fcopy].

----------------------------------------------------------------------

Comment By: Colin McCormack (coldstore)
Date: 2010-03-04 01:54

Message:
Ferrieux pointed out that a reflected chan was in the path. I've just removed it from Wub (using the raw underlying channel in its stead). I expect this will eliminate the error; that is consistent with Ferrieux's not having reproduced it, and it will also narrow the search for the underlying cause a bit. If this stops the error, then the fault is in my refchan, or in refchan itself, or both. Much easier than blindly trying to reduce an app to the minimal case.

----------------------------------------------------------------------

Comment By: Alexandre Ferrieux (ferrieux)
Date: 2010-03-04 00:55

Message:
Colin explained on the chat that the script is huge, but can you try to extract a smaller part that reproduces the problem? In the backtrace, at frame #16, I see ReflectOutput: this is an interesting suspect. So, to help build a small repro script, can you extract from whatever 3-meg script you have the refchan that is stuck onto the channel fcopy is writing to?

----------------------------------------------------------------------

Comment By: Alexandre Ferrieux (ferrieux)
Date: 2010-03-03 11:57

Message:
Please attach here a script exhibiting the behavior.

----------------------------------------------------------------------

Comment By: Andrew Shadoura ()
Date: 2010-03-03 10:40

Message:
Hello. Here is the backtrace: http://shadoura.com/bt.log
This backtrace was made after a big file (~1GiB) was requested from Wub.

----------------------------------------------------------------------

Comment By: Alexandre Ferrieux (ferrieux)
Date: 2010-03-02 15:27

Message:
Reopening for chromebel_ to provide additional evidence.

----------------------------------------------------------------------

Comment By: SourceForge Robot (sf-robot)
Date: 2010-02-18 03:20

Message:
This Tracker item was closed automatically by the system. It was previously set to a Pending status, and the original submitter did not respond within 14 days (the time period specified by the administrator of this Tracker).
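The in vitro case Ferrieux asks for (2010-03-04 16:15) might be set up along these lines; this is a sketch assuming Tcl 8.6, not a script from the thread. Two deliberate deviations: the source is a hypothetical read-only refchan (zeroSource, with helper procs pulse and onCopyDone, all illustrative names) that yields endless NUL bytes and counts how much [chan copy] pulls from it, because /dev/null returns end-of-file at once and would end the copy immediately; and no wrapper is inserted yet, so the run only gives the baseline. With a correctly throttled copy, the count should level off around the pipe's capacity plus a few Tcl buffers.

```tcl
# In vitro sketch (Tcl 8.6). zeroSource is a hypothetical read-only refchan that
# returns endless NUL bytes and counts how many bytes [chan copy] pulls from it.
set readBytes 0
array set watching {}
array set pulseId {}

proc zeroSource {cmd chan args} {
    global readBytes watching pulseId
    switch -- $cmd {
        initialize { return {initialize finalize watch read blocking} }
        blocking   { }
        finalize {
            set watching($chan) 0
            if {[info exists pulseId($chan)]} {
                after cancel $pulseId($chan)
                unset pulseId($chan)
            }
        }
        watch {
            # Remember whether Tcl currently wants readable events.
            set watching($chan) [expr {"read" in [lindex $args 0]}]
            if {$watching($chan) && ![info exists pulseId($chan)]} {
                set pulseId($chan) [after 0 [list pulse $chan]]
            }
        }
        read {
            set n [lindex $args 0]
            incr readBytes $n
            return [binary format x$n]
        }
    }
}

# The source always has data, so keep posting readable events while watched.
proc pulse {chan} {
    global watching pulseId
    unset -nocomplain pulseId($chan)
    if {[info exists watching($chan)] && $watching($chan)} {
        set pulseId($chan) [after 1 [list pulse $chan]]
        chan postevent $chan read
    }
}

proc onCopyDone {args} { set ::copyDone 1 }

# Nobody ever reads $pipeR, so $pipeW blocks once the OS pipe buffer is full.
lassign [chan pipe] pipeR pipeW
chan configure $pipeW -blocking 0 -translation binary
set src [chan create {read} zeroSource]
chan configure $src -translation binary

set copyDone 0
chan copy $src $pipeW -command onCopyDone
after 300 {set ::measured 1}
vwait ::measured
puts "bytes pulled from the source while the output was blocked: $readBytes"

# Cleanup: closing the source cancels the copy (no callback); closing the read
# end first makes any flush of $pipeW fail quickly instead of blocking.
close $src
close $pipeR
catch {close $pipeW}
```

For the actual experiment, the wrapper from Chan.tcl would sit between [chan copy] and $pipeW. If the count then keeps growing with the run time, that would point at possibility (a) or (b) rather than at [chan copy] itself.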
----------------------------------------------------------------------

Comment By: Colin McCormack (coldstore)
Date: 2010-02-04 00:58

Message:
Waiting for reporter to provide more details; can't reproduce; putting this on the back burner.

----------------------------------------------------------------------

Comment By: Alexandre Ferrieux (ferrieux)
Date: 2010-01-21 14:53

Message:
Just tried, no difference on my system :/ Can you trim it down to a simple script, replacing the output socket with something else that blocks (like my stdout | sleep 9999 example)?

----------------------------------------------------------------------

Comment By: Colin McCormack (coldstore)
Date: 2010-01-21 14:22

Message:
One possibly important detail is that the output (socket) chan in the case we (think) we observe causing this is configured non-blocking, binary. Does this make any difference?

----------------------------------------------------------------------

Comment By: Alexandre Ferrieux (ferrieux)
Date: 2010-01-21 14:04

Message:
Hum, cannot seem to repro (Linux Fedora 12, unthreaded, 8.6 HEAD). FWIW, [chan copy]'s mechanism is one of alternating readable and writable fileevents, precisely to avoid accumulating data as you describe. I've tried both sync and async fcopy, reading from a local file and writing to a blocked stdout (tclsh script.tcl | sleep 9999). In both cases Tcl stops reading shortly after the beginning (typically after reading 68K of input).
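The alternating-fileevent mechanism Ferrieux describes (2010-01-21 14:04) lives in Tcl's C code; what follows is only a script-level analogue of the idea, with hypothetical procs startCopy, copyRead and copyWrite. It reads one chunk, then stops listening for readable and waits for writable until [chan pending output] reports that the queued bytes have been handed to the OS, so at most one chunk sits in Tcl's output buffer at any time. The usage part assumes Tcl 8.6 for [file tempfile].

```tcl
# Script-level analogue (hypothetical procs) of alternating readable and
# writable fileevents: read only while nothing is queued on the output.
proc startCopy {in out done} {
    chan configure $in  -blocking 0 -translation binary
    chan configure $out -blocking 0 -translation binary
    chan event $in readable [list copyRead $in $out $done]
}

proc copyRead {in out done} {
    set chunk [read $in 4096]
    if {[string length $chunk] > 0} {
        # Queue the chunk, then stop reading until it has reached the OS.
        puts -nonewline $out $chunk
        chan event $in readable {}
        chan event $out writable [list copyWrite $in $out $done]
    } elseif {[eof $in]} {
        chan event $in readable {}
        {*}$done
    }
    # Otherwise no data was available yet: keep waiting for readable.
}

proc copyWrite {in out done} {
    # A non-blocking flush hands the OS whatever it will take without blocking.
    flush $out
    if {[chan pending output $out] == 0} {
        # Output drained: switch back from writable to readable.
        chan event $out writable {}
        chan event $in readable [list copyRead $in $out $done]
    }
}

# Usage: copy a 100000-byte temporary file into another temporary file.
set src [file tempfile srcPath]
chan configure $src -translation binary
puts -nonewline $src [string repeat "0123456789" 10000]
close $src

set in  [open $srcPath r]
set out [file tempfile dstPath]
set finished 0
startCopy $in $out {set ::finished 1}
vwait ::finished
close $in
chan configure $out -blocking 1
close $out

set f [open $dstPath r]
chan configure $f -translation binary
set copied [read $f]
close $f
file delete $srcPath $dstPath
```

Against a non-blocking socket, the writable handler may fire several times before the queue drains; reading resumes only once it has, which is what keeps memory use bounded.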