From: SourceForge.net <no...@so...> - 2010-03-04 08:53:34
Bugs item #2960042, was opened at 2010-02-27 09:24
Message generated for change (Comment added) made by ferrieux
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=110894&aid=2960042&group_id=10894

Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update.

Category: 42. Memory Preservation
Group: current: 8.5.8
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: Stephen Huntley (blacksqr)
Assigned to: Jeffrey Hobbs (hobbs)

Summary: high water mark memory not shared for file I/O needs

Initial Comment:
When a string is converted to a list via the split command, the memory used by the list is never released, even if the list variable is unset, and even if the conversion takes place in a procedure that returns no information. To see the leak, execute the following procedure:

proc listleak {} {
    set line xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
    for {set i 0} {$i < 2000000} {incr i} {
        append largeString $line\n
    }
    set largeList [split $largeString \n]
    unset largeString
    unset largeList
    return
}

===========================================================================
On my computer:

% exec vmstat
procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
 0  0  18844 2134300  11572 433516    0    0    71    25  121  378  7  1 92  1
% listleak
% exec vmstat
procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
 0  0  18844 1836432  11580 433516    0    0    71    25  121  378  7  1 92  1

A net loss of 297868 kB of free memory. The memory never comes back until the Tcl interpreter is exited.

ActiveTcl 8.5.8 on Ubuntu Intrepid.
----------------------------------------------------------------------

>Comment By: Alexandre Ferrieux (ferrieux)
Date: 2010-03-04 09:53

Message:
Update: Donal nearly convinced me that the "statistics" (rare/frequent case) regarding free-able blocks are in favour of trying a block-deallocation scheme. "Nearly", because the problem is to do it efficiently. Several issues:

(1) Since (as you're right to point out) the block-freeing should not wait for alloc failures (mainly because alien code like extensions could not benefit from reclaimable space), it should happen routinely and hence be reasonably fast. This implies keeping track of which blocks are freeable instead of doing a sweeping pass just in time. Keeping track implies, for each individual Tcl_Obj allocation/deallocation, knowing which block the object belongs to, which is not currently recorded, and would require extra space in the object (or a sweeping pass with address comparisons, which we've just rejected).

(2) If we go towards object relocation as you suggest, things get even worse. Indeed, since objects are reached by single indirection (not double, like old MacOS's relocatable "handles"), moving any of them implies finding all references to it; short of adding a complicated back-reference system, this in turn means a mark-and-sweep pass. And even if wild magic gives us the back references lying in container objects or vars, we still have to worry about the refs that live on the current C stack (e.g. in [foreach]), so as not to pull the rug from under our own feet. Tough.

----------------------------------------------------------------------

Comment By: Stephen Huntley (blacksqr)
Date: 2010-03-04 09:30

Message:
After a little googling, it's not clear that any other leading dynamic language does much more in terms of memory management. A stop-gap solution may be using a package like metakit for manipulating large data sets in tight memory circumstances.
----------------------------------------------------------------------

Comment By: Stephen Huntley (blacksqr)
Date: 2010-03-04 07:52

Message:
I'm glad the clouds have parted a bit on what is going on here. It is somewhat what I thought, but I'm still dismayed.

I would argue that any memory management scheme must have some sort of allocation lifecycle built in, where unused memory is at some point detected, garbage-collected and freed. Otherwise your memory management method is operationally indistinguishable from a leak, as the present example shows. Memory management is always tricky, but simply punting on garbage collection "within the universe of Tcl_Objs" seems to me like a disaster in the making.

The interpreter need not wait until a malloc fails before starting a decision process about freeing internal memory. One might, for example, set a maximum high water mark of around 10 percent of RAM, after which Tcl_Obj memory is immediately freed after use. Or one might design an object compactor which moves in-use Tcl_Objs out of blocks of otherwise unused objects so the blocks can be freed.

Obviously I'm not familiar enough with Tcl's internals to give a wise and informed opinion, but as a user who wants to make expanded use of Tcl in a coming computer world where data sets and demands for efficiency will grow dramatically, this behavior seems a significant impediment.

----------------------------------------------------------------------

Comment By: Alexandre Ferrieux (ferrieux)
Date: 2010-03-03 22:27

Message:
Still not reproduced, but I think I know what's happening.

Notice that there are two fundamentally different kinds of allocations in the Tcl core: Tcl_Objs and vanilla mallocs. Tcl_Objs are allocated by blocks (with large mallocs), but the blocks themselves are never free()d; instead, freed Tcl_Objs are linked into a free list inside their hosting blocks. Vanilla mallocs are used for everything else, like strings or list storage.
Now this "high water mark" handling of objects doesn't generate actual leaks _within_ the universe of Tcl_Objs, but it competes for available memory with the rest of the vanilla mallocs. Hence, if A is a vanilla malloc and B a block of Tcl_Objs, and A+B exceeds total available heap space, then:

(1) A and B cannot be simultaneously allocated (obvious)
(2) alloc(A); free(A); alloc(B); free(B) works
(3) alloc(B); free(B); alloc(A); free(A) doesn't, simply because free(B) is a no-op as far as the heap is concerned

So here A is large_file, B is hi_water_mark, and I'd predict that with proper sizes you can even observe the failure skipping the first call to large_file. Hence, your new title is accurate in describing the situation, though we can generalize "I/O needs" to "vanilla mallocs".

However, it is not really a bug; rather, it lies in the gray zone of implementation trade-offs. Indeed, while one could imagine freeing whole blocks of free Tcl_Objs in certain conditions (like after a failure to malloc or realloc), in the general case there is no guarantee that a completely free block exists (a single non-free obj per block spoils the method). A typical scenario generating this pattern is when long-lived objects are created at any time, not just at the beginning.

To summarize, I'd say that a switch from a "high-water-mark" to a "non-monotonous" method for Tcl_Obj block allocation could improve *some* cases like the one you've shown, but would leave an overwhelming number of similar cases out of reach. Hence I'm tempted to freeze this as Won't Fix. I'd welcome counter-arguments, or different evaluations of the statistics.
Put differently: a big vanilla malloc (like the big string grown by the first call to [large_file]) can fit once in memory, then (after being freed) leave space to an irreversible claim by a big block of Tcl_Objs.

----------------------------------------------------------------------

Comment By: Stephen Huntley (blacksqr)
Date: 2010-03-03 01:29

Message:
My previous description of the problem as a memory leak was invalid. I've changed the title of the bug ticket to reflect that. Please put that concept aside.

How much free memory is left on your computer after you run high_water_mark? To see what I'm seeing, it should be less than the size of the file created by large_file (about 246 MB). Adjust the number of iterations in the for loop of high_water_mark until it runs to completion but leaves less free memory available than the size of the large file. Then run large_file. The interpreter should crash even though almost none of the available memory is being used.

----------------------------------------------------------------------

Comment By: Alexandre Ferrieux (ferrieux)
Date: 2010-03-02 22:09

Message:
I've tried various sizes, but never got this behavior. I'm calling large_file and high_water_mark alternately in a loop, and depending on the values I get one of two outcomes:

- the loop goes on forever, with [memory info] at a fixed point
- Tcl_Panic on the 1st invocation of either function

Neither of these is compatible with a leak. Tough.

----------------------------------------------------------------------

Comment By: Stephen Huntley (blacksqr)
Date: 2010-03-02 20:28

Message:
Sorry for the oversight; yes, the "unable to alloc" message definitely appears in the terminal from which tclsh was started when the interpreter crashes.

P.S. The crash doesn't come on the second invocation of high_water_mark, but on the second invocation of large_file, after the first invocation of high_water_mark.
----------------------------------------------------------------------

Comment By: Alexandre Ferrieux (ferrieux)
Date: 2010-03-02 08:55

Message:
Two things:

(1) Please please please confirm (or infirm) that when you get a crash on the 2nd invocation of hi-water-mark, you also get the "unable to alloc" message on stderr, meaning Tcl_Panic(). This is important, and you keep failing to mention it.

(2) Of course, IF/WHEN I manage to get the same behavior (crash on 2nd invocation only), it will definitely qualify as a bug. That has not happened yet. But I'm just after hard evidence; no need to go back to c.l.t seeking support for a "minority theory". It will become the majority as soon as evidence comes.

----------------------------------------------------------------------

Comment By: Stephen Huntley (blacksqr)
Date: 2010-03-02 06:16

Message:
P.S. As I stated in the comments in the attached file, on my computer the high_water_mark proc runs OK, then Tcl crashes on the second invocation of the large_file proc. If high_water_mark crashes your interpreter, reduce the number of iterations in its for loop until it completes successfully. Then increase the size of the file written by large_file, if necessary, until it's bigger than the free memory left over after running high_water_mark. Then you should see the behavior I'm seeing.

----------------------------------------------------------------------

Comment By: Stephen Huntley (blacksqr)
Date: 2010-03-02 06:08

Message:
I'm still trying to learn about Tcl's expected behavior in these instances, so I'm not sure what you're asking me for, if anything. My primary (naive) concern is the difference between the behavior I expect and what I see. After the high_water_mark proc runs, the interpreter is reserving over two GB of memory for itself, almost completely unused as far as I can see.
Yet when the large_file proc is called, which requires only a small subset of that memory, Tcl doesn't use its reserved memory to complete execution of the proc; instead it tries to allocate even more memory and crashes as a result. This makes no sense to me, and is what I'm trying to understand, whether or not it is seen as a bug. To me it sure seems to be one, since my bottom line is that the program crashes even though there's ample unused memory to complete the given tasks.

----------------------------------------------------------------------

Comment By: Alexandre Ferrieux (ferrieux)
Date: 2010-03-01 11:57

Message:
Confirmed, this is a pure OOM condition; in my case, when Tcl_Panic is called, the process's VM size is 3G. Note that the limit is not the physical RAM, but rather the maximum heap size, which is bounded by addressable VM, which is 3G under 32-bit Linux (there's 1G of kernel-reserved address space).

Note also that the real memory consumer is [split] rather than [append], since here [split] doesn't take advantage of the repeated substring (no interning, as Donal told you on c.l.t), hence needs an individual malloc block (with its overhead) for each split-off chunk.

Your call for counter-evidence now ;-)

----------------------------------------------------------------------

Comment By: Alexandre Ferrieux (ferrieux)
Date: 2010-03-01 10:19

Message:
OK, repro on HEAD after doubling the size of the "xxxx" string in the hi-water-mark function. Note that you don't need the large-file func, nor do you get any chance to call anything twice: Tcl not only crashes, but calls Tcl_Panic, and just before the core dump you should see on stderr:

unable to alloc ... bytes

(with the actual value possibly depending on system setup). I will dive a bit further, but this new light seems to indicate it is a dup of one of the various bugs related to [append] when reaching the size limit. We admittedly don't fail gracefully in that case, and Tcl_Panic is often the best we can do.
So I'd say this has nothing to do with a leak or hi-water-mark methods, but is simply an atomic, expected Tcl_Panic in a low-memory condition. If you have counter-evidence, like crashing only on the 2nd call (which I don't see), please provide it.

----------------------------------------------------------------------

Comment By: Stephen Huntley (blacksqr)
Date: 2010-03-01 06:19

Message:
My previous understanding of my problem was inadequate; my apologies. I have uploaded demo code which crashes my interpreter and which, I hope, illustrates the problem more clearly.

The issue seems to be that when the Tcl interpreter keeps a claim on some memory used by a completed procedure as "high water mark" memory, that memory is not released or shared when it is needed for a large file I/O job. Thus the interpreter crashes with a memory allocation error when it seems that there should be more than enough memory under Tcl's control to complete the job.

Additional comments are in the uploaded file. It may be necessary to tweak some of the numbers in the script depending on the RAM available on the test machine. I reported my computer's RAM and usage levels in the file.

----------------------------------------------------------------------

Comment By: Alexandre Ferrieux (ferrieux)
Date: 2010-02-28 23:06

Message:
Stephen, to help your chase: don't forget the function's result; it may keep a reference to your locally generated data. Also scan [info globals] after the procedure ends.

----------------------------------------------------------------------

Comment By: Donal K. Fellows (dkf)
Date: 2010-02-27 18:45

Message:
Could just be high-water-mark memory management. To check whether it is really a leak, see if repeating the operation many times over increases the leak (when I try, with shorter strings, I see no change in consumption after the first run).
----------------------------------------------------------------------

You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=110894&aid=2960042&group_id=10894