From: SourceForge.net <no...@so...> - 2010-03-04 08:53:34
Bugs item #2960042, was opened at 2010-02-27 09:24
Message generated for change (Comment added) made by ferrieux
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=110894&aid=2960042&group_id=10894

Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update.

Category: 42. Memory Preservation
Group: current: 8.5.8
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: Stephen Huntley (blacksqr)
Assigned to: Jeffrey Hobbs (hobbs)

Summary: high water mark memory not shared for file I/O needs

Initial Comment:
When a string is converted to a list via the split command, the memory used by the list is never released, even if the list variable is unset, and even if the conversion takes place in a procedure that returns no information. To see the leak, execute the following procedure:

proc listleak {} {
    set line xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
    for {set i 0} {$i < 2000000} {incr i} {
        append largeString $line\n
    }
    set largeList [split $largeString \n]
    unset largeString
    unset largeList
    return
}

===========================================================================
On my computer:

% exec vmstat
procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
 0  0  18844 2134300  11572 433516    0    0    71    25  121  378  7  1 92  1
% listleak
% exec vmstat
procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
 0  0  18844 1836432  11580 433516    0    0    71    25  121  378  7  1 92  1

A net loss of 297868 kB of free memory. The memory never comes back until the Tcl interpreter is exited.

ActiveTcl 8.5.8 on Ubuntu Intrepid.
----------------------------------------------------------------------

>Comment By: Alexandre Ferrieux (ferrieux)
Date: 2010-03-04 09:53

Message:
Update: Donal nearly convinced me that the "statistics" (rare/frequent case) regarding free-able blocks are in favour of trying a block-deallocation scheme. "Nearly", because the problem is to do it efficiently. Several issues:

(1) Since (as you're right to point out) the block-freeing should not wait for alloc failures (mainly because alien code like extensions could not benefit from reclaimable space), it should happen routinely and hence be reasonably fast. This implies keeping track of which blocks are freeable instead of doing a sweeping pass just in time. Keeping track implies, for each individual Tcl_Obj allocation/deallocation, knowing which block the object belongs to, which is not currently recorded, and would require extra space in the object (or a sweeping pass with address comparisons, which we've just rejected).

(2) If we go towards object relocation as you suggest, things get even worse. Indeed, since objects are reached by single indirection (not double, like old MacOS's relocatable "handles"), moving any of them implies finding all references to it; short of adding a complicated back-reference system, this in turn means a mark-and-sweep pass. And even if wild magic gives us the back references lying in container objects or vars, we still have to worry about the refs that live on the current C stack (e.g. in [foreach]), so as not to pull the rug from under our own feet. Tough.

----------------------------------------------------------------------

Comment By: Stephen Huntley (blacksqr)
Date: 2010-03-04 09:30

Message:
After a little googling, it's not clear that any other leading dynamic language does much more in terms of memory management. A stop-gap solution may be using a package like metakit for manipulating large data sets in tight memory circumstances.
----------------------------------------------------------------------

Comment By: Stephen Huntley (blacksqr)
Date: 2010-03-04 07:52

Message:
I'm glad the clouds have parted a bit on what is going on here. It is somewhat what I thought, but I'm still dismayed.

I would argue that any memory management scheme must have some sort of allocation lifecycle built in, where unused memory is at some point detected, garbage-collected and freed. Otherwise your memory management method is operationally indistinguishable from a leak, as the present example shows. Memory management is always tricky, but simply punting on garbage collection "within the universe of Tcl_Objs" seems to me like a disaster in the making.

The interpreter need not wait until a malloc fails before starting a decision process about freeing internal memory. One might, for example, set a maximum high water mark of around 10 percent of RAM, after which Tcl_Obj memory is immediately freed after use. Or one might design an object compactor which moves in-use Tcl_Objs out of blocks of otherwise unused objects so the blocks can be freed.

Obviously I'm not familiar enough with Tcl's internals to give a wise and informed opinion, but as a user who wants to make expanded use of Tcl in a coming computer world where data sets and demands for efficiency will grow dramatically, this behavior seems a significant impediment.

----------------------------------------------------------------------

Comment By: Alexandre Ferrieux (ferrieux)
Date: 2010-03-03 22:27

Message:
Still not reproduced, but I think I know what's happening.

Notice that there are two fundamentally different kinds of allocations in the Tcl core: Tcl_Objs and vanilla mallocs. Tcl_Objs are allocated by blocks (with large mallocs), but the blocks themselves are never free()d; instead, freed Tcl_Objs are linked into a free list inside their hosting blocks. Vanilla mallocs are used for everything else, like strings or list storage.
Now this "high water mark" handling of objects doesn't generate actual leaks _within_ the universe of Tcl_Objs, but it competes for available memory with the rest of the vanilla mallocs. Hence, if A is a vanilla malloc and B a block of Tcl_Objs, and A+B exceeds total available heap space, then:

(1) A and B cannot be simultaneously allocated (obvious)
(2) alloc(A); free(A); alloc(B); free(B) works
(3) alloc(B); free(B); alloc(A); free(A) doesn't, simply because free(B) is a no-op as far as the heap is concerned

So here A is large_file, B is hi_water_mark, and I'd predict that with proper sizes you can even observe the failure skipping the first call to large_file. Hence, your new title is accurate in describing the situation, though we can generalize "I/O needs" to "vanilla mallocs".

However, it is not really a bug; rather, it lies in the gray zone of implementation trade-offs. Indeed, while one could imagine freeing whole blocks of free Tcl_Objs in certain conditions (like after a failure to malloc or realloc), in the general case there is no guarantee that a completely free block exists (a single non-free obj per block spoils the method). A typical scenario generating this pattern is when long-lived objects are created at any time, not just at the beginning.

To summarize, I'd say that a switch from a "high-water-mark" to a "non-monotonous" method for Tcl_Obj block allocation could improve *some* cases like the one you've shown, but would leave an overwhelming number of similar cases out of reach. Hence I'm tempted to freeze this as Won't Fix. I'd welcome counter-arguments, or different evaluations of the statistics.
Put differently: a big vanilla malloc (like the big string grown by the first call to [large_file]) can fit once in memory, then (after being freed) leave space to an irreversible claim by a big block of Tcl_Objs.

----------------------------------------------------------------------

Comment By: Stephen Huntley (blacksqr)
Date: 2010-03-03 01:29

Message:
My previous description of the problem as a memory leak was invalid. I've changed the title of the bug ticket to reflect that. Please put that concept aside.

How much free memory is left on your computer after you run high_water_mark? To see what I'm seeing, it should be less than the size of the file created by large_file (about 246 MB). Adjust the number of iterations in the for loop of high_water_mark until it runs to completion but leaves less free memory available than the size of the large file. Then run large_file. The interpreter should crash even though almost none of the available memory is being used.

----------------------------------------------------------------------

Comment By: Alexandre Ferrieux (ferrieux)
Date: 2010-03-02 22:09

Message:
I've tried various sizes, but never got this behavior. I'm calling large_file and high_water_mark alternately in a loop, and depending on the values I get one of two outcomes:

- the loop goes on forever, with [memory info] at a fixed point
- Tcl_Panic on the 1st invocation of either function

Neither of these is compatible with a leak. Tough.

----------------------------------------------------------------------

Comment By: Stephen Huntley (blacksqr)
Date: 2010-03-02 20:28

Message:
Sorry for the oversight; yes, the "unable to alloc" message definitely appears in the terminal from which tclsh was started when the interpreter crashes.

P.S. The crash doesn't come on the second invocation of high_water_mark, but on the second invocation of large_file, after the first invocation of high_water_mark.
----------------------------------------------------------------------

Comment By: Alexandre Ferrieux (ferrieux)
Date: 2010-03-02 08:55

Message:
Two things:

(1) Please please please confirm (or infirm) that when you get a crash on the 2nd invocation of hi-water-mark, you also get the "unable to alloc" message on stderr, meaning Tcl_Panic(). This is important, and you keep failing to mention it.

(2) Of course, IF/WHEN I manage to get the same behavior (crash on 2nd invocation only), it will definitely qualify as a bug. That has not happened yet. But I'm just after hard evidence; no need to go back to c.l.t seeking support for a "minority theory". It will become the majority as soon as evidence comes.

----------------------------------------------------------------------

Comment By: Stephen Huntley (blacksqr)
Date: 2010-03-02 06:16

Message:
P.S. As I stated in the comments in the attached file, on my computer the high_water_mark proc runs OK, then Tcl crashes on the second invocation of the large_file proc. If high_water_mark crashes your interpreter, reduce the number of iterations in its for loop until it completes successfully. Then increase the size of the file written by large_file, if necessary, until it's bigger than the free memory left over after running high_water_mark. Then you should see the behavior I'm seeing.

----------------------------------------------------------------------

Comment By: Stephen Huntley (blacksqr)
Date: 2010-03-02 06:08

Message:
I'm still trying to learn about Tcl's expected behavior in these instances, so I'm not sure what you're asking me for, if anything. My primary (naive) concern is the difference between the behavior I expect and what I see. After the high_water_mark proc runs, the interpreter is reserving over two GB of memory for itself, almost completely unused as far as I can see.
Yet when the large_file proc is called, which requires only a small subset of that memory, Tcl doesn't use its reserved memory to complete execution of the proc; instead it tries to allocate even more memory and crashes as a result. This makes no sense to me, and is what I'm trying to understand, whether or not it is seen as a bug. To me it sure seems to be one, since my bottom line is that the program crashes even though there's ample unused memory to complete the given tasks.

----------------------------------------------------------------------

Comment By: Alexandre Ferrieux (ferrieux)
Date: 2010-03-01 11:57

Message:
Confirmed, this is a pure OOM condition; in my case, when Tcl_Panic is called, the process's VM size is 3G. Note that the limit is not the physical RAM, but rather the maximum heap size, which is bounded by addressable VM, which is 3G under 32-bit Linux (there's 1G of kernel-reserved address space).

Note also that the real memory consumer is [split] rather than [append], since here [split] doesn't take advantage of the repeated substring (no interning, as Donal told you on c.l.t), hence needs an individual malloc block (with its overhead) for each split-off chunk.

Your call for counter-evidence now ;-)

----------------------------------------------------------------------

Comment By: Alexandre Ferrieux (ferrieux)
Date: 2010-03-01 10:19

Message:
OK, repro on HEAD after doubling the size of the "xxxx" string in the hi-water-mark function. Note that you don't need the large-file func, nor do you get any chance to call anything twice: Tcl not only crashes, but calls Tcl_Panic, and just before the core dump you should see on stderr:

unable to alloc ... bytes

(with the actual value possibly depending on system setup). I will dive a bit further, but this new light seems to indicate it is a dup of one of the various bugs related to [append] when reaching the size limit. We admittedly don't fail gracefully in that case, and Tcl_Panic is often the best we can do.
So I'd say this has nothing to do with a leak or hi-water-mark methods, but is simply an atomic, expected Tcl_Panic in a low-memory condition. If you have counter-evidence, like crashing only on the 2nd call (which I don't see), please provide it.

----------------------------------------------------------------------

Comment By: Stephen Huntley (blacksqr)
Date: 2010-03-01 06:19

Message:
My previous understanding of my problem was inadequate; my apologies. I have uploaded demo code which crashes my interpreter and which, I hope, illustrates the problem more clearly.

The issue seems to be that when the Tcl interpreter keeps a claim on some memory used by a completed procedure as "high water mark" memory, that memory is not released or shared when it is needed for a large file I/O job. Thus the interpreter crashes with a memory allocation error when it seems that there should be more than enough memory under Tcl's control to complete the job.

Additional comments are in the uploaded file. It may be necessary to tweak some of the numbers in the script depending on the RAM available on the test machine. I reported my computer's RAM and usage levels in the file.

----------------------------------------------------------------------

Comment By: Alexandre Ferrieux (ferrieux)
Date: 2010-02-28 23:06

Message:
Stephen, to help your chase: don't forget the function's result; it may keep a reference to your locally generated data. Also scan [info globals] after the procedure ends.

----------------------------------------------------------------------

Comment By: Donal K. Fellows (dkf)
Date: 2010-02-27 18:45

Message:
Could just be high-water-mark memory management. To check whether it is really a leak, see if repeating the operation many times over increases the leak (when I try, with shorter strings, I see no change in consumption after the first run).
----------------------------------------------------------------------

You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=110894&aid=2960042&group_id=10894