From: George H. <ga...@si...> - 2000-08-31 16:13:50
> This proposed change makes me nervous, because it results in
> O(N**2) behavior for a long series of appends. Suppose you add
> a large number of additional blocks of data, each 3*TCL_GROWTH_MIN_ALLOC
> bytes long, to an existing large string. Each append will cause
> the string to be reallocated and the entire existing contents to be
> copied. This should produce very slow behavior when the string is
> large. The reason for doubling the string size each time it's
> reallocated is that it guarantees that on average each byte is only
> copied about twice, no matter how large the string grows.
>
> I'm confused by the performance measurements; either there's something
> I don't understand, or the new approach should work badly in the
> case of large strings being appended to gradually. Also, I don't see
> why appends to short strings should improve in performance with the
> new algorithm; has something else changed in the new version besides
> what's described in your message?
>
> Lastly, I don't understand the comment about memory running out;
> virtually all machines where Tcl runs have virtual memory, no? This means
> you can exceed the physical memory size in your allocations. By the time
> you get near to exceeding the swapping space you have already started
> thrashing so badly that the system is unusable anyway.
>
> I think there may be a much better way to fix this problem. Why not
> take advantage of the Tcl object system? When appending to a string
> object, don't keep reallocating a single large buffer. Instead, create
> a chain of fixed-size buffers as a special internal representation.
> When the value is referenced as a string, then you can convert it into
> a single large buffer. This approach should be much faster in the
> common case where a string is appended to frequently without ever
> being referenced. It may be slightly slower if appends and references
> are intermixed, but I bet it won't be much slower even in this extreme
> case.

This is hopefully the kind of problem/solution that benefits from having more eyes examine it. It would help to see the patch, but I agree with John here. Slowing the allocation growth rate just makes the amortized cost of each insert more expensive. You still have to copy the data from the old memory into the new, and with smaller allocations you will simply do that more often.

I'd really like to see the original problem. I looked at the thread and I couldn't find it. I want to know what problem the Tcl code (not C code) is attempting to solve and can't (because of the current allocation scheme). Where and why is he allocating a string so large that it exceeds his virtual memory? Can this string be broken into multiple parts?

It would help to see the benchmarks too. I don't know what they are really measuring, and I'm suspicious of the different versions. I wonder if you are fixing a theoretical case at the expense of real performance. In practice, you can't run near the virtual memory limit anyway, because of the places in the code where no check is made for memory exhaustion.

There may still be good things in this patch (outside slowing the growth rate), but I can't tell from either the description of the algorithm or the timing numbers.

--gah
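
For concreteness, here is a small hypothetical C sketch of the amortized-cost argument quoted above. It is not code from Tcl or from the patch; INC merely stands in for the 3*TCL_GROWTH_MIN_ALLOC append size in John's example. It counts the bytes copied by repeated appends under a fixed-increment growth policy versus doubling:

#include <stdio.h>

/* Count total bytes copied while growing a buffer to finalLen by
 * repeated INC-byte appends. Every time the buffer outgrows its
 * allocation, the entire existing contents must be copied into the
 * new block. Hypothetical sketch, not Tcl source. */

#define INC 3072    /* stand-in for 3*TCL_GROWTH_MIN_ALLOC */

static long long copiedBytes(long long finalLen, int doubling)
{
    long long alloc = INC, used = 0, copied = 0;
    while (used < finalLen) {
        used += INC;                 /* one append of INC bytes */
        if (used > alloc) {
            copied += alloc;         /* old contents copied over */
            while (alloc < used) {
                alloc = doubling ? alloc * 2 : alloc + INC;
            }
        }
    }
    return copied;
}

int main(void)
{
    long long n = 100LL * 1024 * 1024;    /* grow to 100 MB */
    printf("fixed increment: %lld bytes copied\n", copiedBytes(n, 0));
    printf("doubling:        %lld bytes copied\n", copiedBytes(n, 1));
    return 0;
}

The fixed-increment policy copies on the order of N**2/INC bytes in total, while doubling copies roughly 2*N, which is the "each byte copied about twice" guarantee mentioned above.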
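
And here is a rough sketch of the chained-buffer internal representation John proposes. Again, this is hypothetical illustration, not Tcl's actual internal rep; the names, the block size, and the flatten interface are all made up. Appends only touch the tail block, and the single large copy happens once, in ChainFlatten, when the value is actually referenced as a string:

#include <stdlib.h>
#include <string.h>

/* Hypothetical sketch of the chained-buffer idea: appends accumulate
 * fixed-size blocks in a linked list; only when the value is needed
 * as one string do we allocate a single buffer and copy everything. */

#define BLOCK_SIZE 4096

typedef struct Block {
    struct Block *next;
    size_t used;
    char data[BLOCK_SIZE];
} Block;

typedef struct {
    Block *head, *tail;
    size_t totalLen;
} ChainedString;

static void ChainAppend(ChainedString *cs, const char *bytes, size_t len)
{
    while (len > 0) {
        if (cs->tail == NULL || cs->tail->used == BLOCK_SIZE) {
            Block *b = calloc(1, sizeof(Block));
            if (cs->tail) cs->tail->next = b; else cs->head = b;
            cs->tail = b;
        }
        size_t room = BLOCK_SIZE - cs->tail->used;
        size_t n = len < room ? len : room;
        memcpy(cs->tail->data + cs->tail->used, bytes, n);
        cs->tail->used += n;
        cs->totalLen += n;
        bytes += n;
        len -= n;
    }
}

/* Convert to a flat NUL-terminated string; caller frees. This is the
 * one-time O(N) copy, paid only when the value is referenced. */
static char *ChainFlatten(const ChainedString *cs)
{
    char *s = malloc(cs->totalLen + 1), *p = s;
    for (const Block *b = cs->head; b != NULL; b = b->next) {
        memcpy(p, b->data, b->used);
        p += b->used;
    }
    *p = '\0';
    return s;
}

A caller would zero-initialize the struct (ChainedString cs = {0};), call ChainAppend repeatedly, and pay the flatten cost only when the string rep is demanded, which is the append-heavy, rarely-referenced case John expects to dominate.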