From: George H. <ga...@si...> - 2000-08-31 16:13:50
> This proposed change makes me nervous, because it results in
> O(N**2) behavior for a long series of appends. Suppose you add
> a large number of additional blocks of data, each 3*TCL_GROWTH_MIN_ALLOC
> bytes long, to an existing large string. Each append will cause
> the string to be reallocated and the entire existing contents to be
> copied. This should produce very slow behavior when the string is
> large. The reason for doubling the string size each time it's
> reallocated is that it guarantees that on average each byte is only
> copied about twice, no matter how large the string grows.
>
> I'm confused by the performance measurements; either there's something
> I don't understand, or the new approach should work badly in the
> case of large strings being appended to gradually. Also, I don't see
> why appends to short strings should improve in performance with the
> new algorithm; has something else changed in the new version besides
> what's described in your message?
>
> Lastly, I don't understand the comment about memory running out;
> virtually all machines where Tcl runs have virtual memory, no? This means
> you can exceed the physical memory size in your allocations. By the time
> you get near to exceeding the swapping space you have already started
> thrashing so badly that the system is unusable anyway.
>
> I think there may be a much better way to fix this problem. Why not
> take advantage of the Tcl object system? When appending to a string
> object, don't keep reallocating a single large buffer. Instead, create
> a chain of fixed-size buffers as a special internal representation.
> When the value is referenced as a string, then you can convert it into
> a single large buffer. This approach should be much faster in the
> common case where a string is appended to frequently without ever
> being referenced. It may be slightly slower if appends and references
> are intermixed, but I bet it won't be much slower even in this extreme
> case.

This is hopefully the kind of problem/solution that benefits from having more eyes examine it. It would help to see the patch, but I agree with John here. Slowing the allocation growth rate just makes the amortized cost of each insert more expensive. You still have to copy the data from the old memory into the new, and with smaller allocations you will simply do that more often.

I'd really like to see the original problem. I looked at the thread and I couldn't find it. I want to know what problem the Tcl code (not C code) is attempting to solve and can't (because of the current allocation scheme). Where and why is he allocating a string so large that it exceeds his virtual memory? Can this string be broken into multiple parts?

It would help to see the benchmarks too. I don't know what they are really measuring, and I'm suspicious of the different versions. I wonder if you are fixing a theoretical case at the expense of real performance. In practice, you can't run near the virtual memory limit anyway, because of the places in the code where no check is made for memory exhaustion.

There may still be good things in this patch (outside slowing the growth rate), but I can't tell from either the description of the algorithm or the timing numbers.

--gah
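
For concreteness, here is a small hypothetical C sketch of the amortized-cost argument quoted above. It is not code from Tcl or from the patch; INC merely stands in for the 3*TCL_GROWTH_MIN_ALLOC append size in John's example. It counts the bytes copied by repeated appends under a fixed-increment growth policy versus doubling:

#include <stdio.h>

/* Count total bytes copied while growing a buffer to finalLen by
 * repeated INC-byte appends. Every time the buffer outgrows its
 * allocation, the entire existing contents must be copied into the
 * new block. Hypothetical sketch, not Tcl source. */

#define INC 3072    /* stand-in for 3*TCL_GROWTH_MIN_ALLOC */

static long long copiedBytes(long long finalLen, int doubling)
{
    long long alloc = INC, used = 0, copied = 0;
    while (used < finalLen) {
        used += INC;                 /* one append of INC bytes */
        if (used > alloc) {
            copied += alloc;         /* old contents copied over */
            while (alloc < used) {
                alloc = doubling ? alloc * 2 : alloc + INC;
            }
        }
    }
    return copied;
}

int main(void)
{
    long long n = 100LL * 1024 * 1024;    /* grow to 100 MB */
    printf("fixed increment: %lld bytes copied\n", copiedBytes(n, 0));
    printf("doubling:        %lld bytes copied\n", copiedBytes(n, 1));
    return 0;
}

The fixed-increment policy copies on the order of N**2/INC bytes in total, while doubling copies roughly 2*N, which is the "each byte copied about twice" guarantee mentioned above.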
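
And here is a rough sketch of the chained-buffer internal representation John proposes. Again, this is hypothetical illustration, not Tcl's actual internal rep; the names, the block size, and the flatten interface are all made up. Appends only touch the tail block, and the single large copy happens once, in ChainFlatten, when the value is actually referenced as a string:

#include <stdlib.h>
#include <string.h>

/* Hypothetical sketch of the chained-buffer idea: appends accumulate
 * fixed-size blocks in a linked list; only when the value is needed
 * as one string do we allocate a single buffer and copy everything. */

#define BLOCK_SIZE 4096

typedef struct Block {
    struct Block *next;
    size_t used;
    char data[BLOCK_SIZE];
} Block;

typedef struct {
    Block *head, *tail;
    size_t totalLen;
} ChainedString;

static void ChainAppend(ChainedString *cs, const char *bytes, size_t len)
{
    while (len > 0) {
        if (cs->tail == NULL || cs->tail->used == BLOCK_SIZE) {
            Block *b = calloc(1, sizeof(Block));
            if (cs->tail) cs->tail->next = b; else cs->head = b;
            cs->tail = b;
        }
        size_t room = BLOCK_SIZE - cs->tail->used;
        size_t n = len < room ? len : room;
        memcpy(cs->tail->data + cs->tail->used, bytes, n);
        cs->tail->used += n;
        cs->totalLen += n;
        bytes += n;
        len -= n;
    }
}

/* Convert to a flat NUL-terminated string; caller frees. This is the
 * one-time O(N) copy, paid only when the value is referenced. */
static char *ChainFlatten(const ChainedString *cs)
{
    char *s = malloc(cs->totalLen + 1), *p = s;
    for (const Block *b = cs->head; b != NULL; b = b->next) {
        memcpy(p, b->data, b->used);
        p += b->used;
    }
    *p = '\0';
    return s;
}

A caller would zero-initialize the struct (ChainedString cs = {0};), call ChainAppend repeatedly, and pay the flatten cost only when the string rep is demanded, which is the append-heavy, rarely-referenced case John expects to dominate.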