From: John O. <ou...@aj...> - 2000-08-30 15:38:26
|
This proposed change makes me nervous, because it results in O(N**2) behavior for a long series of appends. Suppose you add a large number of additional blocks of data, each 3*TCL_GROWTH_MIN_ALLOC bytes long, to an existing large string. Each append will cause the string to be reallocated and the entire existing contents to be copied. This should produce very slow behavior when the string is large. The reason for doubling the string size each time it's reallocated is that it guarantees that on average each byte is only copied about twice, no matter how large the string grows. I'm confused by the performance measurements; either there's something I don't understand, or the new approach should work badly in the case of large strings being appended to gradually. Also, I don't see why appends to short strings should improve in performance with the new algorithm; has something else changed in the new version besides what's described in your message? Lastly, I don't understand the comment about memory running out; virtually all machines where Tcl runs have virtual memory, no? This means you can exceed the physical memory size in your allocations. By the time you get near to exceeding the swapping space you have already started thrashing so badly that the system is unusable anyway. I think there may be a much better way to fix this problem. Why not take advantage of the Tcl object system? When appending to a string object, don't keep reallocating a single large buffer. Instead, create a chain of fixed-size buffers as a special internal representation. When the value is referenced as a string, then you can convert it into a single large buffer. This approach should be much faster in the common case where a string is appended to frequently without ever being referenced. It may be slightly slower if appends and references are intermixed, but I bet it won't be much slower even in this extreme case. -John- At 06:41 PM 8/29/2000 -0700, Eric Melski wrote: >(apologies for the verbosity of this message) > >Gerhard Hintermayer sent me a patch to adjust the way that the Tcl core >allocates memory for strings when growing them (via an append, for >example). Formerly, the growth algorithm was: > > newsize = 2 * (oldsize + appendsize) > >Now, the growth algorithm is: > > if (oldsize + appendsize >= TCL_GROWTH_LARGE_STRING) > newsize = oldsize + (2 * appendsize) + TCL_GROWTH_MIN_ALLOC > else > newsize = 2 * (oldsize + appendsize) > endif > >TCL_GROWTH_LARGE_STRING defaults to 1 megabyte; TCL_GROWTH_MIN_ALLOC >defaults to 1 kilobyte, and is included to efficiently handle repeated >small appends to a large string. > >The primary goal of this change was to increase the amount of memory that >Tcl could allocate. With the original algorithm, if you had a string that >was about 1/2 of available memory, you could not add to it anymore, >because it would try to allocate more than all of the available memory. >With the new algorithm, you can effectively allocate all of available >memory (depending on how you are growing the string -- once you get close >to the maximum, you will have to append progressively smaller amounts, but >you should be able to expand to fill all memory, if neccessary). > >However, a nice side benefit of the new algorithm is that it boosts core >performance, across the board. You can run the Tcl benchmark suite to see >all the improvements; I have included just the [append] benchmarks here as >these show the improvements most dramatically, I believe: > >000 VERSIONS: 1:8.4a2 2:8.4a1 >001 STR append 49 65 >002 STR append (1KB + 1KB) 42 45 >003 STR append (10KB + 1KB) 130 132 >004 STR append (1MB + 2b * 1000) 13163 21765 >005 STR append (1MB + 1KB) 7977 16320 >006 STR append (1MB + 1KB * 20) 8258 16497 >007 STR append (1MB + 1MB * 3) 33898 40935 >008 STR append (1MB + 1MB * 5) 41330 81164 >END VERSIONS: 1:8.4a2 2:8.4a1 > >Here, 8.4a2 includes the patch; 8.4a1 has the original behavior. Nice >work, Gerhard! > > Eric Melski The Other Tcl Guy > ericm at ajubasolutions.com Ajuba Solutions > >-- >The TclCore mailing list is sponsored by Ajuba Solutions >To unsubscribe: email tcl...@aj... with the > word UNSUBSCRIBE as the subject. ________________________________________________________________________ John Ousterhout 650-210-0102 tel Chairman and Chief Technology Officer 650-230-4070 fax Ajuba Solutions ou...@aj... http://www.ajubasolutions.com -- The TclCore mailing list is sponsored by Ajuba Solutions To unsubscribe: email tcl...@aj... with the word UNSUBSCRIBE as the subject. |