From: Brent W. <we...@aj...> - 2000-09-01 06:22:25
|
This issue is more contentious, so instead of calling for a vote I'll summarize the approaches proposed so far and try to summarize the issues of each. Background: Tcl currently doubles the storage for a string when appending to it. This has the excellent property that the cost of appending to a string is almost O(n). In contrast, a naive algorithm that only allocates exactly enough space for the appended bytes has O(n-squared) cost. The problem: On most operating systems the amount of memory a process can allocate is limited to physical memory+swap space (i.e., backing store on disk), which could be far less than the size of the virtual address space depending on how the machine is configured and the OS. Therefore, when Tcl tries to append data to a string that is slightly over half the size of the *available memory* then the memory doubling algorithm will fail. It fails because the TclpAlloc procedure panics when it cannot allocate memory. This leads to a cases where a machine has, for example, 512 Meg of available memory but cannot grow a string past 256 Meg. Solution 1 - "get more memory" Without changing Tcl we can just document this behavior and suggest that users make sure they have twice the available swap space to cover their physical memory. That way you can grow a string that fits into physical memory before you start paging. Benefits: No change to Tcl Drawbacks: Users have to know how to configure their machine to be able to fully utilize memory. Solution 2 - "Hintermayer's string allocation patch" Change the memory growing algorithm for a string to: if {size < LARGE_STRING_SIZE} { newsize = 2 * size; } else { newsize = size + STRING_GROW_SIZE; } The values for LARGE_STRING_SIZE and STRING_GROW_SIZE are not yet specified. The original patch used 1 Meg and 1 K, Jeff has suggested 4 Meg and 4 K, and I personally think 4Meg and 4Meg (i.e., make the two constants equal) may be best. Benefits: This algorithm allows strings to grow within STRING_GROW_SIZE of the available memory size. Drawbacks: This algorithm can slow down the append operation of strings larger than LARGE_STRING_SIZE. Also, the best choice of for these constants is dependent on the configuration of the current machine. Aside: We could export an interface to set this parameters dynamically as opposed to compiling them into Tcl. Solultion 3 - "Adaptive allocation" (My descrption of this may not be accurate) Retain the doubling algorithm until an attempt to allocate memory fails. When that occurs, reduce the growth by halves until the allocation succeeds: extra = size; do { newsize = size + extra; newbuf = realloc(buf, newsize); extra = extra / 2; } until (newbuf != NULL || extra < size_needed); Benefits: This retains the memory doubling algorithm until the current machine cannot tolerate it, so it has good performance. It also lets you allocate essentially every byte of available memory. It doesn't require that we choose arbitrary constants that may not be suitable for all machines and configurations. Drawbacks: Memory allocation slows down as you approach the limits of available memory. Next steps: If I've gotten something wrong, please speak up. I don't have to ask for that! Our goal is to reach a decision without much further rehash of the algorithms and issues. Things that would be helpful are a patch that implements Solution 3, as well as more benchmarking to compare the approaches. -- Brent Welch <we...@aj...> http://www.ajubasolutions.com Scriptics has become Ajuba Solutions -- The TclCore mailing list is sponsored by Ajuba Solutions To unsubscribe: email tcl...@aj... with the word UNSUBSCRIBE as the subject. |