From: Brian H. <bri...@ql...> - 2003-04-25 19:20:35
On Fri, 25 Apr 2003, John Max Skaller wrote:

> By best you mean 'fastest'.

Fastest in the most common cases, to be precise. With "reasonable" (for
suitably loose definitions of reasonable) space wastage. With doubling you
never waste more than about 75% of the space, and rarely waste 50%. This is
actually not bad compared to other data structures, if you consider the
wasted space "infrastructure overhead". Consider a binary tree where the
data is stored only at the leaves. You will have about as many branch nodes
as you have leaf nodes, so suddenly half the memory is used in branch
nodes, holding references to other nodes rather than references to data:
"wasted" space.

Increasing the multiple decreases the number of copies you need to make, at
the cost of greatly increasing the average "wasted" space. Plus, powers of
two are generally easier to allocate than powers of anything else.

> > Hmm. Could be done with a reference. But this is a very non-ML sort of
> > concept.
>
> I agree: loss of reentrancy .. BUT provided the store is atomic
> it only affects performance not semantics if the variable is
> changed at random even by multiple threads. Setting a strategy
> for a whole program depending on what phase of execution it is up
> to is fairly useful.

I was just thinking not-functional. Not in that it wouldn't work, but in
that it's not how most functional programming flows, if that makes sense.

> --------------------------------------------------------
> one thing I found: I needed 3 arguments not 2:
>
> current slots, current used slots, required size

The inputs xarray has to give the resizer algorithm are:

- the current number of slots the array has
- the current number of slots the array needs

Note that if I'm increasing or decreasing the array size, I adjust the
number of slots needed before passing it into the resizer. What the resizer
returns is the number of slots the array should have.
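As a rough sketch of the interface just described, a doubling resizer might
look like the following. The names (`exponential_resizer`, `currsize`,
`length`) are taken from or guessed at from this thread; this is my
reconstruction, not the actual xarray code.

```ocaml
(* Sketch of the two-argument resizer described above.
   currsize = slots currently allocated, length = slots now needed.
   Returns the number of slots the array should have.
   Hypothetical reconstruction, not the real xarray implementation. *)
let exponential_resizer ~currsize ~length =
  if length <= currsize then currsize    (* already fits: no reallocation *)
  else begin
    (* keep doubling until the needed slots fit *)
    let n = ref (max currsize 1) in
    while !n < length do n := !n * 2 done;
    !n
  end
```

The caller then compares the result against the current size: if they
differ, it allocates a new array of that size and copies the data over.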
Once the resizer calculates what size the array should be, the xarray code
actually deals with allocating a new array and copying the data over (if
needed), etc. Maybe these parameters should have better names.

> Conceptually, the reason is: the ideal size
> may be influenced by the allocated size for example:

Always. You return the current size if you don't want to reallocate the
array.

> The extra argument is necessary for asymmetric growth/shrinkage
> algorithms, so I recommend you add it: at present you cannot
> take 'unused space' into account since it cannot be computed.

Hmm. Consider the following case. You have an array, using the doubling
algorithm, with a size of 16 and a length (number of used elements) of 10.
Now you add an enumeration with 100 elements in it. Currently, the resizer
function is called with a currsize of 16 (the current number of slots in
the array when it's called) and a length of 110 (how many elements need to
fit into the array). Exponential_resizer then keeps doubling 16 until it
gets a large enough array size, 128 in this case, and returns 128.

If I understand you correctly, in this example you also want 10 (call it
'oldlength') passed in as well. Certainly not difficult; I just don't see
what you get out of it. I suppose you could keep the same percentage of
free space, and return 176 instead of just 128.

If the resizer returns anything except the current array size, a new array
is allocated and the data copied over (hmm, must remember to null out the
old array after copying everything over). Should having a small amount of
data make the algorithm more inclined to copy data over?

Brian
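P.S. The "keep the same percentage of free space" idea above could be
sketched as a three-argument resizer like this. The function name and the
`oldlength` parameter name are my inventions for illustration; it assumes
oldlength > 0.

```ocaml
(* Hypothetical three-argument resizer: preserve the old ratio of
   allocated slots to used slots (oldlength = used slots before the
   insert).  In the example above, 110 * 16/10 = 176 instead of 128.
   A sketch only; assumes oldlength > 0. *)
let proportional_resizer ~currsize ~oldlength ~length =
  if length <= currsize then currsize   (* fits: keep the current array *)
  else
    (* scale the needed length by the old slots-per-element ratio,
       rounding up *)
    (length * currsize + oldlength - 1) / oldlength
```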