From: Brian H. <bh...@sp...> - 2003-04-11 23:14:07
Attachments:
xarray.ml
xarray.mli
A first cut at resizeable arrays. Some comments:

- I haven't made the reallocation strategy explicit. And I'm not sure I
  like the shrinking strategy.
- I stole an idea from Perl, in that negative subscripts index from the
  end of the array- so get x (-1) always returns the last element of x,
  no matter how long x is.
- No comments. Interface is either that for Array, or hopefully
  "obvious".

Note that there are still a lot of functions to write:

- set, insert, and append should have _list, _array, and _xarray
  variants which only realloc once
- remove_range which only reallocs once
- sub hasn't been written
- Didn't write sort functions yet
- map2?
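The Perl-style negative subscripting described above can be sketched as follows. This is a minimal hypothetical sketch, not the attached xarray.ml: the record layout, `of_array`, and the exception choice are all assumptions.

```ocaml
(* Hypothetical resizeable-array record: a backing array plus a
   logical length (the backing array may be longer than len). *)
type 'a xarray = { mutable data : 'a array; mutable len : int }

let of_array a = { data = Array.copy a; len = Array.length a }

let get x i =
  (* Negative subscripts index from the end: get x (-1) is the last
     element of x, no matter how long x is. *)
  let i = if i < 0 then x.len + i else i in
  if i < 0 || i >= x.len then invalid_arg "Xarray.get"
  else x.data.(i)
```

Out-of-range indices (in either direction) still fail, so bounds checking is preserved; only the meaning of negative values changes.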
From: Nicolas C. <war...@fr...> - 2003-04-14 01:12:17
> A first cut at resizeable arrays.

Thanks Brian, we definitely need it in the ExtLib!

> - I haven't made the reallocation strategy explicit. And I'm not sure I
>   like the shrinking strategy.

The question is "when do you use shrinking?". It seems that you're
shrinking each time an element is removed, which I think is not the
best way of doing it. Typically, the life of an array looks like:

- add elements
- eventually remove some
- do some work with it (iter)
- loop

Most of the time the user does not care at all if the array is taking a
little too much memory, but some users in some cases do. So what I
propose is the following:

- the current shrinking strategy only modifies the "len" parameter
- we provide a "pack" primitive that resizes the array to its "real"
  size (len)
- users with memory issues can then call "pack" after having removed a
  lot of elements (the user could also use to_array, but that would
  require him to use two different data structures for the same
  purpose)

> - I stole an idea from Perl, in that negative subscripts index from the
>   end of the array- so get x (-1) always returns the last element of x,
>   no matter how long x is.

If (-1) makes some sense, I'm not so sure about (-2) and (-3) :-) So I
would prefer an explicit get_last x, which is perhaps less confusing to
code readers. And, about the name, what about Vector? It's far more
familiar than Xarray :-)

> - No comments. Interface is either that for Array, or hopefully
>   "obvious"
>
> Note that there are still a lot of functions to write:
>
> - set, insert, and append should have _list, _array, and _xarray
>   variants which only realloc once

We cannot add all of the of_*, to_* and append_* functions for each
data structure, but an iterator can be nice:

    Xarray.appendi x (List.iter l)
    Xarray.appendi x (Array.iter a)
    ...

The problem here is that we don't know the exact size of the structure
before being able to make a copy.
Perhaps it would be a nice start then to define some kind of data
structure generics:

    type 'a iterator
    val count : 'a iterator -> int      (* remaining elements counter, O(1) *)
    val next : 'a iterator -> 'a        (* raises No_more_elements, complexity unspecified *)
    val has_more : 'a iterator -> bool  (* O(1) *)
    val iter : ('a -> unit) -> 'a iterator -> unit      (* complexity unspecified *)
    val map : ('a -> 'b) -> 'a iterator -> 'b iterator  (* this is a lazy map, I like it! *)

Then each data structure (list, array, xarray, reflist ...) will have
to_iterator, of_iterator and append_iterator (not sure about the names)
functions.

For implementation, we can have:

    type 'a iterator = {
        mutable count : int;
        next : (unit -> 'a);
    }

which is very simple, but the data structure which is providing the
"next" function has to store its state itself. We could define:

    type ('a,'b) iterator = {
        mutable count : int;
        mutable state : 'b;
        next : ('b -> 'a * 'b);
    }

but that will make the type information show a little more about the
implementation, which is meaningless to the end-user. Or maybe
something like a functional continuation:

    type 'a fnext =
        | FEnd
        | FIter of (unit -> 'a * 'a fnext)

    type 'a iterator = {
        mutable count : int;
        mutable next : 'a fnext;
    }

and then:

    let next it =
        match it.next with
        | FEnd -> raise No_more_elements
        | FIter fnext ->
            let r, f = fnext () in
            it.count <- it.count - 1;
            it.next <- f;
            r

But perhaps the cost of matching and allocating the returned tuple is
not worth it.

BTW Brian, if you're willing to create a SF account, I can give you
rights on the CVS.

Nicolas Cannasse
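The simple closure-based version of this iterator proposal can be made concrete like this. It is only a sketch: the names `of_list` and `No_more_elements`, and the decision to decrement `count` in a wrapper function, are assumptions, not a committed API.

```ocaml
exception No_more_elements

(* The simple representation: the producing data structure stores its
   own state inside the closure. *)
type 'a iterator = { mutable count : int; next : unit -> 'a }

let of_list l =
  let rest = ref l in
  { count = List.length l;
    next = (fun () ->
      match !rest with
      | [] -> raise No_more_elements
      | x :: tl -> rest := tl; x) }

let count it = it.count
let has_more it = it.count > 0

(* Fetch one element and keep the O(1) remaining-elements counter in
   sync; may raise No_more_elements. *)
let next it =
  let x = it.next () in
  it.count <- it.count - 1;
  x

let iter f it = while has_more it do f (next it) done
```

A `to_iterator` for arrays or xarrays would follow the same shape: a ref holding the current position, captured by the closure.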
From: Blair Z. <bl...@or...> - 2003-04-14 02:04:37
Nicolas Cannasse wrote:
>
> If (-1) makes some sense, I'm not so sure about (-2) and (-3) :-)
> So I would prefer an explicit get_last x, which is perhaps less
> confusing to code readers.

I have no problem with the -1, -2, -3, etc. usage and find it very
understandable. I haven't heard any complaints from Perl users on this
usage.

Best,
Blair

-- 
Blair Zajac <bl...@or...>
Plots of your system's performance - http://www.orcaware.com/orca/
From: Hideo B. <ba...@im...> - 2003-04-14 04:10:54
Hello list,

At Sun, 13 Apr 2003 19:04:17 -0700, Blair Zajac wrote:
>
> Nicolas Cannasse wrote:
> >
> > If (-1) makes some sense, I'm not so sure about (-2) and (-3) :-)
> > So I would prefer an explicit get_last x, which is perhaps less
> > confusing to code readers.
>
> I have no problem with the -1, -2, -3, etc. usage and find it very
> understandable. I haven't heard any complaints from Perl users on
> this usage.

I can imagine why Perl users wouldn't complain about this, but as an
OCaml user, I think I will :-). I can see that there would be no
problem if the index is written explicitly as -1, -2, -3, etc.
However, I don't like the idea, because it essentially overrides (to
some extent) the very useful array bounds checking that OCaml
provides, and could `hide' bugs when variables are used as the index.

Maybe it's just me, but I often (er, at least sometimes?) find myself
writing code like:

    let index = x - y + i in
    (* do something with (Array.get array index) *)

where x, y and i move around in a complex way, and the problem is that
I sometimes forget to do +1 somewhere and end up getting a negative
value as the index (but that is found easily, since it raises an
exception!).

So I would very much like the function which doesn't allow negative
indices to be kept (preferably leaving the name as [get], so as not to
confuse users of standard arrays). If we must have a [get] with
negative indices, it should be a different function.

What do you think?

Regards,
-- 
Hideo Bannai <ba...@im...>
From: Blair Z. <bl...@or...> - 2003-04-14 04:18:55
Hideo Bannai wrote:
>
> Hello list,
>
> At Sun, 13 Apr 2003 19:04:17 -0700, Blair Zajac wrote:
> >
> > Nicolas Cannasse wrote:
> > >
> > > If (-1) makes some sense, I'm not so sure about (-2) and (-3) :-)
> > > So I would prefer an explicit get_last x, which is perhaps less
> > > confusing to code readers.
> >
> > I have no problem with the -1, -2, -3, etc. usage and find it very
> > understandable. I haven't heard any complaints from Perl users on
> > this usage.
>
> I can imagine why Perl users wouldn't complain about this, but as an
> OCaml user, I think I will :-). I can see that there would be no
> problem if the index is written explicitly as -1, -2, -3, etc.
> However, I don't like the idea, because it essentially overrides (to
> some extent) the very useful array bounds checking that OCaml
> provides, and could `hide' bugs when variables are used as the index.
>
> Maybe it's just me, but I often (er, at least sometimes?) find myself
> writing code like:
>
>     let index = x - y + i in
>     (* do something with (Array.get array index) *)
>
> where x, y and i move around in a complex way, and the problem is
> that I sometimes forget to do +1 somewhere and end up getting a
> negative value as the index (but that is found easily, since it
> raises an exception!).
>
> So I would very much like the function which doesn't allow negative
> indices to be kept (preferably leaving the name as [get], so as not
> to confuse users of standard arrays). If we must have a [get] with
> negative indices, it should be a different function.
>
> What do you think?

Sure. I don't feel strongly about this one. Let's have a separate
function for indexing from the end.

Best,
Blair

-- 
Blair Zajac <bl...@or...>
Plots of your system's performance - http://www.orcaware.com/orca/
From: John M. S. <sk...@oz...> - 2003-04-14 06:55:16
> Sure. I don't feel strongly about this one. Let's have a separate
> function for indexing from the end.

No. If there is any doubt, leave it out. This operation is entirely
unnecessary in OCaml, which is perfectly capable of doing index
calculations and at the same time controlling scopes .. unlike some
other languages :-)

    let last a =
        let n = length a in
        if n > 0 then get a (n - 1)
        else invalid_arg "last"

    let from_end a i = get a (length a - 1 - i)  (* easy user space routine *)

Easy routines like this which are not heavily used should NOT be put in
libraries. IMHO :-)

-- 
John Max Skaller, mailto:sk...@oz...
snail:10/1 Toxteth Rd, Glebe, NSW 2037, Australia.
voice:61-2-9660-0850
From: John M. S. <sk...@oz...> - 2003-04-14 06:44:45
Nicolas Cannasse wrote:
>> - I haven't made the reallocation strategy explicit. And I'm not sure I
>>   like the shrinking strategy.

I have a published article on this subject. The gist of it is:

1) the strategy is specified by a user supplied function with signature

    strategy : int -> int -> int -> int

The first argument is the allocation size (total slots). The second
argument is the number of used slots. The third argument is the desired
number of used slots. The result is the number of slots which should be
present.

2) It is nice to provide a default function which has reasonable
performance. A memory efficient algorithm is:

    let chunk = 16
    let strategy a u d =
        if d > a - chunk then roundup d chunk + chunk          (* grow *)
        else if d < a - 2 * chunk then roundup d chunk + chunk (* shrink *)
        else a                                                 (* leave alone *)

This basically adds or removes a fixed sized block WITH HYSTERESIS
(like a thermostat): the grow and shrink thresholds differ by a chunk.

The semantics of use are: the physical allocator does NOT have to
allocate the recommended amount. After calling, the result must satisfy

    result >= 0
    result >= d

which is also a constraint on the physical allocator.

A faster routine uses a doubling factor instead, but HYSTERESIS is
vital to prevent thrashing.

-- 
John Max Skaller, mailto:sk...@oz...
snail:10/1 Toxteth Rd, Glebe, NSW 2037, Australia.
voice:61-2-9660-0850
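A runnable restatement of the chunked strategy with hysteresis might look like this. The `roundup` helper and the exact shrink threshold (two chunks of slack) are assumptions; the point being demonstrated is only that the grow and shrink triggers differ, so an add-one/remove-one loop near the boundary causes no reallocation.

```ocaml
let chunk = 16

(* Round n up to the next multiple of m. *)
let roundup n m = ((n + m - 1) / m) * m

(* allocated slots -> used slots -> desired used slots -> slot count.
   Grow when the request comes within one chunk of the allocation;
   shrink only when more than two chunks would sit idle.  The gap
   between the two thresholds is the hysteresis. *)
let strategy alloc _used desired =
  if desired > alloc - chunk then roundup desired chunk + chunk
  else if desired < alloc - 2 * chunk then roundup desired chunk + chunk
  else alloc

let reallocs = ref 0

let () =
  let alloc = ref (strategy 0 0 17) in      (* 48 slots for 17 elements *)
  for _ = 1 to 1000 do
    List.iter
      (fun d ->
        let a = strategy !alloc d d in
        if a <> !alloc then (incr reallocs; alloc := a))
      [18; 17]                              (* oscillate: add one, remove one *)
  done;
  assert (!reallocs = 0)                    (* hysteresis: no thrashing *)
```

With a single threshold (grow and shrink at the same point) the same loop would reallocate on every iteration.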
From: Brian H. <bri...@ql...> - 2003-04-14 15:58:02
On Mon, 14 Apr 2003, John Max Skaller wrote:

> 1) the strategy is specified by a user supplied function
> with signature
>
>     strategy : int -> int -> int -> int

I like it. Allows the user much better customization of allocation
routines.

> 2) It is nice to provide a default function which has
> reasonable performance. A memory efficient algorithm is:
[ snip- algorithm that increases size in chunks ]

I'm not sure hysteresis is the best algorithm. Here's the problem.
Assume that we double the array size every time we need to increase its
size, and start at an array size of 1. We then add n elements to an
empty array- how much copying do we need to do? Well, when we add the
second element, we need to copy one element to a new array (of length
2); when we add the third, we need to copy two elements to a new array
(of length 4); when we add the fifth element, we need to copy 4
elements to a new array (of length 8), etc. So we end up copying 2^0 +
2^1 + 2^2 + 2^3 + 2^4 + ... elements. If n is a power of 2, or just 1
element shy of one, we have to copy n-1 elements total. If n is just
over a power of two, we have to copy 2n - 1 elements total. In either
case, adding n elements to an empty array requires O(n) elements to be
copied.

Now, let's consider the same situation, adding n elements, but with a
hysteresis algorithm. Every time we need to increase the array size,
we increase it by a fixed amount- say c elements. To put numbers on
it, let's assume c=10 and n=1000. The first 10 elements get added for
free (no copying), but at element 11 we have to copy the first 10
elements. Then at element 21 we end up having to copy the first 20
elements, at element 31 the first 30 elements, and so on. The total
number of elements we have to copy is 10+20+30+..., or, to make the
series more obvious, 10*(1+2+3+...).

So using the simple formula sum 1..k = k*(k+1)/2, we get approximately:

    c * (n/c) * ((n/c) + 1) / 2  =  n * (n + c) / (2 * c)
                                 =  (n^2 + n*c) / (2 * c)

In other words, we have to copy O(n^2) elements to add n elements. In
our example of n=1000 and c=10, we have to copy 49,500 elements. While
in the doubling algorithm, since 1000 < 1024, we only have to copy 1023
elements, or 2.07% of the work that hysteresis requires. Going to 1025
elements requires a doubling that reallocates and copies 1024 elements,
bringing the total to 2047 elements copied. But hysteresis requires an
extra 3,030 copies- almost three times the work doubling requires.

> A faster routine uses a doubling factor instead,
> but HYSTERESIS is vital to prevent thrashing.

You prevent thrashing by letting shrinkage go longer before shrinking.
Here's the idea: you don't shrink the array until it falls below 25%
capacity, and then you halve the size until the array is back up above
25% capacity (but it'll still be below 50% capacity). Thrashing is
still a possibility, but only if the size is bouncing over a very wide
range- at which point I question if it's really thrashing.

Although this is why I like the idea of having a user definable
reallocation strategy. Don't like the default? Write your own. Even
better, I could see users writing custom reallocation strategies that
"hook into" how the arrays are being used, for example:

    let my_realloc_strategy currsize used =
        if my_algorithm_phase = Heavy_allocation_and_deallocation then
            (* Aggressively upsize the array to avoid copying, and
             * rarely if ever downsize the array, I'll need those
             * slots. Probably use a doubling algorithm. *)
            ...
        else
            (* Conservatively upsize the array, but aggressively
             * downsize. Probably use a hysteresis algorithm. *)
            ...

Brian
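The arithmetic above can be checked with a small simulation. The helper `copies_to_grow` is hypothetical, purely for verifying the copy counts in the argument; it charges a copy of all existing elements at every reallocation.

```ocaml
(* Count elements copied while growing an array one element at a time
   to n elements.  [grow] maps an insufficient capacity to a larger
   one; [init] is the starting capacity. *)
let copies_to_grow ~init grow n =
  let copied = ref 0 and cap = ref init in
  for used = 1 to n do
    if used > !cap then begin
      copied := !copied + (used - 1);             (* copy the old contents *)
      while used > !cap do cap := grow !cap done  (* reallocate *)
    end
  done;
  !copied

let doubling c = c * 2   (* capacities 1, 2, 4, 8, ... *)
let chunked c = c + 10   (* fixed chunks of c = 10 *)

let () =
  (* prints: doubling: 1023, chunks of 10: 49500 *)
  Printf.printf "doubling: %d, chunks of 10: %d\n"
    (copies_to_grow ~init:1 doubling 1000)
    (copies_to_grow ~init:10 chunked 1000)
```

This reproduces the numbers in the message: 1023 copies for doubling and 10+20+...+990 = 49,500 copies for fixed chunks of 10, when growing to 1000 elements.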
From: John M. S. <sk...@oz...> - 2003-04-24 19:00:05
Brian Hurt wrote:

> On Mon, 14 Apr 2003, John Max Skaller wrote:
> I'm not sure hysteresis is the best algorithm.
[snip]
> You prevent thrashing by letting shrinkage go longer before shrinking.

Must be some misunderstanding here. Hysteresis doesn't imply adding a
fixed chunk on growing, nor removing one on shrinking. It simply
implies that the thresholds for growth and shrinkage are not the same,
but separated somehow, so that a requirement for 1 more slot which
causes an allocation, when followed by a requirement for 1 less slot,
does NOT cause a reduction.

The idea applies no matter whether growth is a constant chunk, or a
fractional increase, or some other calculation: arrays grow when they
must, but don't shrink until they waste a certain amount of space
(which is more than the amount wasted after an allocation). This is
what you described for the doubling routine: if you allow 25% wastage
before shrinking, and you remove only 15% of the unused space when you
do shrink, then small vibrations don't cause either allocations or
deallocations.

Anyhow, the point is not to use a naive routine where the growth and
shrinkage triggers are the same, since a loop which adds an element and
then removes one repeatedly would cause violent and unacceptable
thrashing.

And the real point is: there is no single optimal strategy, it depends
on the use to which the array is being put. Hence, allowing the user to
specify the strategy allows the user to tune performance in a way
reasonably well separated from the overall program logic.

In my system I had:

(1) a standard strategy (chunks of 16 elements)
(2) a global variable which was initialised to the standard strategy
    but which could be changed
(3) a default constructor using the global strategy
(4) a constructor accepting a strategy argument
(5) a method for changing the strategy for any individual array

In addition, my strategies were objects which had some data, which
could also be changed (eg: the chunk strategy was parameterised by
chunk size).

The cost is a function pointer in each array, and some overhead calling
it when necessary.

-- 
John Max Skaller, mailto:sk...@oz...
snail:10/1 Toxteth Rd, Glebe, NSW 2037, Australia.
voice:61-2-9660-0850
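The five-point design above translates to OCaml roughly as follows. This is a sketch under assumptions: all names are invented, and the strategy is stored as a plain function (closing over its parameters) rather than as an object with mutable data.

```ocaml
(* alloc -> used -> desired -> recommended slot count *)
type strategy = int -> int -> int -> int

(* (1) a standard strategy: chunks of 16 elements *)
let standard_strategy : strategy =
  fun _alloc _used desired -> ((desired + 15) / 16) * 16 + 16

(* (2) a global, changeable default *)
let default_strategy = ref standard_strategy

type 'a xarray = {
  mutable data : 'a array;
  mutable len : int;
  mutable resize : strategy;   (* the per-array function pointer *)
}

(* (3) default constructor using the global strategy, and
   (4) a constructor accepting a strategy argument, via an optional
   labelled parameter.  [dummy] fills unused slots. *)
let make ?strategy n dummy =
  let resize =
    match strategy with Some s -> s | None -> !default_strategy in
  { data = Array.make (max n 1) dummy; len = 0; resize }

(* (5) changing the strategy of an individual array *)
let set_strategy x s = x.resize <- s
```

The cost is exactly what the message describes: one closure per array, plus a call through it on each resize decision.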
From: Brian H. <bri...@ql...> - 2003-04-24 20:16:26
On Fri, 25 Apr 2003, John Max Skaller wrote:

> Brian Hurt wrote:
>
> > On Mon, 14 Apr 2003, John Max Skaller wrote:
> > I'm not sure hysteresis is the best algorithm.
> [snip]
> > You prevent thrashing by letting shrinkage go longer before shrinking.
>
> Must be some misunderstanding here.

Yes there was. I was misunderstanding you. My apologies.

The good news is that in the current version of Xarray in the
libraries, every time I allocate a new array I allow the caller to pass
in what reallocation strategy to use. Neither of the two I have defined
is what you are talking about- but new strategies can easily be added.
I tried to document in xarray.mli how to write a resizer function. If
the comments aren't clear, please let me know and I'll improve them.

> And the real point is: there is no single optimal
> strategy, it depends on the use to which the array
> is being put. Hence, allowing the user to specify
> the strategy allows the user to tune performance
> in a way reasonably well separated from the
> overall program logic.

I fully agree. I can even envision application-specific resizer
functions, which "know" the behavior of the algorithm being used and
can behave differently as the algorithm changes its allocation
behavior. Or implement heuristics to try to predict the algorithm.
Etc.

The doubling algorithm is the overall generally best algorithm that I
can think of. Basically, adding and subtracting elements is amortized
O(1)- making the construction and use of large xarrays not
implausible.

> In my system I had:
>
> (1) a standard strategy (chunks of 16 elements)

I'm using doubling. See above.

> (2) a global variable which was initialised to the standard
>     strategy but which could be changed

Hmm. Could be done with a reference. But this is a very non-ML sort of
concept.

> (3) a default constructor using the global strategy

Got it.

> (4) a constructor accepting a strategy argument

Got it.

> (5) a method for changing the strategy for any individual array

I only allow new strategies when allocating new arrays. This includes
'implicit' allocations like of_enum and copy. Although I suppose this
could be added.

> In addition, my strategies were objects which had some
> data, which could also be changed (eg: the chunk strategy
> was parameterised by chunk size).

Partial evaluation takes care of this.

> The cost is a function pointer in each array,
> and some overhead calling it when necessary.

Already paying it. I call the resizer function (which tells me what
size the array should be) every time I add or remove an element from
the array.

Brian
From: John M. S. <sk...@oz...> - 2003-04-25 06:52:50
Brian Hurt wrote:

> The doubling algorithm is the overall generally best algorithm that I
> can think of. Basically, adding and subtracting elements is amortized
> O(1)- making the construction and use of large xarrays not
> implausible.

By best you mean 'fastest'. This is clearly not the only requirement.
For example, it would be a disaster in a prime number generator in
which minimisation of storage use is more important. There the time to
copy is irrelevant compared to the time computing primes, and the need
to fit large numbers in storage is dominant.

Obviously, doubling is less efficient speed-wise than tripling. My
experiments showed that, as you claim, a factor of 2 is close to
optimal: multiplying by 10 is faster, but not much. OTOH a factor of
1.8 begins to slow the system down... on the tests I ran, anyhow.

> Hmm. Could be done with a reference. But this is a very non-ML sort of
> concept.

I agree: loss of reentrancy .. BUT provided the store is atomic, it
only affects performance, not semantics, if the variable is changed at
random, even by multiple threads. Setting a strategy for a whole
program depending on what phase of execution it is up to is fairly
useful.

> > (5) a method for changing the strategy for any individual array
>
> I only allow new strategies when allocating new arrays. This includes
> 'implicit' allocations like of_enum and copy. Although I suppose this
> could be added.

It can also be left out until someone needs it (if unsure of utility,
LEAVE IT OUT :-)

--------------------------------------------------------
One thing I found: I needed 3 arguments, not 2:

    current slots, current used slots, required size

Conceptually, the reason is: the ideal size may be influenced by the
allocated size. For example: double when necessary, but only shrink if
the desired size is zero. The extra argument is necessary for
asymmetric growth/shrinkage algorithms, so I recommend you add it: at
present you cannot take 'unused space' into account, since it cannot be
computed.

-- 
John Max Skaller, mailto:sk...@oz...
snail:10/1 Toxteth Rd, Glebe, NSW 2037, Australia.
voice:61-2-9660-0850
From: Brian H. <bri...@ql...> - 2003-04-25 19:20:35
On Fri, 25 Apr 2003, John Max Skaller wrote:

> By best you mean 'fastest'.

Fastest in the most common cases, to be precise. With "reasonable"
(for suitably loose definitions of reasonable) space wastage. With
doubling you never waste more than about 75% of the space, and rarely
waste 50%. This is actually not bad compared to other data structures,
if you consider the wasted space "infrastructure overhead". Consider a
binary tree where the data is stored only at the leaves. You will have
about as many branch nodes as you have leaf nodes. Suddenly half the
memory used is in branch nodes holding references to other branch
nodes, not references to data- "wasted" space.

Increasing the multiple decreases the number of copies you need to
make, at the cost of greatly increasing the "wasted space" on average.
Plus, powers of two are generally easier to allocate than powers of
anything else.

> > Hmm. Could be done with a reference. But this is a very non-ML sort of
> > concept.
>
> I agree: loss of reentrancy .. BUT provided the store is atomic, it
> only affects performance, not semantics, if the variable is changed at
> random, even by multiple threads. Setting a strategy for a whole
> program depending on what phase of execution it is up to is fairly
> useful.

I was just thinking not-functional. Not in that it wouldn't work, but
in that it's not how most functional programming flows, if that makes
sense.

> --------------------------------------------------------
> One thing I found: I needed 3 arguments, not 2:
>
>     current slots, current used slots, required size

The inputs xarray has to give the resizer algorithm are:

- the current number of slots the array has
- the current number of slots the array needs

Note that if I'm increasing or decreasing the array size, I adjust the
number of slots needed before passing it into the resizer. What the
resizer returns should be the number of slots the array should have.

Once the resizer calculates what size the array should be, the xarray
code actually deals with allocating a new array and copying data over
(if needed), etc. Maybe these parameters should have better names.

> Conceptually, the reason is: the ideal size
> may be influenced by the allocated size. For example:

Always. You need to return the current size if you don't want to
reallocate the array.

> The extra argument is necessary for asymmetric growth/shrinkage
> algorithms, so I recommend you add it: at present you cannot
> take 'unused space' into account, since it cannot be computed.

Hmm. Consider the following case. You have an array, using the
doubling algorithm, with a size of 16 and a length (number of used
elements) of 10. Now, you add an enumeration with 100 elements in it.

Currently, the resizer function is called with a currsize of 16 (the
current number of spaces in the array when it's called) and a length of
110 (how many elements need to fit into the array).
Exponential_resizer then keeps doubling 16 until it gets a long enough
array size- 128 in this case. It returns 128.

If I understand you correctly, in this example you also want 10 (call
it 'oldlength') passed in as well. Certainly not difficult. I just
don't see what you get out of it. I suppose you could keep the same
percentage of free space, and return 176 instead of just 128.

If the resizer returns anything except the current array size, a new
array is allocated and the data copied over (hmm. Must remember to
null out the old array after copying everything over). Should having a
small amount of data make the algorithm more inclined to copy data
over?

Brian
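The `Exponential_resizer` behaviour in the example above can be sketched like this. It is a sketch of the doubling rule described in the thread, not the actual ExtLib code; the 25%-occupancy shrink rule is taken from Brian's earlier message.

```ocaml
(* Keep doubling the current slot count until it covers the requested
   length; shrink by halving while occupancy falls below 25%.
   currslots -> newlength -> recommended slots. *)
let exponential_resizer currslots newlength =
  let n = ref (max currslots 1) in
  while newlength > !n do n := !n * 2 done;
  (* halve while under 25% occupancy, ending above 25% (below 50%) *)
  while !n > 1 && newlength * 4 < !n do n := !n / 2 done;
  !n
```

On the example in the message, a 16-slot array asked to hold 110 elements doubles 16 -> 32 -> 64 -> 128 and returns 128; returning the current size signals "don't reallocate".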
From: John M. S. <sk...@oz...> - 2003-04-26 15:30:48
Brian Hurt wrote:

> On Fri, 25 Apr 2003, John Max Skaller wrote:
>
> > --------------------------------------------------------
> > One thing I found: I needed 3 arguments, not 2:
> >
> >     current slots, current used slots, required size
>
> The inputs xarray has to give the resizer algorithm are:
>
> - the current number of slots the array has
> - the current number of slots the array needs
>
> > The extra argument is necessary for asymmetric growth/shrinkage
> > algorithms, so I recommend you add it: at present you cannot
> > take 'unused space' into account, since it cannot be computed.
>
> Hmm. Consider the following case. You have an array, using the
> doubling algorithm, with a size of 16 and a length (number of used
> elements) of 10. Now, you add an enumeration with 100 elements in it.
>
> Currently, the resizer function is called with a currsize of 16 (the
> current number of spaces in the array when it's called) and a length
> of 110 (how many elements need to fit into the array).
> Exponential_resizer then keeps doubling 16 until it gets a long
> enough array size- 128 in this case. It returns 128.
>
> If I understand you correctly, in this example you also want 10 (call
> it 'oldlength') passed in as well.

Yes.

> Certainly not difficult. I just don't
> see what you get out of it.

Suppose you have a rule: when you're increasing the number of elements
in the array, you can resize it, but never when you're decreasing them.
That might be useful, but it cannot be implemented unless you know the
number of elements in the array -- I don't mean slots of store, but
actual occupied slots.

For example: the number of slots is set to the number of elements on
expansion. So, adding 1 element at a time up to 5, then subtracting,
then adding again:

    data  slots
      0     0
      1     1
      2     2
      3     3
      4     4
      5     5
      4     5
      3     5
      2     5
      1     5
      0     5
      1     1
      2     2
      3     3
      4     4
      5     5

The number of slots can shrink, but only when the number of used slots
increases. Why would you want to do this? I have no idea. But it is a
viable algorithm, and it cannot be implemented without passing the
number of used slots in .. a general rule says "pass the whole of the
state in", and the state is characterized by both the physical slot
count and the number of used slots.

Note that one could argue that passing in the size of a slot (in bytes)
also makes sense, since a strategy may be based on tuning around memory
amounts (Kbytes) due to the finiteness of this quantity, rather than
the more abstract notion of a slot.

-- 
John Max Skaller, mailto:sk...@oz...
snail:10/1 Toxteth Rd, Glebe, NSW 2037, Australia.
voice:61-2-9660-0850
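The three-argument rule illustrated by the table above (resize only while the element count is increasing) might be written as follows. The name `grow_only_resizer` is invented; the point is that the decision needs the old occupied-slot count, not just the slot count and the required size.

```ocaml
(* slots -> oldlength -> newlength -> recommended slot count.
   On expansion, set the slot count to exactly the new length (which
   may shrink the allocation); on contraction, leave it alone.  This
   is only expressible because oldlength is passed in. *)
let grow_only_resizer slots oldlength newlength =
  if newlength > oldlength then newlength
  else slots
```

Replaying the table: growing 0 -> 5 keeps slots equal to data; removing back down to 0 leaves 5 slots untouched; the first re-insert snaps the allocation back down to 1 slot.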
From: Brian H. <bri...@ql...> - 2003-04-26 22:05:04
On Sun, 27 Apr 2003, John Max Skaller wrote:

> Brian Hurt wrote:
>
> > Certainly not difficult. I just don't
> > see what you get out of it.
>
> Suppose you have a rule: when you're increasing the number of
> elements in the array, you can resize it, but never when you're
> decreasing them. That might be useful, but it cannot be implemented
> unless you know the number of elements in the array -- I don't mean
> slots of store, but actual occupied slots.
>
> For example: the number of slots is set to the number of elements on
> expansion. So, adding 1 element at a time up to 5, then subtracting,
> then adding again:
>
>     data  slots
>       0     0
>       1     1
>       2     2
>       3     3
>       4     4
>       5     5
>       4     5
>       3     5
>       2     5
>       1     5
>       0     5
>       1     1
>       2     2
>       3     3
>       4     4
>       5     5

Ah! Only reallocate when you start adding elements back in, but then
shrink to a correct size. You'd have to save state recording that you
had shrunk, but that's not hard. OK. Yes, for that you need to know if
you're adding or removing slots.

I was thinking you meant something like:

    data  slots
      0     0
      1     1
      2     2
      3     3
      2     3
      1     3
      0     0   <- shrink happened
      1     1
      2     2
    ... etc.

You have the shrink happening on the first insert. And that's actually
not *that* strange a resizing algorithm, especially if you know that
deletes are "bursty". It saves copying, and may even save short-term
memory usage (if the GC wouldn't get around to collecting the old
array soon enough).

> Note that one could argue that passing in the size of a slot (in
> bytes) also makes sense, since a strategy may be based on tuning
> around memory amounts (Kbytes) due to the finiteness of this
> quantity, rather than the more abstract notion of a slot.

Especially in that I (the xarray module) don't know how large an 'a
is. Most likely it's just a reference off somewhere to some other
value. Which is one of the reasons adding a user-defined resizer
function is useful. If you do know (or at least can guess) how big the
data structures you're kicking around are, write a specific resizer
for them. Or have a parameterizable resizing function, like I did with
the step function.

Does anyone mind if I use named parameters for the resizer? Use the
type (for example):

    resizer : numslots:int -> oldlength:int -> newlength:int -> int

? It would make it more obvious what the various arguments are.

Brian
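With the labelled type proposed above, a parameterized resizer and its call site could read like this. `step_resizer` is an invented example in the spirit of "the step function" mentioned in the message; the real one may differ, and the thresholds here are assumptions.

```ocaml
(* A resizer parameterized by a step size, matching the labelled type
   resizer : numslots:int -> oldlength:int -> newlength:int -> int.
   Grow when the request overflows the allocation; shrink only when
   more than two steps of slack accumulate. *)
let step_resizer step ~numslots ~oldlength:_ ~newlength =
  if newlength > numslots || newlength < numslots - 2 * step
  then ((newlength + step - 1) / step) * step  (* round up to a step *)
  else numslots

(* Partial application fixes the step; the labels make the call site
   self-documenting. *)
let needed = step_resizer 16 ~numslots:16 ~oldlength:10 ~newlength:20
```

Here `needed` is 32: 20 elements no longer fit in 16 slots, so the allocation is rounded up to the next multiple of the step.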
From: John M. S. <sk...@oz...> - 2003-04-27 01:58:40
Brian Hurt wrote:

> Especially in that I (the xarray module) don't know how large an 'a
> is. Most likely it's just a reference off somewhere to some other
> value.

Oh yeah .. my C/C++ experience lets me down here (many unboxed data
types of different sizes).

> Does anyone mind if I use named parameters for the resizer? Use the
> type (for example):
>
>     resizer : numslots:int -> oldlength:int -> newlength:int -> int
>
> ? It would make it more obvious what the various arguments are.

I agree, this is a case where named parameters are a good idea: usage
of the strategy function is going to be light (not much use), so the
extra clutter of naming is small, but the benefit of seeing that those
unfamiliar uses are correct is great.

BTW, with respect to boxing: we cannot support the unboxed float array
(etc.) types the specialised OCaml arrays can. Or can we?

[Funny, but once one gets down to optimisation, the familiar problems
of C/C++ turn up again: similarly, once one wants advanced name
control, the same problems, and solutions, begin to suggest themselves:
C++ is messy because users want things that are convenient and
efficient, which is not always the same thing as pure and clean..]

-- 
John Max Skaller, mailto:sk...@oz...
snail:10/1 Toxteth Rd, Glebe, NSW 2037, Australia.
voice:61-2-9660-0850