pascal-p6 / Discussion / General Discussion: Container arrays (or "what the heck am I working on")

Container arrays (or "what the heck am I working on")

Container arrays for Pascaline originate back in 1993, when I was working on a
new compiler for the 80386 series. I did that because I had already recoded
my original Z80 assembly coded compiler front end to be Pascal only in the
1980's, and because the off the shelf 32 bit compilers I was using by that time
were dying as a business model (another story).

Container arrays are perhaps the oldest major feature of Pascaline and are
still one of the most important. As Wirth often said, you can't look at a
feature in isolation, but rather how it meshes with the rest of Pascal, and
how it dovetails with other Pascaline features.

Pretty much everyone using ISO 7185 Pascal feels the need for variable string
length allocation, and perhaps other types of arrays as well. There are
solutions to that using standard ISO 7185 Pascal (yes, there are). One of these
was used in Don Knuth's TeX system, which used a pool of strings placed into
a single large character array. Another one was the "quanta" system I used
in the Pascal-P5 compiler.

Before the 80386 compiler, IP Pascal handled the need for variable length string
allocation by simply assuming that all strings were some fairly long length
(like 250 characters), then using a call:

type string = packed array [1..250] of char;
     pstring = ^string;

var p: pstring;

procedure getspace(var p: pstring; l: integer);

...

getspace(p, 42);

getspace() was, of course, implemented in assembly language and was an
an alternative to new(). It would be the equivalent to new(p), except that it
only allocated 42 characters, and not the full 250. It meant that you only
allocated what you needed, that you needed a separate routine for each type,
and (most importantly) that there was no protection against overrunning the
end of the string. Just like in C, if you index off the end of the array, you
are corrupting the system [1].

In the run up to 80386 IP Pascal, I went over several plans to implement
variable length arrays. In this, I have to owe a debt of gratitude to the
ISO 10206 standard, a standard I have been very critical in the past, mainly
because it got me thinking in the correct direction. In fact, many of the
things I really didn't like about ISO 10206 I ended up adopting, and container
arrays is one of them.

After a few years of producing various plans for variable length array
allocation, I settled on the following founding ideas:

Variable array allocation should not increase the cost of standard Pascal
arrays. At all. Whereas it is reasonable to expect that use of variable length
arrays would increase the cost when using them, a reasonable Pascal user could
expect that their program not suffer speed or space inefficiencies while using
legacy Pascal code. In specific, this stopped the idea of having a single
normalized form for arrays that indicated their length, regardless of if they
were ISO 7185 arrays or extended Pascaline arrays.
They should interoperate completely with ISO 7185 Pascal. Specifically,
variable arrays should be assignment and expression compatible with ISO 7185
arrays, in both directions (to and from variable arrays). An ISO 7185 array
should be able to be passed as a variable length array parameter.
Variable arrays should not implement the ability to vary the length of the
array at runtime. Rather, they only allocate the space in the array according
to the programmer request, and then enforce protected access to that space.
This does not mean that tracking variable length arrays is not possible, only
that it is left to the user. The compiler only provides the ability to mark off
a length of space in an array, then enforce legal access to that space. What
you do with that space is up to you.
The system should work for any type of array.
The condition #3, is very important. I have seen other systems that
attempt to give the programmer the "illusion" of variable length strings (Basic
comes to mind), but really implement it by cutting off a fixed length of space
and controlling the "size" of it via procedures and functions. This dramatically
increases the complexity of the concept. Pascaline has such a series of string
functions and procedures, but it is in a completely portable, user specified
library.

I called this "containers", and later "container arrays", because I found that
the term "container" was already in common use for other things. The term
"container" goes to point #3 above. Pascaline does not implement true variable
length arrays, but rather allows you to create fixed arrays of arbitrary length
that "contain" the actual array you want.

Container arrays are built on one single idea, in fact a powerful one, that
carries over to class/object orientation as well: a complex pointer. Each
complex pointer consists of:

0: Base data pointer.
1: Template/descriptor.

The base pointer is simply the address of the data in the array. The template
describes the length of the array. In 1993, this was simply the integer length
of the array. In Pascaline, this got expanded to a template pointer, which I
will explain later.

The advantage of keeping the target data as a binary pair is that at any time
it can interoperate with standard ISO 7185 arrays. Going from ISO 7185 fixed
array to complex pointer means adding a template to the array pointer. Going
from container array to standard array means checking the template against the
fixed length and dropping the template.

So in the 1993 compiler, you specify:

type string = packed array of char;
              pstring = ^string;

var p: pstring;

...

new(p, 25);

p^ := 'hi there bob            ';

Means to create an array of 25 elements and assign it to the pointer, which is
a complex pointer. Unlike the getspace() scheme described before, the compiler
enforces any accesses to the array, so p^[26] gives you an error. you will note
also that the assignment above actually mixes fixed and container array types,
since a fixed string is being assigned to a container array.

The original 1993 implementation only used vectors (single dimension arrays) and
the "template" was always just a length. In fact, the first couple of times I
used this method, it struck me as pretty ugly. In the assignment:

new(p, 25);

p^ := 'hi there bob            ';

you have to create a string of the exact correct number of characters (25),
then assign to it. It has to match the string it is being assigned from exactly.
This applies even if you are using two container arrays:

var p1, p2: pstring;

...

new(p1, 25);
new(p2, 25);

p1^ := p2^;

But note that this only applies if you are matching two different arrays. In the
correct context, it actually makes a lot of sense:

procedure lcases(var s: string);

var p: pstring;
    i: integer;

begin

   new(p, max(s));
   for i := 1 to max(s) do p^[i] := lcase(s[i]);
   s := p^

end;

Where max() is a built in function giving you the size of the array argument.

In fact, creating, using, and then discarding an array is a frequent paradigm
in Pascaline. The string library of Pascaline is built on that (as indeed is
the language C#).

One of the more interesting characteristics of the 1993 compiler is it allowed
easily passing constant strings to routines. In ISO 7185 Pascal, you can do:

type string = packed array [1..12] of char;

procedure prtstr(p: string);

...

prtstr('hello, world');

But there is a lot wrong with that. First, the string length is fixed, so only
one size string can be passed. Second, this is using value parameter rules, so
the constant string is in fact copied entirely to a buffer before being passed
to the routine.

Now contrast that with Pascaline:

type string = packed array of char;

procedure prtstr(view p: string);

...

prtstr('hello, world');

Now the string can be any length, and nothing is copied, because the rules of
"view" parameters are that the parameter can be any expression result, and
cannot be modified during the routine. This means the compiler is free to pass
the actual string, not a copy.

ON TO PASCALINE

The 1993 version of container arrays had several limitations. First, it could
not be used with value parameters. VAR and VIEW parameters, yes, but not value.
Also, it could not allocate multidimensional arrays (matrices or higher).
Finally, it could only be used with pointer or dynamically allocated variables.
None of this was a binding limitation. For value parameters, you make copies of
arrays. For matrix or higher array levels, you roll your own:

type vector = array of integer;
     pvector = ^vector;
     matrix = array of pvector;
     pmatrix = ^matrix;

var mp: pmatrix;

...

new(mp, 100); ! allocate matrix top index
for i := 1 to 100 do new(mp[i], 100); { allocate matrix second level }

mp^[42]^[73] := 1;

Ungainly, but serviceable. In fact, this is how you make so called "sparse
arrays" which are popular in C, where sparse means that each vector in a matrix
can have its own size.

For local variables of container arrays, you simply have to create a dynamic
pointer at the start of a block, and terminate it at the end.

In pascaline, container arrays are orthogonal over all these operations. Any
index level of array is acceptable. They are usable with any parameter mode,
value, VAR, VIEW or OUT. And declaring them as variables is not only possible, it
is encouraged as part of the "Pascaline style":

program test;

type maxtrix = array of array of integer;

var m(100, 100): matrix;

procedure setmatrix(mt: matrix);

begin

  m[42, 73] := 1;

end;

begin

   setmatrix(m)

end.

In this trival example, matrix m is created as a program global, passed as a
value parameter to setmatrix, and operated on. Because it is a value
parameter, a copy of the entire matrix must be created, or 10,000 integers,
quite a lot of data. Its not really modified by setmatrix, since it is just a
copy. And we don't worry about how or where the compiler creates the matrix m,
or how it gets rid of it.

Note that the "array of array of..." notation becomes a side effect of not
specifying the indices of the array. Another way to do that (which Pascaline
DOES NOT implement) would be array [*,*] of integer[3].

Now going back to the idea of a complex pointer internal implementation:

0: Base data pointer.
1: Template/descriptor.

What is different? Well, the template is no longer just a length, but a pointer
to a true template, as in a list of indices:

template address -> 100,100

In other words, its a table that describes the layout of the matrix in terms of
its index sizes.

It turns out, this is backwards compatible with both simple length based
templates, because templates with only a single dimension are simplified to only
hold a single length. Thus:

Array type          Contents of template field
================================================
ISO 7185            None
Vector              Length
Matrix or higher    Pointer to template table

In operations, pointers to arrays are always stated in their simplest form. When
the above matrix is "sliced", or has only a single vector picked out:

m[42]

Then it is automatically simplified to address/length only form. If an array is
reduced to an element:

m[42,73]

The template part of the pointer is dropped entirely.

In the case of matrices or higher order indexing, ISO 7185 Pascal arrays
interoperate with container arrays by using what I call "pick up templates". For
each fixed array in ISO 7815 Pascal, we keep a template filled out at compile
time handy in the constants area. If we need to, say, assign a fixed matrix to
a container array based matrix, we can convert the fixed version to a complex
pointer by simply adding a pointer to the fixed table in memory.

WHERE DID YOU PUT THAT?

It turns out that WHERE you put container arrays is not only interesting, but
it adds a new capability to the language, one that is hidden from the
programmer. A simple global container array:

program x;

var a(10): array of integer;

begin

   ...

end.

In Pascal-P6 this is actually a synonym for:

program x;

type ai = array of integer;

var ap: ^ai;

begin

   new(ap, 10);
   ...
   dispose(ap)

end.

Note that in the case of containers, you don't specify the length at dispose.
The allocation of program variables is done statically, that is, they are
compiled in to the program and the total space allocated to globals does not
change. The compiler cannot allocate variable spaces in that that are not known
until runtime. So what it does instead is allocate them dynamically, and it
does that automatically for you, both when setting them up and tearing them
down.

What happens if the variable is a local?

program x;

procedure y;

var m(100, 100): array of array of integer;

begin

   ...

end;

begin

   ...

end.

Here, procedure y could allocate matrix m as a dynamic variable, but there is
a better way: it can allocate them on the stack. This has several advantages.
It reduces heap fragmentation, is usually faster than dynamic allocation, and
has no overhead on teardown at all. When the stack frame for a routine is thrown
away, the allocation for such variables gets thrown away with it.

In fact, this is the basis of the C function alloca() which is available on
some systems. The compiler actually gives you this for free[2]

INITIALIZERS

You probably have noticed that the examples so far are just a different form
of ISO 7185 fixed arrays. The declaration:

var m(100, 100): array of array of integer;

Is equivalent to the ISO 7185 declaration:

var m: array [1..100,1..100] of integer;

In Pascaline, the first form is preferred. Why? In fact, A.N.Habermann gave the
reason back in 1973 when he identified the problem with fixed arrays in Pascal:

"The size of an array can hardly be considered as part of it's type"

That is to say, the basic form of an array as a structure does not, and should
not, change with its length, any more than a file with 11 elements is completely
different than a file with 10 elements, or a set of 11 members is different from
a set of 10 members, etc.

In fact, fixed array limits are the only part of Pascal type specifications that
have the property of making the type incompatible. A subrange of 1..10 is
compatible with a subrange of 1..11. The only thing that comes close is a
variant record:

type r = record case boolean of false: (i: integer); true: (c: char) end;

But in that case it is a list of different forms grouped under the same type[4].
Thus, Pascaline considers arrays to be equal in type if their element type is
the same and in the same format, which (curiously) makes their packed and
unpacked status incompatible. More on that later. Pascaline considers the sizes
of arrays to be a property of the variable, not its type. Thus:

var a(100), b(200): array of integer;

Are different size arrays, but both compatible with each other (and any other
array of integer).

What about:

var a(100): packed array of integer;
    b(200): array of integer;

These are not compatible arrays. Why? The reason is that the packing method used
to store the values could actually change the format of the data in the array.
If we have:

var a(100): packed array of boolean;
    b(200): array of boolean;

This (hopefully) becomes more obvious. An array of boolean probably takes one
byte for each of the boolean values. Thus b is 200 bytes long. However, a could
well be 12.5 bytes long, or 13 bytes (rounded up), because 100/8 = 12.5. The
compiler can, if it is able, store booleans in only one bit, and stuff as many
of those bits into the available space as it can. This is not just a change in
array size, but a fundamental change in format.

In fact, Pascaline considers any container arrays with the same packed status
and base type to be equivalent. This is is perhaps the most questionable feature
of Pascaline container arrays, but it has a purpose. Any library function with
an equivalent array parameter is automatically compatible.

Ok, so if these forms are equivalent:

var m(100, 100): array of array of integer;

var m: array [1..100,1..100] of integer;

What is the point? Well, it is because the initializers for the variable are not
limited to constants. They can be full expressions. They can contain constants,
parameters and even functions:

type vector = array of integer;

procedure addone(view v: vector);

var vc(max(v)): vector;
    i: integer;

begin

  for i := 1 to max(v) do vc[i] := v[i]+1;
  ...

end;

This routine creates a local array vc with the same size as the parameter v,
then copies the contents of v to vc and operates on it. A more complex
example would be:

type vector = array of integer;

procedure addone(view v: vector);

function vsize: integer;

begin

   vsize := max(v)

end;

var vc(vsize): vector;
    i: integer;

begin

  for i := 1 to max(v) do vc[i] := v[i]+1;
  ...

end;

Here we actually use a function to determine the size.

The one thing you cannot, or more correctly should not, do is use local
variables in initializers, since by definition they are not themselves
initialized:

type vector = array of integer;

procedure addone(view v: vector);

var x: integer;
var vc(x): vector;

begin

end;

This results in an error because x is not initialized. You could get around
this (but don't!):

type vector = array of integer;

procedure addone(view v: vector);

var x: integer;

function vsize: integer;

begin

   x := 10;
   doit := 0

end;

var vc(vsize+x): vector;
    i: integer;

begin

  for i := 1 to max(v) do vc[i] := v[i]+1;
  ...

end;

This is dirty pool in a lot of ways. First, the function vsize has a side
effect, which is bad form. Second, the evaluation of vsize and x is order
dependent, and Pascal does not guarantee order of execution of expression
operators (nor should it!). In fact, I am willing to bet there is no clean way
to introduce locals into initializers (globals are a different story, as well
as locals belonging to an outer block).

SUMMING UP

Container arrays add a lot of new capability to ISO 7185 Pascal, and finally
solve the issue of creating and working with variable length array data in a
way seamless with original Pascal. They also dovetail with Pascaline being
an object oriented language, but that is a subject for another time.

[1] In fact, the Z80 IP Pascal compiler had several features in common with C,
including its modular structure.

[2] To be fair, this could actually work in the program block as well, and
Pascal-P6 actually does allocate some variables on the stack.

[3] I have occasionally thought about adding this notation, but there is not
a pressing need to declare arrays with huge numbers of indices.

[4] Which makes variant records redundant with classes. However, although
Pascaline proposes better methods to do some tasks than ISO 7185 Pascal, it
never replaces them (or "depreciates" them).

Last edit: Scott Franco 2018-10-17

Container arrays (or "what the heck am I working on")

Next version of ISO 7185 compatible Pascal from Pascal-P5

Forums

Help

Container arrays (or "what the heck am I working on")

Container arrays (or "what the heck am I working on")

Next version of ISO 7185 compatible Pascal from Pascal-P5

Forums

Help

Container arrays (or "what the heck am I working on") document.SUBSCRIPTION_OPTIONS = { "thing": "topic", "subscribed": false, "url": "subscribe", "icon": { "css": "fa fa-envelope-o" } };

Container arrays (or "what the heck am I working on")