FOOS: Under the hood

Introduction

Feature requests are not uncommon – and I take them very seriously. However, it usually takes some time before they are implemented. Floating point was one of them. Object Orientation was another.

The point is – I never take the easiest and most obvious way, which is to dump everything into the core. I consider the core to be a virtual processor, which should be kept as lean as possible. That’s why 4tH’s floating point support is just a bunch of (optional) libraries. You don’t have to drag that around if you don’t need it.

Object Orientation is even harder to implement, because a bunch of libraries is not going to cut it. Fortunately, 4tH has a very capable preprocessor, which is able to rewrite your OO-program in such a way that the compiler can handle it.

So, how does it work? Well, an object is just a structure on steroids. It can hold data, but also execution tokens to “methods”. We call such fields “selectors”.

But there is a slight problem – 4tH has a Harvard architecture, which means cells and characters live in different worlds. Which one of the two are we going to pick? Since most fields require a cell (like integers, pointers to other objects, floating point numbers and execution tokens) it seemed that was the way to go. Anyway, strings could easily be represented by a pointer to a string stored somewhere else.

Encapsulation

That allowed for encapsulation, that is, a construct which facilitates the bundling of data with the methods operating on that data. Sure, it was a bit hard to add “public” and “private” keywords to each member, but that was taken care of by the preprocessor. You simply declare the private members at the end of the class definition with the PRIVATE{ keyword, e.g.:

private{ field1 field2 field3 }

Which expands to:

hide field1
hide field2
hide field3

That takes ‘em out of the symbol table and that’s that. We got encapsulation. Next.
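The effect of HIDE can be tried without any classes at all. What follows is a minimal sketch in plain 4tH, assuming HIDE behaves as described above. The names "secret", "bump" and "peek" are made up for this illustration and are not part of FOOS.

```forth
variable secret                 \ hypothetical "private" cell
0 secret !

: bump 1 secret +! ;            \ compiled while SECRET is still visible
: peek secret @ ;               \ same here

hide secret                     \ remove SECRET from the symbol table

bump bump peek . cr             \ BUMP and PEEK still reach the hidden cell
\ secret @ .                    \ uncommented, this should no longer compile
```

Words compiled before the HIDE keep working, because they were already resolved. Only later references by name are cut off, which is all the "private" keyword has to achieve.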

Inheritance

Inheritance means that existing structures can be expanded – which is not as hard as you might think. Note that in 4tH fields are just offsets. They contain a value that is added to the base address of a structure in order to form the address of an individual field. The STRUCT keyword is just a CONSTANT with the value “0”. That value is assigned to the first field, so when executed it adds “0” to the structure’s base address.

The next field, however, is assigned the length of the first field defined – and so on. In the end, the END-STRUCT keyword is just a CONSTANT, which is initialized to the combined length of all the fields defined.

And now it gets interesting. If we do not use the keyword STRUCT, but the END-STRUCT constant defined by another structure, that space is effectively reserved by that structure and used when calculating the offset of additional fields. The old offsets – originally defined by the first structure – still apply. No matter what we defined there.

Hence, the resulting structure is an extended version of our original structure. We got inheritance. Next.
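The offset trick can be sketched with two plain structures. This is only an illustration with made-up names ("/point", "/point3d", "p"), and it assumes ARRAY accepts an END-STRUCT constant as its size.

```forth
struct                          \ base structure, hypothetical names
  field: x
  field: y
end-struct /point               \ /point = combined size of x and y

/point                          \ start where /point ended, not at STRUCT
  field: z
end-struct /point3d

/point3d array p                \ room for one extended structure

10 p -> x !                     \ the inherited offsets still apply
20 p -> y !
30 p -> z !                     \ z sits right after the inherited fields

p -> x ? p -> y ? p -> z ? cr
```

Any word written for a "/point" keeps working on "p", since x and y are found at the very same offsets.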

Constructors and destructors

The downside is that the only thing we got is a structure. The difference between a structure and an object is that an object is always initialized at creation. So how are we going to do that?

Well, it means we have to call a word when we create an object. Such a word is called a “constructor”. So, when we create our structure, we also have to create a constructor. That’s what happens when you declare a class in FOOS.

In FOOS every class is a subclass of the predefined “Object” class. When you look at the definitions of the “Object” class, you’ll see there is very little object orientation involved:

struct
  field: _create_
  field: _delete_
end-struct /Object

: ~Object [DEFINED] (~~new) [IF] free throw [THEN] ;
: Object ['] Object over -> _create_ ! ['] ~Object swap -> _delete_ !  ;

“Object” is the constructor and “~Object” is the destructor. “Object” initializes the fields _create_ and _delete_ with these execution tokens. It may not be much, but it is an initialization.

A typical class definition looks like this:

:class figure                          \ define an empty class figure
   extends Object                      \ with no properties and two
     virtual: surface                  \ uninitialized methods
     virtual: outline
   end-extends
;class

The :CLASS keyword does nothing but parse the class name and put it on the string stack of the preprocessor. The EXTENDS keyword parses the superclass name, puts it on the string stack and starts to assemble the required structure:

/Object
  field: surface
  field: outline
end-struct /figure

Note there are tons of “syntactic sugar” involved. The keyword VIRTUAL: is nothing but an alias for FIELD:. The slashed names are generated by the preprocessor. This is called “name mangling”.

Another form of name mangling is very common with C++, BTW. It is used to differentiate between seemingly identical functions. In FOOS we have three different forms of name mangling, the first to define the size of a class, the second to define “default” methods and the third to define destructors. That’s why you have to be careful with names beginning with an underscore, a slash or a tilde.

Ok, now we got the structure definition out of the way, we can begin with our destructor – which simply is a copy of the previously defined destructor:

aka ~Object ~figure

Yes, you can override it – you’ll see how that works later on. However, if you don’t override it, it takes up no space at all, since an AKA definition only resides in the symbol table. Now we can define our constructor:

: figure
  ['] Object ['] figure ['] ~figure (~~init)
  drop
;

Note that every initialization needs to pass the address of the object created, so that one is on the top of the stack. It may also have passed other parameters – we can’t know at this stage. But what if these are consumed by the constructor of the superclass? That’s why we have to pass them to the constructor of the superclass first.

That is done by the (~~init) routine. It places the execution tokens of the constructor and destructor on the return stack, along with a copy of the address of the object, and executes the constructor of the superclass.

There it finds a similar construct, which calls its superclass first – and so on – until the Object constructor has been called. At that moment, the constructor and destructor fields of the object are filled with the Object version – until they are overwritten with those of the subclass – and so on, until we arrive at the subclass we’re currently defining.

In this case, we’ve done all initialization, so what’s left is to drop the address of the object and return. Note that if those pesky parameters are meant to be consumed by this class, they’re still on the stack, since they haven’t been consumed by any of the superclasses. So that works too.
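The order of events, superclass first and subclass last, can be mimicked by hand. The sketch below does not use (~~init) and is not what FOOS generates. It is a simplified stand-in with hypothetical names ("kind", "size", "init-base", "init-derived") that only shows why the subclass values win.

```forth
struct
  field: kind                   \ hypothetical field set by every level
end-struct /base

/base
  field: size
end-struct /derived

: init-base ( addr --)          \ stands in for the superclass constructor
  1 swap -> kind !
;

: init-derived ( addr --)       \ stands in for the subclass constructor
  dup init-base                 \ superclass first, on a copy of the address
  2 over -> kind !              \ then overwrite what the superclass stored
  0 swap -> size !              \ finally initialize the new field
;

/derived array d
d init-derived
d -> kind ? d -> size ? cr
```

After the call, "kind" holds the value stored by the derived initializer, just like the constructor and destructor fields end up holding the tokens of the most derived class.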

The ;CLASS keyword isn’t just some fancy syntactic sugar – it has a purpose. The names of the class and the superclass are still on the string stack of the preprocessor. ;CLASS discards them.

But this was a simple class. We have to take this a bit further, e.g.:

:class Account
   extends Object
     ffield: accountBalance
     method: CheckBalance
   end-extends

    :new 500 s>f this -> accountBalance f! ;method
    :method CheckBalance this -> accountBalance f@ ;method

   private{ accountBalance }
;class

Here the property “accountBalance” is initialized. So how does that work? Well, END-EXTENDS and most method defining words like :METHOD and :VIRTUAL expand to a form which ends in a >R. In short: it puts the address of the object on the return stack. The keyword THIS expands to R@, so the entire expression expands to:

500 S>F R@ -> accountBalance f!

Because what goes up, must go down, the words ;CLASS and ;METHOD expand to RDROP. And there we have our “open recursion”. Check.

Methods and selectors

Now it gets really interesting. How are methods initialized? There are five different kinds of method:

First, the standard method. These cannot be overridden, so it’s useless to define any fields for them. The METHOD: keyword simply removes the member from the structure. A :METHOD definition is just a plain old 4tH definition – no strings attached;
Second, the virtual method. These can be overridden, so we have to reserve a field for them – but it’s just a plain FIELD: definition;
Third, the virtual method with default. These are originally initialized with a “normal” 4tH definition, although they’re prefixed by an underscore. That means they’re still available to the system, even if we’ve overridden them later in a subclass;
Fourth, the constructor. The preprocessor simply strips :NEW and ;METHOD from the expression, so the code in between becomes part of the constructor – which in fact comprises the entire class definition. It just looks a whole lot neater to have an "actual" constructor;
And finally, the destructor. A destructor is a virtual method as well. As we’ve seen those are always initialized with a default method based on the name of the class, prefixed by a tilde. That’s why the definition of a destructor doesn’t require a name.

Hence, this means that the “CheckBalance” definition is expanded to this:

: CheckBalance >r r@ -> accountBalance f@ r> drop ;

And that’s why this works:

myAccount -> CheckBalance

Note that the word -> doesn’t do anything. It’s just syntactic sugar to denote a field member. The “CheckBalance” definition expects the address of an object and that’s exactly what “myAccount” provides. There is no lexical binding between the two whatsoever. Pass any other object of sufficient size and it works – although it’s highly unlikely you’ll get what you expect, apart from error messages of course.
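Since the arrow is purely decorative, leaving it out should make no difference. A small sketch with a made-up structure ("/pair") and instance ("a"):

```forth
struct
  field: x
  field: y
end-struct /pair                \ hypothetical structure

/pair array a

1 a -> x !                      \ with the arrow
2 a y !                         \ without it: same offset, same cell

a x ? a -> y ? cr
```

Both spellings address the same cell. The arrow merely documents that the next word is a field rather than an ordinary word.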

But now it gets really interesting: virtual methods – those things that behave differently in every class – or even every object. Declaring them isn’t too hard. Your VIRTUAL: declaration will simply be expanded to FIELD: – because that’s what it is. But what does this become?

:class button+                         ( addr --)
   extends button                      \ make a derived class
     virtual: xy!                      \ add more methods
     virtual: coords?                  \ e.g. coordinate handling
   end-extends                         \ initialize the new methods
                                       \ override method from supertype
   :virtual draw    reverse this -> default draw normal ;method
   :virtual xy!     this -> y ! this -> x !             ;method
   :virtual coords? this -> x ? this -> y ?             ;method

   private{ x y }                      \ from here on, x and y are private!
;class

Let’s focus on one of these virtual methods, let’s say “COORDS?”. That one expands to:

:noname >r r@ -> x ? r@ -> y ? r> drop ; r@ -> coords? !

Yeah, it’s simply a :NONAME definition, resulting in an execution token that is poked in the appropriate field. The only problem with that is that if you want to call it, like:

MyButton -> coords?

Then you end up with the address of the field holding the virtual method’s execution token – but it won’t execute. Now that is bad, but I can’t fix it, because there is no way either the preprocessor or the compiler can know this field contains an execution token that has to be executed. So you’re on your own. You can either call it this way:

MyButton -> virtual coords?

Or this way:

MyButton => coords?

And that will do the trick. Both forms expand to:

MyButton dup coords? @ execute

And both will behave exactly as you’d expect. If a derived class overrides that method, a different execution token will be stored in that field and hence it will behave differently – that’s no rocket science. It’s polymorphism. Check!
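The same dispatch can be built by hand from a structure and two :NONAME definitions. This is a sketch with hypothetical names ("/animal", "speak", "dog", "cat"), not preprocessor output. It assumes :NONAME leaves its execution token on the stack at top level the same way it does inside the expansion shown earlier.

```forth
struct
  field: speak                  \ hypothetical selector
end-struct /animal

/animal array dog
/animal array cat

:noname ( addr --) drop ." Woof" cr ; dog -> speak !
:noname ( addr --) drop ." Meow" cr ; cat -> speak !

dog dup speak @ execute         \ same shape as the expansion of =>
cat dup speak @ execute         \ same call, different behavior
```

The calling code is identical for both objects. What differs is the execution token each object carries in its "speak" field.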

But the point is, that subclass will never be able to refer to the original method of its superclass. It’s not a VTABLE – that assignment is gone (please contact me if you have any ideas). But there is a way to regain at least some of that functionality: use a virtual method with a default.

A default method starts by declaring it as an ordinary virtual method. But there is a difference in defining it. Let’s say we want to turn our “COORDS?” definition to a default method:

:default coords? this -> x ? this -> y ? ;method

This will expand into:

: _coords? >r r@ -> x ? r@ -> y ? r> drop ; ['] latest r@ -> coords? !

In short, it creates a “normal” definition, whose execution token is used to initialize the appropriate member.

So, if you call it like this:

MyButton => coords?

It will execute normally – you can even override it – just like an ordinary virtual method. However, if you call it like this:

MyButton -> default coords?

Or – shorthand – like this:

MyButton <- coords?

It will expand to:

MyButton _coords?

And it becomes quite obvious that the original definition will be used. Note that you can define a default method only once in a class hierarchy.
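The difference between the two call styles can be sketched by hand as well. The names below ("/critter", "noise", "_noise", "init-pet", "pet") are hypothetical, and the code is a simplified stand-in for what the preprocessor generates.

```forth
struct
  field: noise                  \ hypothetical selector
end-struct /critter

: _noise ( addr --) drop ." generic noise" cr ;  \ the named default

: init-pet ( addr --)           \ the selector starts out as the default
  ['] _noise swap -> noise !
;

/critter array pet
pet init-pet

:noname ( addr --) drop ." Woof" cr ; pet -> noise !   \ override it

pet dup noise @ execute         \ like =>  : runs the override
pet _noise                      \ like <-  : the default is still reachable
```

Overriding only replaces the token in the field. The named word stays in the symbol table, so it can still be called directly.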

Finally, the destructor. In some object oriented languages destructors are called automatically, when a variable goes out of scope. In 4tH, we don’t have scopes and hence we always have to call destructors explicitly.

In 4tH it only makes sense to define and use destructors if you’ve allocated your objects on the heap. If they’re static, they’ll do more harm than good – for the simple reason that there’s absolutely nothing to free, delete or destroy. If you allocated string space and you’re desperate to clean up before you leave, you’ll have to settle that yourself. Destructors help you do that.

Since every class needs to have a destructor, FOOS will define one for you without telling you about it.

For instance this piece of code:

:class three-light                     \ create a three light traffic light
   extends two-light                   \ based on the two light traffic light
     field: Yellow                     \ add the color Yellow
   end-extends

Will expand to:

/two-light
field: Yellow
end-struct /three-light
aka ~two-light ~three-light
: three-light ['] two-light ['] three-light ['] ~three-light (~~init) >r

You clearly see the AKA declaration after declaring the structure and before the definition of the constructor. That means if you don’t override any destructor it will still do what the Object destructor does, which is freeing your own allocated memory space.

However, if you do override the destructor with a :DELETE definition like this:

:delete this -> Yellow ds.free ;method

it will generate this code to get the job done:

hide ~three-light
: ~three-light dup >r r@ -> Yellow ds.free r> drop ~two-light ; ['] latest r@ -> _delete_ !

It will simply throw the previous destructor out of the symbol table – which is OK, since a default destructor requires no code at all, just an entry in the symbol table. Then it will define a word – which is named after the class name and prefixed by a tilde – and patch it into the appropriate member of the structure. Note it will DUP the address of the object at the beginning of the definition, since it still has to call the destructor of the superclass at the very end of it.

Now you see why it is necessary to keep both class and superclass names on the string stack of the preprocessor – it has to have access to them throughout the definition of the class.

When the destructor is called, it first frees the space taken up by the “Yellow” string, then it calls the destructor of its superclass “two-light”, which does its own housekeeping. Finally it ends up at some superclass which merely executes the “Object” destructor – which doesn’t have to be “Object” itself, BTW – it can also be some AKA derivative of it.

Finally, the DELETE keyword simply calls the _delete_ method of the object. It’s no different from any other virtual method in that regard. It just looks more consistent. And this, my friends, is how destructors work in FOOS.

You may have wondered, “Now how does ;METHOD know which kind of method it is terminating”? Easy: it is told what kind of method it is.

When a method definition is entered, two values are placed on the string stack of the preprocessor, namely the name of the method and its type. When ;METHOD is encountered, this type is evaluated first and only then is it decided which kind of code has to be generated.

Of course, at this point it gets very crowded on the preprocessor string stack, so values are discarded as soon as they become superfluous – which means that in some cases they have to be reissued. It is what it is.

Typing

You can recognize any object by simply examining its constructor - whose execution token is embedded in the object itself. Getting the execution token from a constructor isn't that hard. It's simply a matter of "ticking" the class - which is basically an ordinary 4tH word. Getting the superclass is a bit harder, but if you examine a class definition closely, you'll see that the first word compiled is actually an execution token of the superclass. We can retrieve that value by using @C. I know, it's not clean, but in OOP, what is?

: figure
  ['] Object ['] figure ['] ~figure (~~init)
  drop
;

By examining that definition we can get the execution token of the superclass of the superclass and so on, until we arrive at "Object". Since that class is the only one without a superclass, we can stop looking. Those definitions are so basic, it's not even worth the trouble to spell them out.

Epilogue

Well, my friends, now you know how 2.5K of macros turn an old, imperative language into a brand new object oriented language. If you use PP4TH you won’t even notice that it goes through the preprocessor. If you’re curious you can use the -k switch to examine the horrible code it produces – but that’s not my fault. That’s how object orientation works.

That code, however, has exactly the same properties as ordinary 4tH code. You can embed it or make a standalone executable. But fortunately, OOP doesn’t bloat the compiler itself, keeping it as lean as I originally intended to make it.
