[Alephmodular-devel] Copy-on-write in practice
From: Woody Z. I. <woo...@sb...> - 2003-01-14 04:36:31
I'd like to take one idea from my monstrous idea dump, copy-on-write (COW) game objects, and talk a little more about it. Remember, the goal is to be able to split off into a fake (predicted) game-state for one or more ticks, then later return to the original (real) game-state, with the game-update logic being essentially the same for predictive updates and real updates (and as similar to the current code as possible, for practical reasons). (Sounds could be sticky, but I'll put that problem off till later.)

I see (at least) three basic options.

OPTION 1. Put all the COW functionality into the lookup routines

An object a refers to another object b by its index (and, implicitly, its type), as it does currently. When a wants to read b's data-structure fields, it calls get_btype_by_index(bindex) and reads the returned structure. When a wants to write b's fields, it calls get_writable_btype_by_index(bindex) and may read or write the returned structure. The lookup routines go something like:

btype* get_writable_btype_by_index(int bindex)
{
    if(game_state_mode == predictive_mode)
    {
        // First write in predictive mode: lazily make a private copy.
        if(predictive_btypes[bindex] == NULL)
            predictive_btypes[bindex] = shallow_copy_btype(real_btypes[bindex]);
        return predictive_btypes[bindex];
    }
    else
        return real_btypes[bindex];
}

const btype* get_btype_by_index(int bindex)
{
    // Reads see the predictive copy only if one has been made.
    if(game_state_mode == predictive_mode && predictive_btypes[bindex] != NULL)
        return predictive_btypes[bindex];
    else
        return real_btypes[bindex];
}

void discard_predictive_btypes()
{
    for(int i = 0; i < predictive_btype_count; i++)
    {
        if(predictive_btypes[i] != NULL)
        {
            dispose_btype(predictive_btypes[i]);
            predictive_btypes[i] = NULL;
        }
    }
    predictive_btype_count = real_btype_count;
}

This only does the testing and lookup when asking for another object, so there's probably not a terribly great performance hit. OTOH there's great potential for disaster: e.g., if we're in predictive mode, one routine acquires a pointer via get_foo_by_index(k), then calls another routine which acquires a pointer via get_writable_foo_by_index(k) and makes some changes, and then the first routine reads fields of its foo through its old pointer, it's working with stale data.

OPTION 2. Put COW functionality in objects and field accessors

template <typename tEmulatedType>
class COWObject
{
public:
    void discardPredictiveVersion()
    {
        delete predictive_self;
        predictive_self = NULL;
    }

protected:
    const tEmulatedType* self()
    {
        return (game_state_mode == predictive_mode && predictive_self != NULL)
            ? predictive_self : real_self;
    }

    tEmulatedType* writable_self();

    tEmulatedType* real_self;
    tEmulatedType* predictive_self;
};

template <typename tEmulatedType>
tEmulatedType* COWObject<tEmulatedType>::writable_self()
{
    if(game_state_mode == real_mode)
        return real_self;
    else
    {
        if(predictive_self == NULL)
        {
            predictive_self = real_self->shallow_copy();
            CentralAuthority()->registerPredictiveObject(this);
        }
        return predictive_self;
    }
}

class Player : public COWObject<PlayerState>
{
public:
    int  getValue()            { return self()->value; }
    void setValue(int inValue) { writable_self()->value = inValue; }
};

(The CentralAuthority holds a list of all the objects with predictive state, and tells them all to discardPredictiveVersion() when the game logic asks it to.)

This way, code that uses game objects essentially just uses them, reading or writing individual fields in a natural way (one that also allows for additional 'hooking'). Each such use incurs some overhead for locating the appropriate object, but there's little to no chance of the kind of inconsistency shown in the example above.
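To make the discard path in OPTION 2 a bit more concrete, here's a minimal sketch of what the CentralAuthority might look like. The names here (COWObjectBase, PredictionAuthority, discardAllPredictiveObjects) are placeholders I made up for illustration, not anything that exists in the code; the non-templated base class is assumed so the authority can hold objects of different emulated types in one list.

#include <vector>
#include <cstddef>

// Assumed interface implemented by every COWObject<> instantiation.
class COWObjectBase
{
public:
    virtual ~COWObjectBase() {}
    virtual void discardPredictiveVersion() = 0;
};

class PredictionAuthority
{
public:
    // Called by COWObject<>::writable_self() the first time an object
    // creates its predictive copy.
    void registerPredictiveObject(COWObjectBase* inObject)
    {
        mPredictiveObjects.push_back(inObject);
    }

    // Called by the game logic when it wants to throw away the
    // predicted state and return to the real state.
    void discardAllPredictiveObjects()
    {
        for(std::size_t i = 0; i < mPredictiveObjects.size(); i++)
            mPredictiveObjects[i]->discardPredictiveVersion();
        mPredictiveObjects.clear();
    }

private:
    std::vector<COWObjectBase*> mPredictiveObjects;
};

// CentralAuthority() in the sketch above would then just return a
// pointer to a single shared PredictionAuthority instance:
PredictionAuthority* CentralAuthority()
{
    static PredictionAuthority sAuthority;
    return &sAuthority;
}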
OPTION 3. Arrange game-state structures explicitly in memory and bulk-copy

In this scheme, all the game-state data is grouped together in a fairly tightly-packed, well-defined region of memory, and all access to game-state objects is indexed off a base address. On EnterPredictiveMode(), the whole region is copied in bulk (with memcpy, some kind of hardware blitter, operating-system virtual-memory mark-page-as-COW support, etc.) to another chunk of memory, and the base address for game-state-object indexing is changed to point at the new block. Bulk copying *should* be (significantly?) quicker than the too-slow copy-the-whole-game-state approach I took before, which effectively produced a saved game (without packing the fields, but essentially using the same code paths nonetheless).

Since the state change won't happen in the middle of a game-state update, there's no chance for inconsistency. All the overhead is taken up in the bulk copy; actual use of the objects is essentially identical to the current scheme, so there's no additional overhead in reading/writing fields or asking for objects. On ExitPredictiveMode(), of course, the copy is simply deallocated and the base pointer is set back to the "real_mode" chunk of memory. Cheap cheap, fun fun.

Note that, in general, I think netgames would probably change mode upwards (from real to predictive) once per rendered frame, whereas single-player games and films would not need to. OTOH basically the same mechanism would probably be used for between-frame interpolation (smoothing out the animation to enable frame rates higher than 30fps), which would need an upwards mode change once per rendered frame (potentially in addition to the predictive mode change in a netgame) in all circumstances (unless the machine is not even keeping up with 30fps, in which case it's nearly pointless to interpolate between the 30 ticks per second we're already computing).

I think this summarizes the characteristics of these alternatives:

Approach | Runtime overhead (time) | Runtime overhead (memory) | Changes required to existing code | Chances for (insidious) bugs
(1)      | Low                     | Low                       | Low                               | High
(2)      | Medium                  | Low                       | High                              | Low
(3)      | High (potentially)      | Medium                    | Low                               | Low

Thoughts? Maybe I'll try to estimate the game-state size, and measure how long it would take to bulk-copy it on my machines, to get *some* idea of the feasibility.

Woody
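For what it's worth, here's a rough sketch of what the mode switch in OPTION 3 might look like, assuming all game-state structures really do live in one contiguous block and all cross-references are indices/offsets from the base pointer rather than raw pointers. The names (kGameStateSize, real_state_block, game_state_base, EnterPredictiveMode, ExitPredictiveMode) and the fixed block size are assumptions of mine, not existing code.

#include <cstring>
#include <cstdlib>

// Made-up figure; the real size would come from measuring the actual
// game-state structures.
static const std::size_t kGameStateSize = 512 * 1024;

static char* real_state_block       = NULL;  // allocated at startup
static char* predictive_state_block = NULL;  // allocated on demand
static char* game_state_base        = NULL;  // what all object lookups index from

void EnterPredictiveMode()
{
    if(predictive_state_block == NULL)
        predictive_state_block = (char*) std::malloc(kGameStateSize);

    // The whole game state is duplicated in one bulk copy...
    std::memcpy(predictive_state_block, real_state_block, kGameStateSize);

    // ...and all subsequent object lookups are redirected to the copy.
    game_state_base = predictive_state_block;
}

void ExitPredictiveMode()
{
    // Discard the predictions: lookups go back to the real state and
    // the copy is released.
    game_state_base = real_state_block;
    std::free(predictive_state_block);
    predictive_state_block = NULL;
}

(In practice, if the switch really happens once per rendered frame in a netgame, it might make more sense to keep the predictive block allocated and reuse it each frame instead of freeing and reallocating it; that's a detail, though.)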