[srvx-devel] draft: srvx-2.0 database abstraction interface

Feedback, if you have any :)

I do not know how clear this will be to people besides me; if it is
not clear, or if you have concerns about what it can do, let me know
where.

Assuming nobody finds serious problems in it, I'll start implementing
it this weekend.

-- Entrope

Database Abstraction
--------------------

srvx-2.0 uses (will use) a database abstraction layer to hide the
structural and representation differences between different supported
database backends, and to make it easier to manage the database
behavior at a high level.  This document describes the behavior and
interfaces of that layer -- both what it exposes to the higher levels
of srvx, and what it expects from the database backends.

Database Model
--------------

The top level of the database consists of tables.  These are named
mappings of keys to rows.  Each row has one primary key column, other
columns, and child tables.  A column has a name, and data of one of
several types:
  - unsigned integer
  - datetime
  - string
  - string list
  - child table (of a specific type)

Within a given table, each column contains the same type of data in
all rows (the special value 'null' may exist in any column except the
primary key column; whether it occurs, and the meaning if it does, is
table-specific).

A child table is distinguished by having a parent row in another
table; rows in top-level tables do not have parents.  For relational
databases, this usually means the child table has additional columns
in the primary key to identify the parent row.

Child objects cannot look up their parents directly.  A child object
that does not hold a pointer to its parent object must instead hold
the list of parent keys, so that it knows where its own children are
and can write itself out.
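
A minimal sketch of the model above, in Python only for brevity.
Every name in it (ColumnType, Table, fits, full_key) is hypothetical
and not part of srvx; it shows one way to enforce per-column types,
the no-null rule for the primary key, and parent keys for a child
table.

```python
from datetime import datetime
from enum import Enum


class ColumnType(Enum):
    UINT = "unsigned integer"
    DATETIME = "datetime"
    STRING = "string"
    STRING_LIST = "string list"
    CHILD_TABLE = "child table"


def fits(ctype, value):
    """Return True if value is acceptable data for a column of type ctype."""
    if ctype is ColumnType.UINT:
        return isinstance(value, int) and not isinstance(value, bool) and value >= 0
    if ctype is ColumnType.DATETIME:
        return isinstance(value, datetime)
    if ctype is ColumnType.STRING:
        return isinstance(value, str)
    if ctype is ColumnType.STRING_LIST:
        return isinstance(value, list) and all(isinstance(s, str) for s in value)
    return isinstance(value, Table)  # ColumnType.CHILD_TABLE


class Table:
    def __init__(self, name, primary_key, columns, parent_keys=()):
        self.name = name
        self.primary_key = primary_key
        self.columns = dict(columns)           # column name -> ColumnType
        self.parent_keys = tuple(parent_keys)  # empty for a top-level table
        self.rows = {}                         # primary key -> row (a dict)

    def insert(self, **values):
        unknown = set(values) - set(self.columns)
        if unknown:
            raise KeyError(f"unknown columns: {sorted(unknown)}")
        for column, ctype in self.columns.items():
            value = values.get(column)         # a missing column counts as null
            if value is None:
                if column == self.primary_key:
                    raise ValueError("the primary key column may not be null")
            elif not fits(ctype, value):
                raise TypeError(f"{column} must hold {ctype.value} data")
        self.rows[values[self.primary_key]] = values
        return values

    def full_key(self, key):
        """Parent keys plus the row's own key, as a relational backend
        might lay out a child table's compound primary key."""
        return self.parent_keys + (key,)


# Top-level table whose rows each own a child table of notes.
accounts = Table("accounts", "handle", {
    "handle": ColumnType.STRING,
    "registered": ColumnType.DATETIME,
    "masks": ColumnType.STRING_LIST,
    "notes": ColumnType.CHILD_TABLE,
})
notes = Table("notes", "id", {"id": ColumnType.UINT, "text": ColumnType.STRING},
              parent_keys=("Entrope",))
notes.insert(id=1, text="example note")
accounts.insert(handle="Entrope", masks=["*@example.org"], notes=notes)
```

The sketch does not check that a child table is "of a specific type";
a real descriptor would presumably carry that information.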

Schema Representation
---------------------

Each table type is represented by a descriptor and control object in
srvx.  Each descriptor is created by a factory method on the database
backend object, given the table name, column names and types, primary
key column name, and a factory that loads objects from the database
store.
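
As an illustration only, the descriptor creation described above
might look like the following Python sketch.  The names Backend,
TableType, create_table_type and Account are all hypothetical, and
column types are plain strings here to keep the example short.

```python
class TableType:
    """Descriptor and control object for one table type."""

    def __init__(self, backend, name, columns, primary_key, factory):
        if primary_key not in columns:
            raise ValueError(f"{primary_key!r} is not a column of {name!r}")
        self.backend = backend
        self.name = name
        self.columns = dict(columns)    # column name -> type name
        self.primary_key = primary_key
        self.factory = factory          # builds an object from a loaded record


class Backend:
    """Stand-in for a database backend object."""

    def __init__(self):
        self.table_types = {}

    def create_table_type(self, name, columns, primary_key, factory):
        """Factory method: the backend hands out its own descriptors."""
        if name in self.table_types:
            raise ValueError(f"table type {name!r} already exists")
        table_type = TableType(self, name, columns, primary_key, factory)
        self.table_types[name] = table_type
        return table_type


class Account:
    """Example object type; its constructor doubles as the load factory."""

    def __init__(self, record):
        self.handle = record["handle"]
        self.flags = record.get("flags", 0)


backend = Backend()
accounts = backend.create_table_type(
    "accounts",
    {"handle": "string", "flags": "unsigned integer"},
    primary_key="handle",
    factory=Account,
)
```

Routing creation through the backend lets each backend return its own
descriptor subclass if it needs one; that is a guess at the intent,
not something the draft states.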

Object Loading
--------------

The table type descriptor object can load a database object, given the
row's primary key and either the parent object or the full list of
parent keys.

When a row is loaded from the database, all its child rows may be
loaded in the same query (the load call specifies whether to do so,
and the default is to load children), but its parent (if one exists)
is not loaded.  The object is entered into both a per-table cache and
a database-wide cache.  The representation passed to the object
loading factory should be similar to the "struct record_data" in
srvx-1.x.
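
The loading rules above can be sketched roughly as follows, again in
Python with hypothetical names throughout (Database, TableType,
DbObject, load).  A dict keyed by (table name, full key) stands in
for the backend, so the "same query" aspect is not modeled; the
sketch only shows the two caches, the load-children default, and
that the parent is never loaded.

```python
class DbObject:
    def __init__(self, record):
        self.record = record      # plain dict standing in for the loaded record
        self.full_key = ()        # parent keys followed by this row's own key
        self.children = {}        # child table name -> {child key: DbObject}


class Database:
    def __init__(self, store):
        self.store = store        # {(table name, full key): record}
        self.cache = {}           # database-wide cache


class TableType:
    def __init__(self, db, name, factory=DbObject, child_types=()):
        self.db = db
        self.name = name
        self.factory = factory
        self.child_types = list(child_types)
        self.loaded = {}          # per-table cache: full key -> object

    def load(self, key, parent=None, parent_keys=(), load_children=True):
        # Accept either the parent object or the full list of parent keys.
        if parent is not None:
            parent_keys = parent.full_key
        full_key = tuple(parent_keys) + (key,)
        obj = self.loaded.get(full_key)
        if obj is None:
            obj = self.factory(self.db.store[(self.name, full_key)])
            obj.full_key = full_key
            self.loaded[full_key] = obj
            self.db.cache[(self.name, full_key)] = obj
        if load_children:
            for ctype in self.child_types:
                kids = obj.children.setdefault(ctype.name, {})
                for tname, ckey in list(self.db.store):
                    if tname == ctype.name and ckey[:-1] == full_key:
                        kids[ckey[-1]] = ctype.load(ckey[-1], parent=obj)
        return obj


store = {
    ("channels", ("#srvx",)): {"topic": "example"},
    ("users", ("#srvx", "Entrope")): {"access": 500},
}
db = Database(store)
users = TableType(db, "users")
channels = TableType(db, "channels", child_types=[users])

chan = channels.load("#srvx")                             # loads children too
user = users.load("Entrope", parent_keys=("#srvx",))      # served from cache
```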

Object Storing
--------------

A database object can be marked dirty (and probably should be marked
dirty by mutator methods) and moved to the dirty list in the database.
The base database object class has a (virtual) method to write the
object out, using an interface similar to the saxdb write interface in
srvx-1.x.
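
A small sketch of the dirty-marking idea, in Python with
hypothetical names (DbObject, mark_dirty, flush, DictWriter).  The
writer here is a toy key/value sink; its method names are invented
for the example and are not the saxdb interface.

```python
class Database:
    def __init__(self):
        self.dirty = []               # objects waiting to be written out

    def flush(self, writer):
        """Write every dirty object and mark it clean again."""
        for obj in self.dirty:
            obj.write(writer)
            obj.dirty = False
        self.dirty.clear()


class DbObject:
    """Base class; subclasses override write()."""

    def __init__(self, db):
        self.db = db
        self.dirty = False

    def mark_dirty(self):
        if not self.dirty:            # appear on the dirty list only once
            self.dirty = True
            self.db.dirty.append(self)

    def write(self, writer):
        raise NotImplementedError


class Account(DbObject):
    def __init__(self, db, handle, flags=0):
        super().__init__(db)
        self.handle = handle
        self.flags = flags

    def set_flags(self, flags):
        """Mutator: changes state, then marks the object dirty."""
        self.flags = flags
        self.mark_dirty()

    def write(self, writer):
        writer.write_string("handle", self.handle)
        writer.write_int("flags", self.flags)


class DictWriter:
    """Toy writer that collects fields into a dict."""

    def __init__(self):
        self.out = {}

    def write_string(self, name, value):
        self.out[name] = str(value)

    def write_int(self, name, value):
        self.out[name] = int(value)
```

Marking dirty inside the mutators means callers cannot forget to do
it, which is presumably why the text recommends it.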

The database backend must also support transaction levels, and not
permanently write dirty objects until the transaction is committed.
If the transaction (or a parent transaction) is rolled back, the
object must be reverted to its pre-transaction state.

[Q: how will rollback interact with adding or dropping rows?  Probably
just reference the added/dropped rows in the transaction record.]
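
One possible shape for the transaction levels, sketched in Python
with hypothetical names (Database, Row, begin, touch, commit,
rollback).  Each level remembers the pre-transaction state of every
object it modifies.  The sketch only reverts field values; it does
not attempt to answer the question above about added or dropped rows.

```python
class Database:
    def __init__(self):
        self.committed = {}       # key -> fields; stands in for permanent storage
        self.transactions = []    # stack of {object: saved fields}, innermost last

    def begin(self):
        self.transactions.append({})

    def touch(self, obj):
        """Save obj's state the first time this level modifies it."""
        if not self.transactions:
            raise RuntimeError("no transaction in progress")
        self.transactions[-1].setdefault(obj, dict(obj.fields))

    def rollback(self):
        """Revert every object touched at the innermost level."""
        for obj, saved in self.transactions.pop().items():
            obj.fields = saved

    def commit(self):
        done = self.transactions.pop()
        if self.transactions:
            # Inner commit: nothing is permanent yet.  The parent keeps its
            # own (older) saved state if it has one, else inherits ours.
            for obj, saved in done.items():
                self.transactions[-1].setdefault(obj, saved)
        else:
            # Outermost commit: only now is anything written permanently.
            for obj in done:
                self.committed[obj.key] = dict(obj.fields)


class Row:
    def __init__(self, db, key, **fields):
        self.db = db
        self.key = key
        self.fields = dict(fields)

    def set(self, name, value):
        self.db.touch(self)
        self.fields[name] = value


db = Database()
row = Row(db, "Entrope", flags=0)

db.begin()
row.set("flags", 1)
db.begin()                # nested level
row.set("flags", 2)
db.rollback()             # inner rollback: flags is 1 again
db.commit()               # outer commit: flags=1 becomes permanent
```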

Caching
-------

Each table type contains a map of loaded objects.  The whole database
contains LRU lists of clean and dirty objects.  The desired size of
these lists can be tuned at runtime.  The database can be told to
reload its cache (for example, if a third party updates the data
store).
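
A rough Python sketch of the clean and dirty LRU lists, with
hypothetical names (Cache, touch, resize, reload).  Two behaviors are
assumptions made for the example, not part of the draft: when the
dirty list grows past its limit the oldest dirty object is written
out and becomes clean, and a reload discards only clean objects.

```python
from collections import OrderedDict


class Cache:
    def __init__(self, max_clean, max_dirty, write_out):
        self.clean = OrderedDict()    # key -> object, least recently used first
        self.dirty = OrderedDict()
        self.max_clean = max_clean
        self.max_dirty = max_dirty
        self.write_out = write_out    # callback that persists one dirty object

    def touch(self, key, obj, dirty=False):
        """Record a use of obj, moving it to the recent end of its list."""
        was_dirty = key in self.dirty
        self.dirty.pop(key, None)
        self.clean.pop(key, None)
        target = self.dirty if (dirty or was_dirty) else self.clean
        target[key] = obj
        self._trim()

    def _trim(self):
        # Assumption: overflow of the dirty list forces a write of the oldest.
        while len(self.dirty) > self.max_dirty:
            key, obj = self.dirty.popitem(last=False)
            self.write_out(key, obj)
            self.clean[key] = obj
        while len(self.clean) > self.max_clean:
            self.clean.popitem(last=False)

    def resize(self, max_clean=None, max_dirty=None):
        """Tune the desired list sizes at runtime."""
        if max_clean is not None:
            self.max_clean = max_clean
        if max_dirty is not None:
            self.max_dirty = max_dirty
        self._trim()

    def reload(self):
        """Assumption: drop clean objects so they are re-read on next load;
        dirty objects are kept because they hold unwritten changes."""
        self.clean.clear()
```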

Backends
--------

This abstraction layer is designed to efficiently support three
specific backends:
 - a plain text backend that can read srvx-1.x format recdb files (and
   write similarly formatted files -- just without "" for strings that
   are clearly tokens),
 - Berkeley/Sleepycat DB databases,
 - a SQL backend (that knows how to do RFC1459 casemapping).

The major provisos for the backends are:
 - the plain text backend must load and store the entire database at
   once, and thus caches the entire thing,
 - the plain text and Sleepycat backends are vulnerable to in-process
   corruption,
 - the SQL backend is susceptible to high IPC traffic and
   serialization cost.
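
For reference, the RFC1459 casemapping mentioned for the SQL backend
can be sketched in Python as below.  RFC 1459 treats {, } and | as
the lowercase forms of [, ] and \; pairing ^ with ~ follows the
common "rfc1459" casemapping convention and should be dropped for the
strict variant.  The function names are hypothetical, and how a SQL
backend would apply the folding (for example, storing a folded key
column) is left open.

```python
# Translation table: ASCII letters plus the four IRC-specific pairs.
_RFC1459_LOWER = str.maketrans(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ[]\\^",
    "abcdefghijklmnopqrstuvwxyz{}|~",
)


def irc_lower(name):
    """Fold a nickname or channel name to its RFC1459 lowercase form."""
    return name.translate(_RFC1459_LOWER)


def irc_equal(a, b):
    """Compare two names the way an IRC server using this mapping would."""
    return irc_lower(a) == irc_lower(b)
```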