Re: persistent storage


> * Bruno Haible <oe...@py...t> [2004-08-03 13:53:35 +0200]:
>
>> E.g., you can export a generic function PERSISTENT-STORE which would
>> take two arguments: an object to be stored and a storage function of two
>> arguments: a numeric ID and a data vector which will be called once for
>> the object and each its component and a function PERSISTENT-RETRIEVE
>> which will take two arguments: a numeric ID and a function which returns
>> a data vector given a numeric ID.
>
> Good proposal: I can understand it without understanding databases.
> Still, can you say: what's the extent of the ID?

the duration of the call to PERSISTENT-STORE.
note that PERSISTENT-STORE is like WRITE in its recursiveness.

> Does it persist across Lisp sessions? Or only in one Lisp session? Or
> only as long as the object has not been GCed?

if you can make PERSISTENT-STORE non-consing (here big fixnums would
come in handy :-), then it would be enough for the IDs to persist until
the next GC.  Then, of course, the data vector will need to be allocated
on the C stack (or with malloc) and the storage/retrieval functions will
need to be GC-safe.

Let me be more specific:

ID: an integer (or a bit-vector? or a system object), probably an
    immediate object.
    [expect it to be converted and saved to disk as a uint64]

data-vector: an array of IDs of components of an object and other information,
    sufficient to recover the object.
    E.g., a PATHNAME's data vector will contain IDs for HOST, NAME,
    DIRECTORY, DEVICE, TYPE, VERSION.
    SYMBOL's data vector will have IDs for NAME and PACKAGE
    PACKAGE's data vector will have an ID for NAME
    SIMPLE-STRING's data vector is its character sequence

user-supplied functions:

(STORER object-ID type-ID data-vector)
    saves the mapping from object-ID to type-ID+data-vector
    (type-ID can be thought of as the first component of data-vector and
    helps RETRIEVER decide how to interpret the binary data in the
    data-vector)

(RETRIEVER object-ID)
    returns the type-ID and data-vector identified by object-ID in the
    back-end storage facility

(PERSISTENT-STORE object storer)
    calls STORER on object and all its components, recursively

(PERSISTENT-RETRIEVE object-ID retriever)
    calls RETRIEVER on object-ID and reconstructs the original object
    from the type-ID and data-vector, recursively.
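
To make the protocol above concrete, here is a rough model of it in Python
(not CLISP code, just a sketch of the calling convention). Everything not
named above is an assumption made for illustration: the two type-IDs, the
UTF-8 byte encoding of strings, the use of lists as the only composite type,
and the per-call table that hands out IDs. That table is an ordinary
in-memory identity map, which is the simplest thing that works in a model
and says nothing about how a real implementation would produce IDs.

```python
from itertools import count

# Hypothetical type-IDs; the message does not assign concrete values.
TYPE_STRING, TYPE_LIST = 0, 1


def persistent_store(obj, storer):
    """Call storer(object_id, type_id, data_vector) once for obj and for
    each of its components; return the object-ID assigned to obj."""
    ids = {}          # id(object) -> object-ID, valid only during this call
    fresh = count(1)  # source of new object-IDs

    def walk(o):
        key = id(o)
        if key in ids:                 # already seen: reuse its ID
            return ids[key]
        oid = ids[key] = next(fresh)   # assign before recursing (cycles)
        if isinstance(o, str):
            storer(oid, TYPE_STRING, list(o.encode("utf-8")))
        elif isinstance(o, list):
            storer(oid, TYPE_LIST, [walk(c) for c in o])
        else:
            raise TypeError(f"no data-vector layout for {type(o).__name__}")
        return oid

    return walk(obj)


def persistent_retrieve(object_id, retriever):
    """Rebuild the object for object_id; retriever(object_id) must return
    a (type_id, data_vector) pair."""
    built = {}  # object-ID -> rebuilt object, so each ID yields one object

    def build(oid):
        if oid in built:
            return built[oid]
        type_id, data = retriever(oid)
        if type_id == TYPE_STRING:
            obj = built[oid] = bytes(data).decode("utf-8")
        elif type_id == TYPE_LIST:
            obj = built[oid] = []      # register before recursing (cycles)
            obj.extend(build(c) for c in data)
        else:
            raise ValueError(f"unknown type-ID {type_id}")
        return obj

    return build(object_id)


# Usage: a dict stands in for the back-end storage facility.
backend = {}
shared = "CLISP"
root_id = persistent_store(
    [shared, ["a", shared]],
    lambda oid, type_id, data: backend.__setitem__(oid, (type_id, data)),
)
copy = persistent_retrieve(root_id, backend.__getitem__)
```

In this model the shared string is stored once and comes back as a single
object, which is the WRITE-like behaviour (think *PRINT-CIRCLE*) that the
recursion is meant to preserve.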

note that both PERSISTENT-STORE and PERSISTENT-RETRIEVE have to worry about
GC and ID: PERSISTENT-STORE has to generate IDs and make sure they do
not change until the exit from the function, while PERSISTENT-RETRIEVE
has to make sure that it does not create multiple objects for the same
ID.  The latter can be accomplished by stipulating that RETRIEVER marks
in the backend that this object-ID has already been handled and saves
the address of the corresponding new object next to the data-vector, so
that subsequent calls to RETRIEVER with this object-ID would result in
the object address instead of the data-vector.
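
A rough Python model of that stipulation follows (again a sketch, not CLISP
code). The paragraph above does not say how the back-end learns about the
new object, so the `remember` method, the `ALREADY_BUILT` marker, the
type-IDs, and passing the whole back-end object instead of a bare RETRIEVER
function are all assumptions made for illustration.

```python
# Hypothetical markers; none of these values come from the proposal itself.
TYPE_STRING, TYPE_LIST, ALREADY_BUILT = 0, 1, -1


class Backend:
    """Toy back-end: stores data vectors and, once an object has been
    rebuilt, the object itself next to its data vector."""

    def __init__(self, records):
        self.records = dict(records)  # object-ID -> (type-ID, data-vector)
        self.built = {}               # object-ID -> rebuilt object

    def retriever(self, oid):
        # After the ID has been handled, return the object, not the data.
        if oid in self.built:
            return ALREADY_BUILT, self.built[oid]
        return self.records[oid]

    def remember(self, oid, obj):
        # Hypothetical hook: mark oid as handled and save the new object.
        self.built[oid] = obj


def persistent_retrieve(oid, backend):
    type_id, payload = backend.retriever(oid)
    if type_id == ALREADY_BUILT:
        return payload
    if type_id == TYPE_STRING:
        obj = bytes(payload).decode("utf-8")
        backend.remember(oid, obj)
    elif type_id == TYPE_LIST:
        obj = []
        backend.remember(oid, obj)    # register first so cycles resolve
        obj.extend(persistent_retrieve(c, backend) for c in payload)
    else:
        raise ValueError(f"unknown type-ID {type_id}")
    return obj


# Usage: object 1 is a list whose two components are the same string.
records = {1: (TYPE_LIST, [2, 2]), 2: (TYPE_STRING, list(b"x"))}
result = persistent_retrieve(1, Backend(records))
```

Both components of `result` end up being the very same string object, since
the second lookup of ID 2 returns the object saved by the first.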

>> (how to make numeric IDs GC-invariant is a challenge)
>
> Can't you make the mapping from object to ID through a (perhaps weak)
> hash-table with EQ test?

no.
think of zillions of subobjects - you cannot create yet another HUGE
table in CLISP memory, at least not always.

OTOH, this is negotiable :-)

-- 
Sam Steingold (http://www.podval.org/~sds) running w2k
<http://www.camera.org> <http://www.iris.org.il> <http://www.memri.org/>
<http://www.mideasttruth.com/> <http://www.honestreporting.com>
Only adults have difficulty with child-proof caps.