[Tapioca-devel] Archive format

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

I've put the preliminary architecture documentation up at
http://tapioca.sourceforge.net . That should give you a good idea of how
this is structured. Aside from the 'pudding', the rest of the architecture
should be somewhat familiar, except for the parts inflicted by Java (like
the tape server being multi-threaded rather than multi-process).

Now it's time to decide on the archive format. We need to get this right
because changing it after we've started making backups will be a major
pain in the %$#@@#.

Some criteria:

1. Must be able to multiplex (and demultiplex) multiple streams into the
same tape file. This indicates that the input needs to be blocked in a
structured way, rather than just being a plain old stream of data, and
that each block needs to be tagged with a stream ID.
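
As a rough illustration of criterion 1, here is a minimal sketch in Python
(chosen only for brevity; it is not the project's language). The 6-byte
block header, the integer stream tags, and the round-robin interleaving
are all assumptions made for the example, not a proposed layout.

```python
import struct
from itertools import zip_longest

# Hypothetical block header: 4-byte stream tag + 2-byte payload length.
HEADER = struct.Struct(">IH")

def make_block(stream_tag, payload):
    """Prefix a payload with its stream tag and length."""
    return HEADER.pack(stream_tag, len(payload)) + payload

def multiplex(streams):
    """Interleave blocks round-robin from {stream_tag: [payload, ...]}."""
    tagged = [[make_block(tag, p) for p in payloads]
              for tag, payloads in streams.items()]
    out = bytearray()
    for round_ in zip_longest(*tagged):
        for block in round_:
            if block is not None:
                out += block
    return bytes(out)

def demultiplex(tape):
    """Split a multiplexed byte string back into {stream_tag: bytes}."""
    result = {}
    offset = 0
    while offset < len(tape):
        tag, length = HEADER.unpack_from(tape, offset)
        offset += HEADER.size
        result.setdefault(tag, bytearray()).extend(tape[offset:offset + length])
        offset += length
    return {tag: bytes(data) for tag, data in result.items()}
```

Because every block carries its own tag and length, the reader never needs
to know how the writer scheduled the streams.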

2. Archives should have a header that lists the stream IDs (and
descriptive labels) of every stream multiplexed into them; otherwise doing
a manual restore of a multiplexed archive (e.g. to restore the central
authority if it crashes) will be horribly difficult. When restoring by
hand, we can select a stream by its index in that header (e.g. we tell it
"--stream=1" and it knows to dump only the blocks labeled with stream
id=a00523134.342), and the restore strips out everything except that stream.
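
A small sketch of how the index-to-ID lookup in criterion 2 might work,
again in Python for brevity. The header is modelled as an ordered list of
(stream ID, label) pairs; the second stream ID, both labels, and the choice
of a 1-based index are made up for the example.

```python
# Hypothetical archive header: an ordered table of (stream_id, label).
ARCHIVE_HEADER = [
    ("a00523134.342", "example label for stream 1"),
    ("a00523134.343", "example label for stream 2"),  # invented ID
]

def resolve_stream(header, index):
    """Map a 1-based --stream=N index to the stream ID listed in the header."""
    if not 1 <= index <= len(header):
        raise ValueError(f"archive has no stream {index}")
    stream_id, _label = header[index - 1]
    return stream_id

def extract_stream(header, blocks, index):
    """Keep only the payloads of blocks tagged with the selected stream ID."""
    wanted = resolve_stream(header, index)
    return b"".join(payload for stream_id, payload in blocks
                    if stream_id == wanted)
```

Here `blocks` is any iterable of (stream ID, payload) pairs read off the
tape, so a hand restore only needs the header and a block reader.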

3. The tape format should not require doing a MT_TELL for every bloody
block written to tape, only for blocks that actually need it (i.e., blocks
that contain the beginning of a piece of data logged into the database).
This tends to indicate that blocks need tagging with a "type" field.
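
To make criterion 3 concrete, here is a sketch in which the writer asks for
the tape position only when a block's type says it starts something worth
logging. The two block types, the one-byte type field, and the `FakeTape`
stand-in (an in-memory buffer that counts position queries) are all
hypothetical.

```python
import io

# Hypothetical block types; only ITEM_START blocks get a location lookup.
DATA, ITEM_START = 0, 1

class FakeTape:
    """Stand-in for a tape device that counts position queries."""
    def __init__(self):
        self._buf = io.BytesIO()
        self.tell_calls = 0

    def write(self, data):
        self._buf.write(data)

    def tell(self):
        self.tell_calls += 1  # on real hardware this is the costly call
        return self._buf.tell()

def write_blocks(tape, blocks):
    """Write (block_type, name, payload) tuples; return {name: position}."""
    locations = {}
    for block_type, name, payload in blocks:
        if block_type == ITEM_START:
            locations[name] = tape.tell()
        tape.write(bytes([block_type]) + payload)
    return locations
```

The returned mapping is the kind of thing that would be handed to the
location database; plain data blocks never trigger a position query.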

4. The format should be able to handle two things other than raw
  data blocks:
   a) producing location information suitable for logging into the
    central authority's location database for use in future restores,
    and
   b) holding any OS-specific data needed to fully restore the file.

5. The stream format will have to hold data about what kind of writer
produced the data in the file, so that the file logger can properly
account for the differences in display format and pass that data upstream
to the user interface. We don't want to force Unix filename format onto
Windows, Mac, etc.! Similarly, if we're backing up a database file
dump stream (one possible data source) we don't want to have to pretend
that it contains Unix-structured data, and we need to know it came from
a database stream dumper rather than from a filesystem dumper, so that
when we go to restore it we know what restorer to use!
    So each type of data stream creator will need a unique creator ID of
some sort to tell us what kind of widget created the data stream, and this
gets put into the header so that we can grab it and know what to restore
this data stream with.
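
The creator-ID idea in criterion 5 amounts to a lookup table from creator
ID to restorer. A minimal sketch, with invented creator IDs ("unix-fs",
"db-dump") and placeholder restore functions that only report what they
would do:

```python
# Hypothetical creator IDs and restorers, for illustration only.
RESTORERS = {}

def restorer(creator_id):
    """Register a restore function for one kind of stream creator."""
    def register(func):
        RESTORERS[creator_id] = func
        return func
    return register

@restorer("unix-fs")
def restore_unix_fs(stream):
    return f"filesystem restore of {len(stream)} bytes"

@restorer("db-dump")
def restore_db_dump(stream):
    return f"database restore of {len(stream)} bytes"

def restore(header, stream):
    """Pick the restorer named by the creator ID in the stream header."""
    creator_id = header["creator_id"]
    try:
        handler = RESTORERS[creator_id]
    except KeyError:
        raise ValueError(f"no restorer for creator {creator_id!r}") from None
    return handler(stream)
```

An unknown creator ID fails loudly rather than being fed to the wrong
restorer, which seems like the safer default for a backup tool.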

6. For volume changes, the full header information should be replicated
  on the new volume, along with what volume we're working on etc. so that
  if we have a tape that is a volume 2, we have more of a chance of
  associating it with the correct volume 1 if we have to do this by
  hand.
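
One way to picture criterion 6: the header written at a volume change is a
full copy of the original with only the volume number bumped. The field
names below (`archive_id`, `volume`, `streams`) are assumptions for the
sketch, not a proposed header layout.

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class VolumeHeader:
    archive_id: str   # hypothetical identifier shared by all volumes
    volume: int       # 1 for the first tape, 2 for the next, ...
    streams: tuple    # (stream_id, label) pairs, repeated on every volume

def next_volume_header(header):
    """Copy the full header for a new volume, bumping only the volume number."""
    return replace(header, volume=header.volume + 1)

def same_archive(a, b):
    """True if two tapes carry headers from the same archive."""
    return a.archive_id == b.archive_id and a.streams == b.streams
```

With the stream table repeated on each tape, matching a stray volume 2 to
its volume 1 is a simple header comparison.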

7. Fixed-size blocks, or variable-sized blocks? Fixed-sized blocks, like
 'tar' uses, are easy to deal with, and can be easily packed into
 larger buffers (as long as said larger buffers are a multiple of the
  blocksize in length).  However, each block adds overhead. If the
  block size is too small, overhead becomes too much of a percentage of
  the block. If the block size is too large, then we have too much
  wasted space at the end of the block.

  Variable-sized blocks could be used instead, and we could require that these
  be packed into a fixed-size buffer of some large size (perhaps
  64K or 128K) such that each buffer begins with a block and no block
  spans buffers. This is a pain, but results in less wasted space and
  thus better performance in the end. Note that if we limit the
  variable-sized blocks to 32k in size, we can represent the size of the
  block with only 2 bytes in the block's header.
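
A sketch of the variable-sized-blocks-in-fixed-buffers idea, to show where
the wasted space comes from. It assumes an unsigned big-endian 2-byte
length prefix, zero padding at the end of each buffer, and a zero length as
the "rest is padding" marker; none of that is decided.

```python
import struct

BUFFER_SIZE = 64 * 1024       # one of the sizes floated above
MAX_BLOCK = 32 * 1024         # payload cap, so the length fits in 2 bytes
LENGTH = struct.Struct(">H")  # 2-byte big-endian length prefix (assumed)

def pack_buffers(payloads, buffer_size=BUFFER_SIZE):
    """Pack length-prefixed blocks into fixed-size, zero-padded buffers."""
    buffers, current = [], bytearray()
    for payload in payloads:
        if not 0 < len(payload) <= MAX_BLOCK:
            raise ValueError("payload must be 1..32768 bytes")
        block = LENGTH.pack(len(payload)) + payload
        if len(block) > buffer_size:
            raise ValueError("block does not fit in one buffer")
        if len(current) + len(block) > buffer_size:
            # Block would span buffers: pad this one out and start another.
            buffers.append(bytes(current.ljust(buffer_size, b"\0")))
            current = bytearray()
        current += block
    if current:
        buffers.append(bytes(current.ljust(buffer_size, b"\0")))
    return buffers

def unpack_buffer(buffer):
    """Yield payloads from one buffer; a zero length marks padding."""
    offset = 0
    while offset + LENGTH.size <= len(buffer):
        (length,) = LENGTH.unpack_from(buffer, offset)
        if length == 0:
            break
        offset += LENGTH.size
        yield buffer[offset:offset + length]
        offset += length
```

Every buffer starts on a block boundary, so a reader can begin decoding at
any buffer without scanning from the start of the tape file.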

8. Checksumming streams: We should probably only worry about checksumming
  buffer-sized chunks of data, not individual blocks of structured data.
  Setup time for the CRC calculations can thus be reduced, as can the
  overhead of the CRC checksum itself.
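
Per-buffer checksumming could look something like the sketch below. It
uses CRC-32 from Python's zlib module and a 4-byte trailer purely as an
assumption; which CRC to use and where it lives in the buffer are still
open.

```python
import struct
import zlib

CRC = struct.Struct(">I")  # 4-byte CRC-32 trailer (assumed layout)

def seal_buffer(buffer):
    """Append one CRC-32 covering the whole buffer."""
    return buffer + CRC.pack(zlib.crc32(buffer))

def open_buffer(sealed):
    """Verify the trailing CRC-32 and return the buffer contents."""
    buffer, trailer = sealed[:-CRC.size], sealed[-CRC.size:]
    (expected,) = CRC.unpack(trailer)
    if zlib.crc32(buffer) != expected:
        raise ValueError("buffer checksum mismatch")
    return buffer
```

One checksum per 64K or 128K buffer costs 4 bytes, versus 4 bytes on every
small block if each block were checksummed individually.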

9. I think Mr. Fish mentioned that we probably want an "end of file" block
in file streams so that we know we have reached the end of a file.  This
simplifies some programming, I guess. Did I misread the message?

Okay, I think this is enough to think about. I am especially curious to
know what you think about the notion of putting variable-sized blocks into
bigger buffer-sized blocks. I think this solves many problems (we never
really know how much OS-specific data is going to be in file headers, for
example), but is somewhat more complex than fixed-size blocks like 'tar',
and yes, there is still some overhead in some cases (if we don't have
enough space at the end of a buffer for a block, that space is wasted).

Comments?

Once we have tossed around the criteria, we can come up with some possible
tape layouts, and then I'll be happy to write a spec for it and put it
into the CVS archive. I'm currently working on a spec/RFC type template
that we can use for that (it's not in the CVS archive yet though).

- -- 
Eric Lee Green                             mailto:er...@ba...
               BadTux: http://www.badtux.org
  GnuPG public key at http://badtux.org/eric/eric.gpg
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.0.6 (GNU/Linux)
Comment: For info see http://www.gnupg.org

iD8DBQE7QSwj3DrrK1kMA04RAvIvAJ0bz1uc+MWd8fHLd4BGJngMl8lA7QCeNb8L
NT6WVcEtcAOYFbasYOEFCcU=
=2S0P
-----END PGP SIGNATURE-----