From: Richard F. <rj...@fi...> - 2001-07-05 00:05:00
On Mon, 2 Jul 2001, Eric Lee Green wrote:

> I've put the preliminary architecture documentation up at
> http://tapioca.sourceforge.net . That should give you a good idea of how
> this is structured. Aside from the 'pudding', the rest of the architecture
> should be somewhat familiar, except for the parts inflicted by Java (like
> the tape server being multi-threaded rather than multi-process).

Looks good. One question I have, though, is the implementation of the
tapicom protocol. I have just updated my Java reference library, and
see that Java has a built-in RPC mechanism called RMI (Remote Method
Invocation). It's simpler than CORBA or COM, although it only works
between Java objects. It appears that RMI can be easily integrated with
any type of socket, including an SSL socket. Maybe we should consider
using RMI instead?

The advantage here, to me, is that we don't have to write a lot of code
to format, write, read, and parse data for every operation we perform,
while also trying to figure out the error detection, reporting, and
handling mechanisms. If we want to add a user to the database, we just
call something like:

    result = tapica.exports.NewUser(authObj, userObj)

If the call fails, it raises an exception that can be handled by the
client. Anyway, I think it's worth considering, since it makes the data
and error handling much easier.

> Now it's time to decide on the archive format. We need to get this right
> because changing it after we've started making backups will be a major
> pain in the %$#@@#.

Agreed!!!!

> Some criteria:
>
> 1. Must be able to multiplex (and demultiplex) multiple streams into the
> same tape file. This indicates that the input needs to be blocked in a
> structured way, rather than just being a plain old stream of data, and
> that each block needs to be tagged with a stream ID.

Yep. Some proposed requirements:

1. Archive volumes are written in fixed blocks of a configurable size.
   Minimum IO block size is 16k.

2. Each volume begins with a volume header block that describes the
   volume and all still-active archive streams being multiplexed to
   that archive. Note that this means that additional streams cannot
   be added to an archive that is in progress. However, it doesn't
   guarantee that this specific volume actually contains any data for
   a given archive stream. For example, we could say that this archive
   will consist of 8 streams, but only 3 can be written concurrently.
   This would allow the other 5 streams to be written as previous
   streams are completed.

3. Each IO block contains data for exactly one archive stream.

4. Each IO block contains a header that specifies its archive stream
   ID, sequence number, checksum, etc.

5. The payload section of each IO block starts with a byte that
   indicates a structure type: data stream header, data stream
   resource, data stream data, etc. This is followed by the structure
   of the indicated type, and any data it carries.

6. Within each IO block the structures are variable length, ensuring
   efficient space usage and performance.

The downside to variable-length structures and data is handling
corruption. With a fixed-length, BRU-style format, if a file data block
was corrupted, we could (and did!) just write the data section out to
the file, and note the error. On a file header block, we could have
just invented a file name, and written out the appropriate data
section. But with variable-length records, we can't even trust that the
encoded length is correct, and we don't know for certain if there is
another stream header structure in the block, and if so, where. We
either have to write very complex (and error-prone) code to try to
make sense of the corrupted data, or throw the whole block away and
move on.

One thing we might want to consider is some kind of ECC encoding for
the structure headers. I don't have any clear idea of how much ECC to
do, or even where it should go in the format.
I also don't have any clear idea of the performance impact of ECC.

> 3. The tape format should not require doing a MT_TELL for every bloody
> block written to tape, only for blocks that actually need it (i.e, blocks
> that contain the beginning of a piece of data logged into the database).
> This tends to indicate that blocks need tagging with a "type" field.

Hmm, how do we handle the catalog rebuild (reading the archive) case?
We don't know *if* we need to do an MT_TELL at the time we would *need*
to do the MT_TELL, which is before reading the block. I suppose we
could calculate the tape block size when we open the volume
(a=MT_TELL, read_block, b=MT_TELL, tape_block_size = b-a). Then we
could read the block and, if it contained stream headers, fudge the
QFA position from the block count and tape_block_size.

> 4. The format should be able to handle two things other than raw
> data blocks:
> a) producing location information suitable for logging into the
> central authority's location database for use in future restores,
> and
> b) holding any OS-specific data needed to fully restore the file.

Right. And of course, we also want to be able to restore the data
portion of a stream (file or otherwise) on a different OS.

> 5. The stream format will have to hold data about what kind of writer
> produced the data in the file, so that the file logger can properly
> account for the differences in display format and pass that data upstream
> to the user interface. We don't want to force Unix filename format onto
> Windows or Mac or etc.!

Yep. Yet another lesson we learned! In the archive, paths should
probably be encoded into some platform-independent form, so that they
can be reconstructed for the platform we are restoring to. But we still
want an indicator of the original platform, for catalog and display
purposes. Oh, and let's not forget, the converters from the independent
path to the native path format will need to check for and handle
invalid characters in the path.
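
To make the path-conversion point concrete, here is a minimal Java sketch, assuming the platform-independent form is simply an ordered list of path components. The class name PathCodec, its methods, the replacement character '_' and the list of characters treated as invalid on Windows are all illustrative assumptions, not part of any agreed design.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: archive-neutral paths stored as component lists. */
final class PathCodec {
    // Characters assumed invalid in Windows file names (illustrative, not exhaustive).
    private static final String WINDOWS_INVALID = "<>:\"/\\|?*";

    private PathCodec() {}

    /** Splits a Unix-style path into components, dropping empty segments. */
    static List<String> fromUnix(String path) {
        List<String> parts = new ArrayList<>();
        for (String p : path.split("/")) {
            if (!p.isEmpty()) {
                parts.add(p);
            }
        }
        return parts;
    }

    /** Joins components for a Windows target, replacing invalid characters with '_'. */
    static String toWindows(List<String> parts) {
        StringBuilder out = new StringBuilder();
        for (String part : parts) {
            if (out.length() > 0) {
                out.append('\\');
            }
            for (char c : part.toCharArray()) {
                // Control characters and reserved punctuation are replaced.
                boolean bad = c < 0x20 || WINDOWS_INVALID.indexOf(c) >= 0;
                out.append(bad ? '_' : c);
            }
        }
        return out.toString();
    }
}
```

The sketch produces a path relative to the restore root and silently substitutes characters. A real converter would probably also want to report each substitution, and the original platform indicator would still be stored separately for catalog and display purposes.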
> Similarly, if we're backing up a database file
> dump stream (one possible data source) we don't want to have to pretend
> that it contains Unix-structured data, and we need to know it came from
> a database stream dumper rather than from a filesystem dumper, so that
> when we go to restore it we know what restorer to use!

Yep. I'm finding it useful to think about streams as having 'names',
not paths.

> So each type of data stream creator will need a unique creator ID of
> some sort to tell us what kind of widget created the data stream, and this
> gets put into the header so that we can grab it and know what to restore
> this data stream with.

I've been thinking a lot about this, and am having difficulty. On the
one hand, we want a general ID that indicates very generally what this
stream is (directory, file, pipe data, database, etc), and a general
way of accessing it for cross-platform support. On the other hand, we
also want to be able to identify a file as coming from an ext2
filesystem, so we can back up and restore the extended ext2 bits. In
other cases, platform- and filesystem-specific ACLs need to be handled.

So it looks like we need at least 3 different indicator IDs for stream
headers. The first (1 byte?) indicates the type of stream (directory,
file, pipe output, command output, etc). The second (1 byte?)
indicates the platform of the given type (Windows, Mac, Unix for
directories and files; Oracle or MySQL for database streams, etc). The
third (2 bytes?) further classifies the type of stream based upon its
original writer object.

This way, if we are running on an NT system where we don't have the
Unix file object class, we can use a more general file object class to
process the data. In other words, the low-level specific file object
readers and writers don't have to worry about whether the host platform
supports them or not. They will only be available on the platforms that
support them.
They also don't have to worry about processing data produced by other
file objects, since they would only get data that they (or their
subclasses) created.

File Stream Object class hierarchy:

GenericFile - Available everywhere. Processes all types of file data
 |-- NTFile - Available on NT. Processes file data for all NT filesystems
 |    |-NTFSFile - Available on NT, when backing up or restoring to NTFS filesystems
 |
 |-- UnixFile - ....

> 6. For volume changes, the full header information should be replicated
> on the new volume, along with what volume we're working on etc. so that
> if we have a tape that is a volume 2, we have more of a chance of
> associating it with the correct volume 1 if we have to do this by
> hand.
>
> 7. Fixed-size blocks, or variable-sized blocks? Fixed-sized blocks, like
> 'tar' uses, are easy to deal with, and can be easily packed into
> larger buffers (as long as said larger buffers are a multiple of the
> blocksize in length). However, each block adds overhead. If the
> block size is too small, overhead becomes too much of a percentage of
> the block. If the block size is too large, then we have too much
> wasted space at the end of the block.
>
> Variable-sized blocks could be used, but we could require that these
> be packed into a fixed-size buffer of some large size (perhaps
> 64K or 128K) such that each buffer begins with a block and no block
> spans buffers. This is a pain, but results in less wasted space and
> thus better performance in the end. Note that if we limit the
> variable-sized blocks to 32k in size, we can represent the size of the
> block with only 2 bytes in the block's header.

I think variable-sized works best, as we don't have to worry about
wasting space, which is really the biggest overhead in fixed-size
blocks. And 2 bytes gives us a maximum 'chunk' size of 64k.

Note that we can pack variable-sized chunks into a fixed-size IO block.
For example, assume the following 128k fixed IO blocks (these sizes are
arbitrary):

IO header          - 32-bytes
Stream header(1)   - 196-bytes (encodes name length, resource length,
                     and data lengths, etc)
  - Stream name     - 34-bytes
  - Stream resource - 132-bytes
Stream data(1)     - 2345-bytes
  - header 12-bytes
  - data 2333-bytes
Stream header(2)   - 96-bytes
  - name     - 42-bytes
  - resource - 23-bytes
Stream data(2)     - 48401-bytes
.....
Stream data(4)     - 836-bytes
  - header 12-bytes
  - data 824-bytes
# END OF BLOCK 2 here, stream 4 not finished
# Start of BLOCK 3
IO header          - 32-bytes
Stream data(4)     - 65504-bytes
  - header 12-bytes
  - data - 65492-bytes
Stream data(4)     - 1804-bytes   # last block of stream 4
  - header 12-bytes
  - data - 1792-bytes
....

Think about the archive API like a filesystem streams API. The file
object just reads/writes data, and doesn't worry about how that gets
stored on the media. This doesn't really cause any API problems. In
fact, it ensures that the only thing that knows about the archive
format is the archive object. We just need to figure out whether the
archive object processes stream objects, or if the stream objects
process themselves via the archive object.

> 8. Checksumming streams: We should probably only worry about checksumming
> buffer-sized chunks of data, not individual blocks of structured data.
> Setup time for the CRC calculations can thus be reduced, as can the
> overhead of the CRC checksum itself.

Yes. This should be done on an IO block, and stored in the IO block
header. It is useful to be able to tell the other objects whether the
block this data came from validated, so they know how much trust to put
into it, but we don't want to have several checksums floating around.

>
> 9. I think Mr. Fish mentioned that we probably want an "end of file" block
> in file streams so that we know we have reached the end of a file. This
> simplifies some programming, I guess. Did I misread the message?

I don't think that's required.
We can flag a file EOF when we see a new stream header. Skipping the
EOF block also means we can pack the stream data all the way to the end
of the IO block. If we want, we can just add a flags byte to the stream
data structure that indicates that "this is the last data block".

> Okay, I think this is enough to think about. I am especially curious to
> know what you think about the notion of putting variable-sized blocks into
> bigger buffer-sized blocks. I think this solves many problems (we never
> really know how much OS-specific data is going to be in file headers, for
> example), but is somewhat more complex than fixed-size blocks like 'tar',
> and yes, there is still some overhead in some cases (if we don't have
> enough space at the end of a buffer for a block, that space is wasted).

I like it. We actually waste less space than the fixed-chunk case. BRU
wasted an average of 512 bytes per file for small files, and ~850 bytes
per file for larger files. Packing multiple things into a fixed IO
block prevents this. There will be some space wasted: for example, we
can't (more specifically, don't want to!) split a stream header across
an IO buffer boundary. But if we are writing stream data, we can
adjust/buffer the data to exactly fill the buffer, and then just start
the next IO buffer with the leftover stream data (and a new stream data
header indicating the size).

> Comments?

We need some new terms. We are using 'streams' in the sense of archive
streams and multiplexing, and 'streams' in the sense of data streams to
be archived. How about calling them something like d-streams and
a-streams? A d-stream is a stream of data to be archived. An a-stream
is an archive stream, and contains one or more d-streams. An archive is
made up of one or more a-streams.

-- 
Richard Fish, Unix/Linux Software Engineer, rj...@fi...
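
As an illustration of the packing scheme discussed in this message (variable-length chunks inside a fixed-size IO block, with one checksum per block stored in the IO header), here is a minimal Java sketch. Everything concrete in it is an assumption made for the example: the class name IoBlock, the type codes, the header field offsets (a-stream ID at 0, sequence number at 4, CRC at 8), the 3-byte chunk header, and the use of CRC-32 as the checksum.

```java
import java.nio.ByteBuffer;
import java.util.zip.CRC32;

/** Hypothetical sketch of one fixed-size IO block holding variable-length chunks. */
final class IoBlock {
    static final int HEADER_SIZE = 32;       // assumed, as in the example layout
    static final int CHUNK_HEADER_SIZE = 3;  // 1 type byte + 2 length bytes (assumed)
    static final byte TYPE_END = 0;          // zero fill doubles as "no more chunks"
    static final byte TYPE_STREAM_HEADER = 1;
    static final byte TYPE_STREAM_DATA = 2;

    private final ByteBuffer buf;

    IoBlock(int blockSize, int streamId, int sequence) {
        buf = ByteBuffer.allocate(blockSize); // zero-filled heap buffer
        buf.putInt(0, streamId);              // a-stream ID
        buf.putInt(4, sequence);              // block sequence number
        buf.position(HEADER_SIZE);            // chunks start after the header
    }

    /** Appends a chunk if it fits; returns false so the caller can start a new block. */
    boolean add(byte type, byte[] payload) {
        if (payload.length > 0xFFFF) {
            throw new IllegalArgumentException("chunk exceeds 2-byte length field");
        }
        if (buf.remaining() < CHUNK_HEADER_SIZE + payload.length) {
            return false;
        }
        // Length is stored unsigned; a reader must mask with 0xFFFF.
        buf.put(type).putShort((short) payload.length).put(payload);
        return true;
    }

    /** Stores a CRC-32 of everything after the header and returns the full block. */
    byte[] seal() {
        byte[] raw = buf.array();
        buf.putLong(8, checksum(raw));
        return raw;
    }

    /** Recomputes the CRC-32 and compares it with the value in the header. */
    static boolean verify(byte[] raw) {
        return ByteBuffer.wrap(raw).getLong(8) == checksum(raw);
    }

    private static long checksum(byte[] raw) {
        CRC32 crc = new CRC32();
        crc.update(raw, HEADER_SIZE, raw.length - HEADER_SIZE);
        return crc.getValue();
    }
}
```

A writer would call add() until it returns false, seal the block, and start the next block with the leftover d-stream data. A reader would call verify() once per block and pass the result along, so the stream objects know how much to trust the data without carrying checksums of their own.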