Re: [mbackup-devel] xml as a tape header: the cons
From: John H. <Jo...@mw...> - 2000-08-22 21:28:59
> The only thing I have against using xml as a tape format is the extra tape
> space that it would use. I wrote this quick example:
>
> <?xml version="1.0" ?>
> <mbackup:header>
> <mbackup:date>966968837</mbackup:date>
> <mbackup:filename>/usr/local/bin/foobar</mbackup:filename>
> <mbackup:hostname>uhura.midnightlinux.com</mbackup:hostname>
> </mbackup:header>
>
> I'm not sure if that's true valid xml, but I think it's close enough for
> this example. I also have this more traditional header format:
>
> 96696883724uhura.midnightlinux.com21usr/local/bin/foobar
>
> 9 digit time, 2 digit next field length, hostname, 2 digit next field
> length, filename.
>
> Both give the same information. The xml one is 215 bytes and the other one
> is 58 bytes.
>
> I also gathered some data that is typical of the data we use at work:
>
> [root@cadillac round5]# du -a|wc -l
> 306050
> [root@cadillac round5]# du -s
> 4070060 .
>
> 306,050 files using about 4 gigs of space.
> With the xml header, there is 65M of header data to label 4gig of data.
> With the other header, there is 17M of header data to label 4gig of data.
>
> Given this is a simple header and doesn't have all the data a full header
> might have, but I feel that the xml header will grow faster than the other
> header, even if we don't use the mbackup namespace part.
>
> Just something to consider...
>
> -james

I've just checked our web/ftp server. For /etc, which is classic small-file
territory: 6390K / 1,333 files = 4.8K on average. Over the whole system:
5.5GB / 106,639 files = 52.2KB on average.

I was thinking of a much more verbose and detailed header, so let's say 1K
per file. Nah, let's say 2K. For a full backup that comes to about 213MB of
header data (106,639 files x 2K each). But that's trivial, not worth
worrying over! For a DDS-3 that's an extra 3 minutes of backup time, and of
course for a drive supporting compression it's even smaller. The advantages
of expandability and flexibility remain huge.

Implementation-wise, the client would create the XML header with the data
and pass it, locally or over a network, to the server process. When it
comes into the server, it's parsed into a DOM object (I don't see any
reason why we need a validating parser, though). Then a pointer to the DOM
gets passed along with the data to the filters. One possibility might be a
compression filter that examines the DOM to see if the data is in a
compressed native format (with NetWare 4+, for example, you can open a
compressed file and keep it compressed). It might then compress the data
and add a new element to the DOM:

<mbackup:compression>gzip</mbackup:compression>

At the point where the header goes out to the device, the DOM gets
converted back into an XML document. Rough sketches of both steps follow
below.

Regards
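
P.S. To make the server side concrete, here's a minimal sketch of the
parse step, assuming we use libxml2 (xmlParseMemory is its plain
non-validating parser). The mb_ names and the filter signature are made
up for illustration, not anything we've settled on:

    /* Parse the header the client sent into a DOM tree.  No validating
     * parser needed -- xmlParseMemory() does a plain well-formedness
     * parse. */
    #include <stdio.h>
    #include <libxml/parser.h>
    #include <libxml/tree.h>

    /* Hypothetical filter hook: each filter gets a pointer to the DOM
     * plus the file data, and may modify either. */
    typedef int (*mb_filter_fn)(xmlDocPtr hdr, char **data, size_t *len);

    xmlDocPtr mb_parse_header(const char *buf, int len)
    {
        xmlDocPtr doc = xmlParseMemory(buf, len);

        if (doc == NULL)
            fprintf(stderr, "mbackup: client sent a malformed header\n");
        return doc;
    }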
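And a sketch of the compression-filter idea plus the final write-out,
again assuming libxml2. mb_gzip_buffer() is a stand-in for whatever
compression routine we end up with:

    #include <stdio.h>
    #include <string.h>
    #include <libxml/tree.h>

    extern int mb_gzip_buffer(char **data, size_t *len);  /* hypothetical */

    /* Skip data the header already marks as compressed (e.g. a file
     * read in NetWare's compressed native format); otherwise compress
     * it and record the fact so a restore knows to gunzip. */
    int mb_compress_filter(xmlDocPtr hdr, char **data, size_t *len)
    {
        xmlNodePtr root = xmlDocGetRootElement(hdr);
        xmlNodePtr n;

        for (n = root->children; n != NULL; n = n->next)
            if (xmlStrcmp(n->name, BAD_CAST "compression") == 0)
                return 0;                   /* already compressed */

        if (mb_gzip_buffer(data, len) != 0)
            return -1;

        /* adds <mbackup:compression>gzip</mbackup:compression>,
         * reusing the mbackup namespace if the header declares one */
        xmlNewChild(root, xmlSearchNs(hdr, root, BAD_CAST "mbackup"),
                    BAD_CAST "compression", BAD_CAST "gzip");
        return 0;
    }

    /* At the device end the DOM gets flattened back into an XML
     * document and written ahead of the data. */
    void mb_write_header(xmlDocPtr hdr, FILE *tape)
    {
        xmlChar *mem;
        int size;

        xmlDocDumpMemory(hdr, &mem, &size);
        fwrite(mem, 1, (size_t)size, tape);
        xmlFree(mem);
    }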