[mbackup-devel] Links and discussion
Status: Alpha
Brought to you by: jo2y
From: John H. <Jo...@mw...> - 2000-08-10 01:00:57
James and I have been having an interesting thread, and he has asked me to summarise it for public scrutiny.

A NASA tech's musings on the limitations of backup systems:
http://www.computer.org/conferences/meta96/kobler/paper.html

The IEEE's musings. Typical committee!
http://www.ieee-sssc.org/projects.html

A place that has links to tape specs. Note SIDF and MTF:
http://www.tapeguy.com

Parallelism in tape systems (a must to implement):
http://www.backupcentral.com/parallel.html

================================================

On support for AIO for file access:
http://oss.sgi.com/projects/kaio/

From reading that, I get the impression that kaio and glibc aio are function-call compatible: just a change of #include <aio.h> to #include <linux/aio.h> and a recompile is needed.

So here's my plan. I'll most likely start working aio into the disk-writing module, since I'm rewriting it anyway. However, I'll probably use the glibc aio. My reasoning is that I want as wide an acceptance as I can get. I want the same proportional mind share in backups as Apache has in web servers. :) If I tell people that they must recompile their kernel to use a backup program, it will be harder to get acceptance. That's why it's fortunate that the two implementations are function-call compatible. When I get around to creating a configure script, we can add the option of compiling against kaio. This seems to be a win for everyone:

#ifdef USE_KAIO
#include <linux/aio.h>
#else
#include <aio.h>
#endif /* USE_KAIO */

==============================================

Extended information using tar format.

If you create a tar archive in which a file is repeated, each occurrence will be restored, so the last copy overwrites the earlier ones. Therefore it's possible to put the file's extended stats into a file and follow it with the data file (same name). tar will restore the data from the last copy, so the scheme is backwards compatible.
Smart programs (mbackup) spot the duplication and interpret the leading file(s) as containing extended information.

================================================

And an idea from me about tape formats.

The situation of having multiple tape drives, and doing RAID 0/1/5 across them, or running multiple independent streams to independent tape drives, or a combination of the above, is more likely to be relevant. However, it's all the same really, just different types of metadata to be recorded. HSM too, I guess. If we can be sufficiently flexible and expandable, there is no problem.

Even proposals like that straw man from NASA have glaring errors. Did you see his proposed date format?
a. In USA format, not ISO.
b. Fixed-length field size.
c. (giggle) Not Y2K compliant.
And this was 1996!

So, a modest proposal: all the stat information is an XML document. This meets the requirements of standardisation, flexibility and expandability. Elements like file length are just a string of ASCII digits. Dates are in ISO standard text form, i.e. yyyy-mm-dd hh:mm:ss.ssss... File names can also be in Unicode. We can store extra binary information, such as OS/2 EAs and Mac resource forks, as a Base64-encoded MIME type. At the start of the media we can specify extra elements to define device-level interleaving or anything else.

Then we can lay down the tape in two ways: our own format, using the XML header directly, or tar, using that clever two-file system. Within limits this could give backward compatibility, limiting the need to perform format conversion.

The objections are: too hard, too slow, too verbose (big, inefficient).

Too hard? A lot of work goes into XML these days. We just steal it and use the libxml and libunicode libraries. It's just a learning curve.

Too slow? I'd be surprised. It's CPU work, and CPUs are much faster than I/O and are getting faster at an accelerating rate. It can't be anywhere near as slow as something like software data compression.

Too verbose? Compared to what?
In most cases a kilobyte or so; worst case, with OS/2 EAs, 100 KB or so. Base64 adds roughly a third over binary (4 characters for every 3 bytes, a little more with MIME line breaks). Just not worth worrying about. The advantage is unlimited flexibility, and it's all printable!

Regards
John (Hisdad)