[Dar-support] Optimal DAR slice size and tape
For full, incremental, compressed and encrypted backups or archives
From: Peter W. <Pet...@mq...> - 2016-12-13 10:13:50
Dear list,

My question particularly concerns the recovery of data from tape (something which does not seem to be much discussed elsewhere, and certainly not with the attention to detail it has received on this list). A bit like Gour, I'm using LTO tape for long-term archive purposes, so I will probably make parity files from the DAR archive slices using PAR or RSC32 (the latter being faster).

My primary concern is the question of the ideal slice size. I'm not looking for a precise figure, but rather guidance on the right considerations in deciding on such a size.

FOR SMALLER SLICES:

What I'm particularly interested in are the risks of using large archives or slices, particularly with tape. If there is a case for smaller slices, then DAR's archive slicing feature looks all the more useful and significant.

A) This is the sort of risk I'm particularly thinking of:
https://sourceforge.net/p/dar/mailman/dar-support/?viewmonth=200602&viewday=6

| But I'm reluctant to trust tar since one special day when I lost all my
| precious backed-up data due to an archive corruption. DAR with Parchive
| seems more robust...

> On disk, yes; on tape, that's different: when an area of the medium is
> broken (leading to corrupt data stored there), you must copy the file
> that contains the corruption out to a sane medium. Parchive alone could
> repair it, but it will fail because it tries to write data to that dead
> area of the medium. So you use the cp command, but 'cp' will stop as
> soon as it meets the first I/O error (generated by the dead medium
> zone). So if the I/O error is located in the middle of the file, you
> will lose half of the file's data. Whatever redundancy ratio is used
> with Parchive, there will almost always be a position of the corruption
> (as close to the beginning of the file as necessary) that prevents you
> from copying enough data with 'cp' for Parchive to recover what is
> missing.
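To make the quoted failure mode concrete, here is the arithmetic as I understand it, as a small sketch of my own (the slice size and offsets are made-up illustrative numbers, not anything taken from dar or Parchive):

```python
# Sketch: why 'cp' stopping at the first I/O error can defeat any fixed
# Parchive redundancy ratio. All sizes here are illustrative assumptions.

def recoverable(slice_size: int, error_offset: int, parity_ratio: float) -> bool:
    """cp salvages only the bytes before the first I/O error; parity can
    rebuild at most parity_ratio * slice_size missing bytes."""
    salvaged = error_offset              # bytes copied before cp aborts
    missing = slice_size - salvaged      # everything after the bad zone is lost
    return missing <= parity_ratio * slice_size

SLICE = 100 * 1024 * 1024                # a hypothetical 100 MiB slice

# Error near the end of the slice: little is missing, 10% parity suffices.
print(recoverable(SLICE, error_offset=int(SLICE * 0.95), parity_ratio=0.10))  # True

# Error near the start: almost everything after it is lost, so no
# realistic redundancy ratio can rebuild it.
print(recoverable(SLICE, error_offset=int(SLICE * 0.05), parity_ratio=0.10))  # False
```

So the redundancy ratio only bounds the damage if the salvage tool can read past the bad zone (rather than stopping at it, as cp does), which is why the position of the corruption, and the size of the unit being copied, both matter.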
Is the total archive or the individual slice the appropriate unit here? If it is the slice, then this would seem to make a good case for increased slicing (smaller slices), which would limit the scope for damage and so provide damage control.

B) There are also other advantages to increased slicing (smaller slices):
https://sourceforge.net/p/dar/mailman/dar-support/?viewmonth=200602&viewday=6

> Note that if you can have several slices on the same tape, it is then
> possible to choose as slice size a divisor of the tape size; this
> avoids having to store an entire tape on disk. The script passed to the
> -E option can then be written to only ask the user to change the tape
> every N slices, with N equal to the number of slices on a given tape.

C) Moreover, at least PAR seems to be faster with a larger number of slices.

FOR LARGER SLICES:

D) But PARing individual (and so small) files has its own dangers:
https://www.livebusinesschat.com/smf/index.php?topic=4922.0

> But, there is a problem of ignoring small source files in the
> archive. MultiPar (and QuickPar) won't search for tiny files (smaller
> than the block size) in an archive file, because it is slow and
> inefficient in PAR2. (I plan to solve this problem in PAR3.)

E) One is also encouraged to write large files to tape, as they can be written more quickly, preventing slow and so inefficient writing (i.e. shoe-shining).

So it looks like an archive of some form has to be involved, both to avoid the inefficiencies and hazards of handling individual files and to write to tape efficiently.

CONCLUSION

But large archives involve the risks discussed earlier, which need to be mitigated by parity files. So, should I be using archive slices small enough that the parity files have something like the capacity to recover from losing a whole slice, if it has to be recovered from tape? In which case, for instance, slices of 10% of the archive would seem to indicate the need for at least 10% parity recovery data.
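To check my own conclusion, this is the sort of arithmetic I have in mind. The tape capacity and slice count below are illustrative assumptions only (roughly LTO-6 native capacity), not recommendations:

```python
# Sketch: choosing a slice size that divides the tape capacity, and the
# parity implication of wanting to survive the loss of one whole slice.
# Capacity and slice count are illustrative assumptions; adjust for your
# drive and for compression.

TAPE_CAPACITY = 2_500_000_000_000        # bytes, assumed (~LTO-6 native)
SLICES_PER_TAPE = 10                     # chosen so slices tile the tape exactly

slice_size = TAPE_CAPACITY // SLICES_PER_TAPE
print(f"slice size: {slice_size} bytes ({slice_size / 1e9:.0f} GB)")

# To tolerate losing one whole slice in ten, the parity data must itself
# amount to at least one slice's worth, i.e. a 10% redundancy ratio.
parity_ratio = 1 / SLICES_PER_TAPE
print(f"minimum parity ratio: {parity_ratio:.0%}")
```

In practice, I imagine the slice size would be passed to dar's -s option, with the tape-change prompt handled by a script given to -E every N slices, as the quote in B) describes.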
I'd appreciate it if anyone can see any faults in, or additional considerations that could add to, my thinking on this.

Thanks,
Peter