Re: [Dar-support] Multiple slices on LTO6 tape
From: Denis C. <dar...@fr...> - 2022-06-11 23:05:51
On 10/06/2022 at 15:36, Petr Skoda wrote:
> Dear Denis,

Hi Petr,

> I am afraid we are running in a circle from my first email. But
> perhaps now after explaining many things we could find the problem in
> my misunderstanding ;-) And maybe the reference to this mail thread
> will help other people to understand better the behaviour of dar.

sure,

> I will try to summarize what I currently know from the dar docs
> and some discussions on mailing list. (please try to confirm my
> thoughts - perhaps it may be good point to expand on current FAQs a
> little more)
>
> 1) dar includes some special sequences (you call them tape marks even
> if it is on disk) (see http://dar.linux.free.fr/doc/Features.html
> under Data Protection)

correct,

> "It can also make use of tape marks that are used inside the backup
> for sequential reading as a way to overcome catalog corruption. The
> other vital information is the slice layout which is replicated in
> each slice and let dar overcome data corruption of that part too. As
> a last resort, Dar also proposes a "lax" mode in which the user is
> asked questions (like the compression algorithm used, ...) to help
> dar recover very corrupted archives and in which, many sanity checks
> are turned into warnings instead of aborting the operation."
>
> So there is even the layout of all slices written in each slice.
> Thus dar knows what is needed (and if missing it asks for it)

not exactly:

1/ dar can of course, from a single slice, access the data and metadata of files stored in that slice. But the path to these files is not duplicated for each entry; rather, it is the sequence of entries, with the help of the pseudo EOD inode (End of Directory), that keeps track of the relative path of each entry. This is something you may have already found in the documentation:
http://dar.linux.free.fr/doc/Notes.html#archive_structure

This directory tree is thus stored inline at the global archive/backup level and not per slice.
It is also stored in the catalogue, which is located at the end of the backup. When you restore in relaxed (checking) mode using the -al option, you can replace missing slices by empty files; dar will ask for what it needs, and if you only have one slice you should be able to restore all files contained in that slice, except that the relative path may be truncated by one or more directories. The -al option is not simple to use, as it requires you to provide the missing information (if, for example, you do not have the first or last slice, which both contain the compression algorithm used, the archive format, and a few other things that must be known in order to access the data).

2/ when it comes to differential backup, there is also something that is missing in the slices: files that have been deleted since the backup of reference was made. This information cannot be evaluated before the backup is completed, thus it is located at the end of the backup. Well, we could have a first directory scan to detect that, but this would lead to extra disk I/O and could be inaccurate, as the filesystem could change between this first scan and the time the backup is done in any particular directory.

> 2) at other places is written (also in a previous mails) that the
> catalogue is in the last slice.

well, if the last slice is large enough to hold it entirely, else it may span one or more of the latest slices.

> But when I want to list its contents in sequential mode - it
> requires the first one !

Yes, a dar backup can be read in sequential mode (using inline content) as well as in direct access mode (the default mode) reading the catalogue. Information (metadata and archive header/trailer) is duplicated for that purpose (which also brings more resiliency).
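The placeholder step described above for lax mode (replacing missing slices by empty files) can be sketched as follows. This is only an illustration under assumptions: the slice naming `<base>.<n>.dar` and the basename `myback` come from this thread, the helper name `make_placeholders` and the scratch directory are hypothetical, and the `dar` invocation is left as a comment because its interactive questions depend on what is missing.

```shell
#!/bin/sh
# Sketch only: create empty stand-ins for missing slices so that lax mode
# (-al) finds a file for every slice number. Names are hypothetical.
set -eu

make_placeholders() {
    base=$1      # archive basename, e.g. "myback"
    total=$2     # total number of slices the backup had
    n=1
    while [ "$n" -le "$total" ]; do
        slice="$base.$n.dar"
        if [ ! -e "$slice" ]; then
            : > "$slice"                 # empty file replacing the missing slice
            echo "placeholder: $slice"
        fi
        n=$((n + 1))
    done
}

# Demo in a scratch directory: only slice 2 of 3 survived.
workdir=$(mktemp -d)
cd "$workdir"
printf 'dummy' > myback.2.dar            # stand-in for the real surviving slice
make_placeholders myback 3

# With a real slice in place, one would then try (direct access, lax mode):
#   dar -l myback -al
```

Existing slices are never overwritten, so running the helper next to real data only fills the gaps.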
> the -al option is intended for a lax mode, where I expect the ability
> to read what is presented from any slice (here is probably my
> misunderstanding - I expect the tape marks and special sequences
> allow to read the part of the directory on given slice) But when I
> try it (see below) it fails....
>
> Also in the FAQ question "Why dar became much slower since
> release 2.4.0?" are introduced escape sequence - IS IT
> DIFFERENT from the TAPE MARK ?

no, "tape mark" = "escape sequence". The so-called "tape mark" just works on the principle of an escape sequence. Thanks for this feedback, I will clarify the FAQ.

> Later it is written -at option, which suppress "tape marks" (just
> another name for escape sequences), So probably it is the same. So it
> is explained

yep

> "Espetially (here is the typo)

now fixed, thanks!

> to be able to read dar backup through pipes in sequential mode, dar
> inserts so-called "escape sequence" to know for example when a new
> file starts. This way dar can skip to the next mark upon backup
> corruption or if the given file has not to be restored."
>
> OK - so are those escape sequences used just for jumping over the
> bad part or is there written as well the name and attributes of the
> file which follows ????

A tape mark is just a special sequence of characters that defines the start of something: inode information, EA, FSA, data, CRC... There are different flavors of tape mark, one for each such function. But as stated above, the inode information only holds the filename, not the relative path to that file.

> If not - what is written and where to allow the listing of the
> catalogue in the sequential mode ????
by the sequence order of inodes. Consider that 20 years ago RAM was much scarcer than today. To support the bet I made (that it made sense to construct *in-memory* a table of contents that would be written out at the end of the backup/archive), I had to use RAM with caution: storing an arbitrarily long path for each inode would have put too much pressure on RAM for a backup holding many (possibly small) files. Even a table of paths referenced from each entry would not have been suitable.

> I have made various experiments - just without the tape
> Having dir /home/skoda about 700MB I created 3 slices on disk (not on
> home but /mnt...
>
> dar -c myback -R /home/skoda -s 300M
>
> I have three slices myback.1.dar, myback.2.dar and myback.3.dar
> Now I will do
>
> dar -l myback - OK the listing is here fastly
> dar -l myback -0 (I expect -0 is --sequential-read - I am lazy to
> write it ;-)

it is the same, no worries, I do the same :)

> The dir is given but slowly - I expect now dar uses the tape marks
> (special sequences ) to construct the catalogue.

that's exactly that, but it requires sequentially reading all the bytes of the backup, looking for those escape sequences called tape marks, while reading the catalogue in direct access mode is straightforward and efficient.

> NOW I move 2 slices somewhere - leaving only myback.1.dar available
> dar -l myback complains about the last slice

normal: in direct mode, dar reads the archive trailer, which is located at the end of the last slice.

> dar -l myback
> The last file of the set is not present in
> file:///mnt/work/skoda/ANTARES-2022/DAR-TEST , please provide it.
> [return = YES | Esc = NO]
>
> I will copy it in another terminal window, press enter and the
> listing is OK (Please note, I did not need the slice 2 for listing)

this means the whole catalogue is stored in the last slice. You could also have hidden the first slice; it would not have been needed either.
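To make the difference between the two modes concrete, here is a small pre-flight check one could run before listing. It is a rough sketch under assumptions: the helper name `check_slices` is hypothetical, the total slice count must be known in advance, and it treats the last slice as sufficient for direct access, which only holds when the catalogue fits entirely in that slice (as it did in the experiment above).

```shell
#!/bin/sh
# Sketch only: report which listing mode looks feasible with the slices on disk.
# Assumes slices are named <base>.<n>.dar and the total count is known.

check_slices() {
    base=$1
    total=$2
    # Direct access starts from the trailer at the end of the last slice.
    if [ -e "$base.$total.dar" ]; then
        echo "direct: last slice present"
    else
        echo "direct: missing $base.$total.dar"
    fi
    # Sequential reading walks the slices in order, starting with the first.
    n=1
    while [ "$n" -le "$total" ]; do
        if [ ! -e "$base.$n.dar" ]; then
            echo "sequential: missing $base.$n.dar"
            return 0
        fi
        n=$((n + 1))
    done
    echo "sequential: all slices present"
}

# Demo: only the last of three slices is available.
demo=$(mktemp -d)
: > "$demo/myback.3.dar"
result=$(check_slices "$demo/myback" 3)
echo "$result"
```

In the demo the check reports that direct access can start while sequential reading would stop at slice 1, which matches the prompts dar printed in the experiment.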
> Now I keep only slice 3 in my working directory
> dar -l myback gives immediately the listing precisely
>
> while the sequential mode is missing the slice 1
> dar -l myback -0
> myback.1.dar is required for further operation, please provide the
> file. [return = YES | Esc = NO]
>
> When I copy it there it continues in the listing but after it
> requires the slice2
>
> 03:37 2022 DAR/dar-2.7.5/src/libdar/.libs/fichier_libcurl.o
> ...
> [Saved][ ] [---][ 100%][X] -rw-r--r-- skoda tape 105 kio Mon May 30 21:03:37 2022 DAR/dar-2.7.5/src/libdar/.libs/i_entrepot_libcurl.o
> [Saved][ ] [---][ 100%][X] -rw-r--r-- skoda tape 308 kio Mon May 30 21:03:38 2022 DAR/dar-2.7.5/src/libdar/.libs/delta_sig_block_size.o
> myback.2.dar is required for further operation, please provide the
> file. [return = YES | Esc = NO]
>
> When added this piece it finishes the listing OK
>
> So far good - in non sequential mode dar goes to the last slice
> and reads all the contents (I mean catalogue) here . In sequential it
> needs all slices to gather information about files and probably
> does not read the catalogue on the last slice.

no, it does, but only to fetch the entries recorded as deleted since the reference backup was made (differential backup context).

> -------------------------------------------------------------
> Then we have part about -Tslices
>
> It is very shortly mentioned in the FAQ named *Why cannot I test,
> extract file, list the contents of a given slice from a backup?*
>
> It seems to be useful only when a subtree of the backup is needed to
> tell you what slices should be involved. Without the filter - e.g.
> -g subdir It seems redundant - as it returns " all displayed files
> have their data in slice range ... And ALL slices are said here.

Well, it is still useful to know how a file's data and metadata are spread over slices. It is not easy to guess, as it depends on the compression ratio or sparse file handling... But of course, it also works with filters.
> But it works (for me ) only in NON-SEQUENTIAL mode. This implies it
> needs ALWAYS the LAST slice.

Correct, the reason is that the sequential reading and direct access modes are abstracted under the same concept of "catalogue" (i.e. the handling of metadata). The direct access mode creates a catalogue by de-serializing all the information that has been written at the end of the archive, thus it contains the real size used for data, EA and FSA. Sequential reading de-serializes the inline metadata instead, but this was written to the backup *before* the data could be saved, and at that time the amount of space required to store each file was not known (compression, sparse file handling, as well as ciphering can change it).

> On disk it is easy
> dar -l myback -Tslice -g fortape
>
> It lists all files from "fortape" directory and tells me "All
> displayed files have their data in slice range [2]"
>
> If trying with sequential mode
>
> dar -l myback -Tslice -0 -g fortape (the fortape is
> It crashes

really???

> Slice(s)|[Data ][D][ EA ][FSA][Compr][S]|Permission| Filemane
> --------+--------------------------------+----------+-----------------------------
>
> Final memory cleanup...
> FATAL error, aborting operation: slicing focused output is not
> available in sequential-read mode

hey! this is not a crash! :)

> (Please note I have all 3 slices available so can do easily
> dar -l myback -0 -g fortape and even all backup list dar -l myback -0
>
> So my conclusion - Tslices will not work in sequential mode and so
> cannot be used on tapes or any pipe through std input output (-)

as mentioned in the output message, as well as in the documentation (man page), this is a known and unsupported combination...
http://dar.linux.free.fr/doc/man/dar.html

> ---------------------
> Now the objection to the last part of your mail (below)
>
> "last, thanks to the lax mode you can, in a way operate data per
> slice (listing/restoring/...),"
>
> I do not understand the term "per slice" And with following statement
> I am going back in cycle to previous mails and discussion.
>
> I CANNOT DO ANYTHING WITH a SINGLE SLICE even in lax mode (-al) IN
> CASE IT IS NOT THE LAST OR FIRST

Ok, let me clarify, then: don't expect the lax mode to be simple to use. This is a last resort method to restore data when your backup is strongly corrupted, but it has the merit of existing. This method will try to recover as much as it can, asking for the backup/archive level information that it cannot guess when it has been altered (like the archive format and the compression used, for example). You will also have to create empty files with the names of the missing slices, and as already mentioned, you may end up with a truncated directory tree where files would be restored. For example, usr/local/shared/something would be restored as local/shared/something if the "usr" metadata was not readable (or was in a missing slice).

In that way, you can do something with a single slice, but it is so limited that I prefer not to drive users in that direction unless this is the only option to recover some data. In other words, *the lax mode feature should not be _planned_ to be used* for a normal backup process.

> (in sequential - I need the first to start with in non-sequential
> the last file to read from).

This is not exact: the addition of the -al option (both in direct and sequential modes) only lets dar be less strict about format incoherence. The process is sequential:

# dar -t backtest -al --sequential-read -E "echo reading slice %N"
Warning: using insecure memory!
reading slice 1
reading slice 2
reading slice 3
--------------------------------------------
2338 item(s) treated
0 item(s) with error
0 item(s) ignored (excluded by filters)
--------------------------------------------
Total number of items considered: 2338
--------------------------------------------
#

> But if using the tapes (or pipelines) I MUST USE sequential mode. So
> even the tape with the last slice will not help me. I will simulate
> the input from sequential pipe without tape now:
>
> dar -l - -0 -al < myback.3.dar
> LAX MODE: Failed to read the archive header's format version.

this is normal: in sequential mode the backup is expected to be read from the beginning, and here you provided slice 3. Providing slice 1 instead will work up to the point where the end of the slice is met. You could instead also use dar_xform and provide its output to dar's standard input:

dar_xform myback - | dar -l - -0 -al

> LAX MODE: Please provide the archive format: You can use the table
> at
>
> http://dar.linux.free.fr/pre-release/doc/Notes.html#Dar_version_naming
>
> [...]
> First the link http://dar.linux.free.fr/pre-release/doc/Notes.html
> does not exist !

Good finding! This is now fixed. It will be available in 2.7.6, expected in a couple of weeks, and is already available in 2.7.6.RC2 at https://dar.edrusb.org/dar.linux.free.fr/Interim_releases/

> So I can only guess what to type if I have 2.7.5 I tried 6 or 7 .
> Always I get

the answer is 11.1, but the request from dar does not allow you to provide the digit after the dot. I have fixed that also in 2.7.6.RC2.

[...]

> To wrap up: I am still confused what happens here - But I guess that
> it is the proof that the individual slices have allegedly the
> escape sequences (Tape marks) embedded, but dar is not able to
> operate with them .

well, it is possible, but not in sequential read mode, and you have to replace all slices other than the one you want to play with by empty files.
But as said, this is lax mode: it is not perfect and is here as a last resort solution to recover data.

> So IMHO the only hope is the solution of the symlink after you
> succeed to repair the behaviour.

Yes, I just fixed the problem as described in my previous email. I tested with a disk device (/dev/sdc): what failed before now works. So I'm confident enough that you will be able to read a single sliced backup from your LTO tape. By the way, I have adapted the sanity check so that it does not complain if the slice size is reported to be zero when reading in sequential mode (sequential read mode on a block device). So you no longer need the "-s 1Y" hack (which was not compatible with writing a backup to a pipe, by the way). All of this is in 2.7.6.RC2 (see above to get it). If this works for you, I will update the FAQ to present/resurrect the alternative to dar_split by means of symlinks, pause and sequential-read.

> I would suggest to allow the reading of the exact filename of the
> slice with -l (not the generic) after using some special option
> (as -lslice) that would combine the serial and lax or something
> similar.

This is strongly incompatible with the current design. What I am thinking about is rather to add the possibility to make from a backup as many isolated catalogues as there are slices, each containing only the files entirely stored in a given slice. As you may have seen, you can load the catalogue from an isolated catalogue as an alternative to the inline catalogue used during sequential read mode and to the internal catalogue found at the end of the backup when using direct access mode:

dar -c backup --on-fly-isolate cat_backup ...
dar -t backup -A cat_backup ...

I must take some time to review what's feasible in that area... But already, I guess this would only make sense in direct access mode, as it would mean jumping directly to slice N, while sequential-read mode would always need to read the backup... sequentially.
> I have also noticed that the output of dar is not compatible with
> other linux commands as I cannot redirect the listing to e.g. word
> count ls -l backup |wc -l Probably the output is not stdout ...

# dar -l tutu | wc -l
Warning: using insecure memory!
2341
# dar -l tutu --sequential-read | wc -l
Warning: using insecure memory!
2341
#

but you are correct, this is true when you use '-' as the backup name. Since the early days, before --sequential-read mode existed, dar was already able to read data from a pipe, but in that mode it uses a pair of pipes with dar_slave at the other ends of the pipes. Thus both stdin and stdout are used by dar to communicate with dar_slave. A new file descriptor on the tty is opened to output the messages, while file descriptors 0 and 1 are used with dar_slave. If you add the --sequential-read option, this modifies the behavior (no dar_slave is required and stdout is freed, but the operation is less performant, precisely because you have to read all the data in sequence). However, this file descriptor selection has to be done very early to avoid polluting the stdout pipe to dar_slave with user messages from dar, messages that could be issued before --sequential-read is even parsed on the command line...

> It would be nice to have this standard unix filter behaviour.

it is, when possible.

> And BTW on the doc page http://dar.linux.free.fr/doc/Tutorial.html
> is bad link going to nowhere from section 9 "restoration with dar "

thanks again for that, I really appreciate knowing that the time spent documenting dar/libdar is not completely useless. OK, maybe not useful to the reader, but at least I tried :D !

> Best regards Petr

Cheers,
Denis