Re: [Dar-support] Multiple slices on LTO6 tape
For full, incremental, compressed and encrypted backups or archives
Brought to you by: edrusb
From: Moll, R. <me...@rm...> - 2022-05-31 05:42:58
Hi all tape users,

Maybe the use of LTFS [1] makes life easier for all of us. You can use your tape just like an external USB drive: copy, move, delete, etc.

There are also OSS packages:
- https://github.com/LinearTapeFileSystem/ltfs
- https://avpres.net/LTO_LTFS/

and user stories:
- https://digitensions.home.blog/2019/01/15/technologic/
- https://chiwbaka.com/2020/ltfs-guide-speed-and-tape-longevity/

Right now I create dar slices of about 50 GB on SSD and copy them to tape. The catalog and directory listing are saved on the backup PC and afterwards on the tape, too. The next step would be to compress and write directly to tape. The "problem" is that LTO8 can write at most about 300 MB/s and I want to avoid "shoe shining"... I have to test this:
- https://community.spiceworks.com/topic/1972304-lto-ltfs-moving-multiple-files-may-be-causing-shoe-shining-should-i-zip-first

On LTO8 auto speed, https://blog.archiware.com/blog/what-is-lto/ says:

"Step #3: Auto Speed
Early generations of tape suffered from so-called shoe shining, the stop-and-go of the drive upon the flow of data changes. Recent LTO generations have an auto speed mechanism built-in that lowers the streaming speed if the data flow slows so that the drive can still write at a constant speed. This is also called speed matching and ranges between 112-360MB/s with LTO-8."

And https://yoyotta.com/help/LTO_FAQ.html:

"Source media speed affecting tape write speed
LTO drives perform speed matching, so they can run the tape slower to work with slower source drives. Theoretically the minimum speed needed is less than 100MB/s, however with a mixture of file types on a NAS with a 1Gb connection, you will probably not get a suitable sustained speed. The same limitation will apply to slower USB drives. With the higher speed of LTO-7/8/9 media you will need a faster connection to keep the job going. Otherwise the drive will run out of data and the tape will have to keep rewinding and start writing again.
You will hear this "shoe shining" behaviour if you listen to the drive. When this happens the average speed will drop drastically. This won't damage the tape, however it will increase drive head wear. Also with tape compression enabled, then the source read speed will need to be a little higher with uncompressed files like ARRIRAW and DPX. In this case use YoYotta to create two sequential jobs. The first copy will be from NAS to fast RAID, then when complete YoYotta will index the new source folder and start a copy from RAID to tape. As described here. Alternatively use faster shared storage or a faster connection to the NAS."

Hope this idea helps,
cu, Ralf

[1] https://en.wikipedia.org/wiki/Linear_Tape_File_System

On Tue, 31 May 2022 at 02:11, Petr Skoda <sez...@se...> wrote:
>
> Dear Denis,
>
> after many experiments with different block sizes, writing the slices either with dar_split or dd using your link idea below, I came to the conclusion that things DO NOT WORK as you suppose (you admitted you did not have a tape unit to test it).
>
> I have made this experiment (not yet creating multiple slices, in fact):
>
> skoda@lto:~$ dar -c - -R /home/skoda > /dev/nst0
>
> The question is the size of the tape block for dar. By many experiments I have found that 32 kB does not work but 64 kB is OK (probably the default of Debian using mtio).
> I have also compiled dar_split 2.7.5, as the version in Debian (stable) does not have the -b and -r options.
>
> By other experiments it seems the default for dar_split is 64 kB (or 128 kB), namely:
> dar -c - -R /home/skoda | dar_split split_output /dev/nst0
> dar_split -b 65536 split_input /dev/nst0 | dar -0 -l -  works, while
> dar_split -b 32768 split_input /dev/nst0 | dar -0 -l -  gives the error
> "Error reading data: Cannot allocate memory"
>
> So I have used the block size 262144 bytes, which is recommended for LTO tapes in many places on the internet. It seems multiples of 64 kB are OK as block size.
>
> Any way of reading the tape works when it was written by dar or dar_split:
> dar -c - -R /home/skoda | dar_split -b 262144 split_output /dev/nst0
> or directly
> dar -c - -R /home/skoda > /dev/nst0
> Even when using dd and dar together:
>
> dar -c - -R /home/skoda | dd bs=256k of=/dev/nst0
> dd if=/dev/nst0 bs=256k | dar -0 -l -  works
>
> as well as this way:
> dar_split split_input /dev/nst0 | dar -0 -l -
>
> BUT using the link as you suggest:
> sudo ln -s /dev/nst0 myback.1.dar
>
> skoda@lto:~$ dar -0 -l myback
> gives an error:
> myback.1.dar has a bad or corrupted header, please provide the correct file. [return = YES | Esc = NO]
>
> I must also say that as I am not root on the tape-driving computer, it was quite cumbersome to achieve the ln -s of /dev/tape. My admin gave me sudo on ln but not on rm, so removing the links must be done carefully.
> Anyway, the tape drive interface does not work as expected, so I am still stuck using multiple slices - and I have tested even the slices made by dar_split. Your approach does not work, so the content of the tape cannot be shown with slices other than the first. In fact even the last slice, which should have the header, does not work for listing etc...
>
> I am afraid that a flexible application of dar to LTO tapes is not possible (in fact dar_split works, but when more tapes are written, everything gets out of control - you cannot check the later tapes).
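The combinations reported above as working can be collected into one write/read sketch. This is an untested transcription of those commands, assuming dar and dar_split are installed, a no-rewind tape device at /dev/nst0, and the paths and block size from the experiments above:

```shell
#!/bin/sh
# Untested sketch of the tape pipelines reported to work above.
# Assumes dar and dar_split are installed and a no-rewind tape
# device exists at /dev/nst0.
set -eu

TAPE=/dev/nst0   # no-rewind device: position is kept between commands
BS=262144        # 256 kB tape block, a multiple of 64 kB as found above

# Write: dar streams a single-slice backup to stdout; dar_split writes
# it to tape in fixed-size blocks (and spans tapes when one fills up).
dar -c - -R /home/skoda | dar_split -b "$BS" split_output "$TAPE"

# Read back: rewind, reassemble the stream with dar_split, and list the
# archive in sequential-read mode (-0 -l - reads it from stdin).
mt -f "$TAPE" rewind
dar_split -b "$BS" split_input "$TAPE" | dar -0 -l -
```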
> And waiting for reading them takes about 4 hours each.
>
> IMHO it would be nice to have dar operable on each slice, even if the boundary file written across two slices is left out.
>
> I still do not see the difference between a tape segment written by dar_split and one by dar_xform (dar_split does not know the splitting point in advance, while dar_xform counts the bytes).
>
> best regards,
>
> Petr Skoda
>
> ---------- Original e-mail ----------
> From: Denis Corbin <dar...@fr...>
> To: dar...@li...
> Date: 19. 5. 2022 20:16:58
> Subject: Re: [Dar-support] Multiple slices on LTO6 tape
>
> On 18/05/2022 at 13:20, Petr Skoda wrote:
> > Dear Denis,
>
> Hi Petr,
>
> > I must say that I am still confused concerning the slices, the information you have written below, and all that I have read in the dar doc.
> >
> > I understand that the slice created by dar_split has a special format which allows treating it in a different way than slices made by dar_xform.
>
> dar_split does not provide any format. It is just used to cut a single-sliced dar backup over several tapes, and to stick these different fragments together for dar to get back a single (big) slice.
>
> > But then I still do not know how to restore the content when I have individual slices on separate tapes. Suppose I have all data for my subdirectory only on slice 3 (the last one). You say I need only the last slice.
>
> In your case you created a three-slice backup and dropped (or expect to drop) each slice to a different tape. This is another way of doing it than using dar_split, with some drawbacks and advantages:
>
> drawbacks:
> - as this is a multi-sliced backup, dar expects different file names for each slice.
> Thus you have to play with symlinks to point to the tape, and this requires forcing dar to pause between slices (-p option) in order to have the time to:
>   - rewind the tape
>   - change the tape
>   - remove/add a symlink pointing to the tape, carrying the name of the next slice
> - the backup process does not directly write to tape: you need to store at least one slice at a time on local disk, copy it to tape, then remove it from disk (or keep it if you have enough disk storage), and continue with the next slice. Using dar_split, you can directly send the backup to tape.
>
> advantages:
> - you do not need dar_split
> - if for some reason an I/O error occurs while writing a slice to tape, you don't have to restart the whole backup: slices already on tape are fine, you just need to retry writing the failed slice (possibly to a different tape).
>
> For the rest, this is the same as what's in the FAQ: you still need to read with the --sequential-read option, and you can use the -E option to automate what's possible, instead of or in addition to the -p option.
>
> > So I insert the tape with slice 3 and want to extract the catalogue for further usage.
> > How will I do dar -C? Something like
> > dd if=/dev/nst0 bs=256k | dar -C mycat -A -  seems not to work.
>
> Assuming a three-sliced backup, this should work (I have no tape drive to test it):
>
> ln -s /dev/nst0 backup.3.dar
> dar -C backup_isolated_cat -A backup -z --sequential-read
>
> If you get a message about fadvise not being available on the device, you need to compile dar with:
> ./configure --disable-fadvise
>
> > I know that I must use the generic name of the slices (without the .3.dar) to work on the archive. But if it is read from a pipe after dd - how to do it?
>
> Using a symlink, as shown above. As you see, this is not very comfortable to use that way (playing with symlinks in addition to playing with tapes).
>
> But better to convert this 3-slice backup to a single-sliced backup (using dar_xform).
> During the conversion, you can use dar_split to send the resulting single-sliced backup directly to tape, and thus avoid storing the data twice on local disk (single-sliced backup plus three-sliced backup). This is what I proposed in my previous email.
>
> > Furthermore you have written in a doc that special control sequences are interspersed across the tape to allow the reconstruction, but the -0 (sequential) mode must be used... So why may I not do this to get the content of a particular tape with a slice?
>
> First, because dar expects the backup to start with a backup header, which is only contained at the beginning of the first slice. More precisely, this information is also duplicated at the end of the backup and is used when reading the backup without --sequential-read (so-called direct access mode).
>
> However dar has a feature that can let you recover most of a corrupted backup (the -alax mode), but it is painful, as you have to provide by hand the few pieces of information contained in that (here missing) header, based on the dar version you were using for the backup. The archive format has evolved over time, and the current dar version is still able to read backups created with version 1.0.0 more than twenty years ago. However it must know with which version (which format) the backup was created, in order to read it properly.
>
> So you can read the content of a slice taken from a multi-sliced backup using both the --sequential-read and -alax options. But that's painful and does not handle files that span two consecutive slices (it is quite improbable that a slice end matches the end of a saved file and that the next slice starts with a new file).
>
> > I suppose the FAQ should still explain this question in detail:
> >
> > "I have split the data with dar_xform (e.g. after netcat) into several slices. Each of them was written to a separate tape. How will I extract the backup?"
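The lax-mode reading described here could look roughly as follows. This is an untested sketch: dar's prompts for the missing header information are interactive, and the archive name "backup" and slice number 2 are illustrative, not from the thread:

```shell
#!/bin/sh
# Untested sketch: attempt to list a lone middle slice of a multi-sliced
# backup sitting on tape. --alter=lax (the -alax mentioned above) relaxes
# header checks; dar will interactively ask for the details normally read
# from the missing first slice (e.g. the archive format version).
set -eu

ln -s /dev/nst0 backup.2.dar    # present the tape under the expected slice name
dar -l backup --sequential-read --alter=lax
rm backup.2.dar
```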
>
> It could add this, but I probably will not, because from my point of view this is not the best way of using dar with tape. What you did is pretty logical, and you could not guess the other way (dar_split) due to the lack of a FAQ about that (my bad). Don't you agree that using dar_split instead of a multi-sliced backup only brings advantages? If you explain to me some drawback of using dar_split in that context, I will reconsider my point on that :)
>
> From my point of view, in your case, if you don't want to remake the whole backup, convert it to a single-sliced backup with dar_xform and output it to tapes using dar_split.
>
> Assuming you have backup.1.dar, backup.2.dar and backup.3.dar available, this is done this way:
>
> dar_xform backup - | dar_split split_output /dev/nst0
>
> > All of your answers are about dar_split, but you say it must be used to create the (special) slices. What should I do if I did not do that, but just used dar_xform, or simply dar with the -s and -S options at creation time?
>
> I am not sure I understand your question.
>
> > How to extract the catalog from such tapes?
>
> We have answered this question above.
>
> > How to use sequential mode?
>
> You can use --sequential-read even with a multi-sliced backup. But as already said, you need to mimic the filenames using symlinks so that dar finds the slices it expects.
>
> > Sorry for my ignorance, but I really have spent a lot of time reading various docs on the dar web site and mailing list, but still have a very weak understanding of what to do...
>
> No worries, documentation is always perfectible and your input is valuable (see this new FAQ entry that was missing; I guess if you had had it available from the beginning, things would have been simpler for you, right?)
>
> > So far it looks like I cannot do anything with multiple slices on tape except feed the drive with all tapes in sequential mode.
>
> No, you can - just by playing with a symlink to the tape device, as seen previously.
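The symlink juggling referred to here could be scripted along these lines. This is an untested sketch: the archive name, slice count and restore path are illustrative, and tape changes remain a manual step:

```shell
#!/bin/sh
# Untested sketch: read a 3-slice backup spread over 3 tapes by giving
# the tape drive the filenames dar expects. "backup", the slice count
# and /tmp/restore are illustrative names, not from the thread.
set -eu

TAPE=/dev/nst0
for n in 1 2 3; do
    # Every slice name points at the drive: whichever tape is loaded is
    # presented under the name dar asks for. Swap tapes (and rewind with
    # 'mt -f /dev/nst0 rewind') when dar pauses for the next slice.
    ln -sf "$TAPE" "backup.$n.dar"
done

# Sequential read walks the slices in order, a bit like tar.
dar -x backup --sequential-read -R /tmp/restore

rm -f backup.[123].dar
```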
> > I cannot even verify whether a particular slice is correct (as -l does not work on slices other than the first).
>
> --sequential-read is also available with a sliced backup, thus suitable for tape as you did it, at the cost of symlink manipulation to emulate filenames.
>
> > Extracting the catalogue from the last one is also problematic.
> > Is the only answer to use dar_split (so I have to start with backups from scratch)? I have already written ten or more tapes with slices from multiple backups (filesystems).
> >
> > Best regards, Petr Skoda
> >
> > ---------- Original e-mail ----------
> > From: Denis Corbin <dar...@fr...>
> > To: dar...@li...
> > Date: 13. 5. 2022 18:38:45
> > Subject: Re: [Dar-support] Multiple slices on LTO6 tape
> >
> > > I have saved each slice to one tape using dd if=datas.1.dar of=/dev/nst0 bs=256k etc...
> > > I still have the 3 slices on my staging disk (taking almost 7 TB). But I am not sure how to list the content, check the consistency and finally extract, without needing such large space in the future.
> >
> > In the mode you have been using, if you want to list the backup content, dar will only need the last slice. This will lead you to read the whole 3rd slice from tape.
> >
> > How should the command look (with dd and a pipe - I need dd because of the larger block)?
> >
> > Better, you can instead use an isolated catalogue of the backup on tape (something you can do on the fly or afterwards). An isolated catalogue is usually small and does not require being backed up to tape, as you can recreate one from the backup at any time.
> >
> > example to create an isolated catalogue:
> > dar -C isolated -A backup -z ...
> >
> > Yes, but then I still need to have all slices on the disk (called backup.1.dar, backup.2.dar etc....
> > - 10 TB or more)
> >
> > If you want to extract a single file from the backup, dar will ask for the last slice, then for the slice where the file to restore resides. But if you restore with the help of an isolated catalogue, dar will only need the slice where the file to restore is located.
> >
> > How will it get the right info if reading from tape with a slice other than the first? The extract will not accept the data from the tape (except the first)?
> > I.e. when using dd if=/dev/nst0 bs=256k | dar -0 -x -g something, it worked on the tape with backup.1.dar.
> >
> > example to restore with the help of a catalogue:
> > dar -x backup -A isolated ...
> >
> > It is written many times in the docs - but here I need the name of the backup (so it must be on the disk, and in fact all slices).
> >
> > > I would like simply to restore only some files in the future, getting an instruction which tape to use and then extracting, ideally using some pipes combined with dd if=/dev/nst0 bs=256k | dar something -x -
> > > However dar requires all slices together - I am not able to use only e.g. datas.2.dar to list only its contents.
> >
> > You could better use dar+dar_split instead, if this suits your need. The advantage is that it will not need any extra storage to restore or list the backup, but it will act a bit like tar, reading the tapes from the first toward the last, up to the point where the metadata and data of the file to restore are reached.
> >
> > ok - so dar_split is something different which creates some header and catalogues for each slice/tape?
> >
> > Note that you cannot use dar_split at reading time if you have not used it at creation time. When using dar+dar_split, dar creates a single slice, a slice that dar_split (as you can guess from its name) splits over different tapes. Thus, at reading time, dar expects a single slice and not the concatenation of two or more slices, which is what dar_split does at reading time (concatenating the content of several tapes).
> >
> > so can I read from the tape with dar_split (using -0 as well) or not?
> >
> > However you can convert a split backup to a single-sliced backup using dar_xform and then use dar_split:
> >
> > dar_xform backup - | dar_split split_output /dev/tape
> >
> > Note that if you had an isolated catalogue from 'backup', it stays a valid isolated catalogue for the single-sliced backup generated by dar_xform that dar_split has written to several tapes.
> >
> > Would you mind writing concrete examples using /dev/nst0 (or /dev/tape) instead of the name of the backup?
> > Combination with a pipe on dd?
> >
> > > When starting with dar I expected that in sequential mode with some flags (even -al does not work here) I would be able to scan what is on a given slice and get something from it.
> >
> > You can get the file content per slice using the -Tslice option while listing a multi-sliced backup (or an isolated catalogue from this backup).
> >
> > In fact it did not help in reading the single slice. It only tells me (while having all slices on disk) on which slices the subdirectory pointed to by -g is written.
> >
> > But you lose this ability when using dar with dar_split, as from dar's standpoint there is only one slice.
> >
> > > But now, having more tapes, I am not even able to check what is on a tape (if it is slice 2 I am not able simply to use
> > > dd if=/dev/nst0 bs=256k | dar -0 -l -  to see, just for orientation, part of the listing of the content).
> >
> > There is not really a table of contents per slice; a sliced backup is still a coherent backup with a table of contents at the end of the slice set.
> >
> > I understand - but it seems also not to have a header recognized by dar (except the first).
> >
> > Dar is not expecting to have all slices available at any time. You can use the -p option to pause after each (n) created slice(s) and do what is needed, like moving the produced slices to tape then removing them, before having dar continue its work. At reading time, dar will ask for the missing slice and pause; you can then obtain it from tape and let dar continue.
> >
> > Ok - but it means I need to feed all slices (tape by tape) until finding the right one.
> > Even if I know that my subdir is on a slice number n>1.
> >
> > With dar_split you do not need mbuffer if you just want to rate-limit the throughput (see its -r option).
> >
> > Yes, but mbuffer prints the progress, the data rates etc... And the buffer filling is IMHO important for preventing tape shoe shining.
> > I do not want to slow down the feeding of the tape (which is probably what -r does) - that is useful in writing if the tape drive does not cope with the speed of the incoming data. But on LTO on a SAS controller the speed of writing is fast, and the tape waits for the feeding (and e.g. compressing) of the data, so a large buffer (several GB) is needed not to let the tape stop.
> > In fact in my setup the hard disk is slower than the tape speed (as it is not on a dedicated SAS controller).
> > But I will try to make some tests to prove this subjective feeling of mine.
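The buffering setup described here (a large RAM buffer between dar and the drive so the tape keeps streaming while the slower disk catches up) might be wired up like this. This is an untested sketch; the buffer size, fill threshold and block size are illustrative values, not ones from the thread:

```shell
#!/bin/sh
# Untested sketch: keep an LTO drive streaming by placing mbuffer between
# dar and the tape. Assumes dar and mbuffer are installed and a no-rewind
# tape device at /dev/nst0; all sizes are illustrative.
set -eu

# -m 4G  : 4 GiB RAM buffer to ride out slow reads from the source disk
# -P 90  : start writing only once the buffer is 90% full
# -s 256k: write in 256 kB records, matching the tape block size
dar -c - -R /home/skoda -z | mbuffer -m 4G -P 90 -s 256k -o /dev/nst0
```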