Re: [Dar-support] Multiple slices on LTO6 tape
From: Denis C. <dar...@fr...> - 2022-05-31 17:41:24
A nice one! Thanks Ralf for your feedback ;^) Cheers, Denis

On 31/05/2022 at 07:12, Moll, Ralf wrote:
> Hi all tape users,
>
> maybe the use of LTFS [1] makes life easier for all of us. You can use
> your tape just like an external USB drive: copy, move, delete, etc.
> There are also OSS packages:
> - https://github.com/LinearTapeFileSystem/ltfs
> - https://avpres.net/LTO_LTFS/
> and user stories:
> - https://digitensions.home.blog/2019/01/15/technologic/
> - https://chiwbaka.com/2020/ltfs-guide-speed-and-tape-longevity/
>
> Right now I create dar slices of about 50 GB on SSD and copy them to
> tape. The catalog and directory listing are saved on the backup PC
> and, after that, on the tape too.
> The next step would be to compress and write directly to tape. The
> "problem" is that LTO8 can write at about 300 MB/s max and I want to
> avoid "shoe shining"... I have to test this:
> - https://community.spiceworks.com/topic/1972304-lto-ltfs-moving-multiple-files-may-be-causing-shoe-shining-should-i-zip-first
>
> LTO8 auto speed, from https://blog.archiware.com/blog/what-is-lto/:
> "Step #3: Auto Speed
>
> Early generations of tape suffered from so-called shoe shining, the
> stop-and-go of the drive upon the flow of data changes. Recent LTO
> generations have an auto speed mechanism built-in that lowers the
> streaming speed if the data flow slows so that the drive can still
> write at a constant speed. This is also called speed matching and
> ranges between 112-360MB/s with LTO-8."
>
> From https://yoyotta.com/help/LTO_FAQ.html:
> "Source media speed affecting tape write speed
>
> LTO drives perform speed matching, so they can run the tape slower to
> work with slower source drives. Theoretically the minimum speed needed
> is less than 100MB/s, however with a mixture of file types on a NAS
> with a 1Gb connection, you will probably not get a suitable sustained
> speed. The same limitation will apply to slower USB drives.
> With the higher speed of LTO-7/8/9 media you will need a faster
> connection to keep the job going. Otherwise the drive will run out of
> data and the tape will have to keep rewinding and start writing again.
> You will hear this "shoe shining" behaviour if you listen to the
> drive. When this happens the average speed will drop drastically. This
> won't damage the tape, however it will increase drive head wear.
> Also with tape compression enabled, then the source read speed will
> need to be a little higher with uncompressed files like ARRIRAW and
> DPX.
> In this case use YoYotta to create two sequential jobs. The first copy
> will be from NAS to fast RAID, then when complete YoYotta will index
> the new source folder and start a copy from RAID to tape. As described
> here. Alternatively use faster shared storage or a faster connection
> to the NAS."
>
> Hope this idea helps,
>
> cu, Ralf
>
> [1] https://en.wikipedia.org/wiki/Linear_Tape_File_System
>
> On Tue, 31 May 2022 at 02:11, Petr Skoda <sez...@se...> wrote:
>>
>> Dear Denis,
>>
>> after many experiments with different block sizes, writing the slices
>> either with dar_split or dd using your link idea below, I came to the
>> conclusion that things DO NOT work as you supposed (you admitted you
>> did not have a tape unit to test it).
>>
>> I have made this experiment (not yet creating multiple slices, in
>> fact):
>>
>> skoda@lto:~$ dar -c - -R /home/skoda > /dev/nst0
>>
>> The question is the size of the tape block for dar. By many
>> experiments I have found that 32 kB does not work but 64 kB is OK
>> (probably the default of Debian using mtio).
>> I have also compiled dar_split 2.7.5, as the version in Debian
>> (stable) does not have the -b and -r options.
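The pattern in the experiments above (32 kB fails, 64 kB works) can be turned into a quick sanity check. This is a hypothetical, file-only sketch assuming the rule really is "multiples of 64 KiB" - it does not touch a tape drive, and on real hardware the st driver and the drive itself have the final say:

```shell
# Hypothetical check, assuming (as the experiments above suggest) that
# usable tape block sizes are multiples of 64 KiB. The candidate values
# are the ones tried in this thread; no tape device is involved.
UNIT=65536                       # 64 KiB, the smallest size that worked
for BS in 32768 65536 131072 262144; do
    if [ $((BS % UNIT)) -eq 0 ]; then
        echo "$BS: ok (multiple of 64 KiB)"
    else
        echo "$BS: likely to fail (not a multiple of 64 KiB)"
    fi
done
```

The 262144 value (256 KiB) satisfies the check, which matches the later finding that it is the size commonly recommended for LTO.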
>>
>> From other experiments it seems the default for dar_split is 64 kB
>> (or 128 kB), namely:
>> dar -c - -R /home/skoda | dar_split split_output /dev/nst0
>> dar_split -b 65536 split_input /dev/nst0 | dar -0 -l -    works,
>> while
>> dar_split -b 32768 split_input /dev/nst0 | dar -0 -l -    gives the
>> error "Error reading data: Cannot allocate memory"
>>
>> So I have used the block size 262144 bytes, which is recommended for
>> LTO tapes in many places on the internet.
>> It seems that multiples of 64 kB are OK as block size.
>>
>> Whichever way the tape is read, it works when written by dar or
>> dar_split:
>> dar -c - -R /home/skoda | dar_split -b 262144 split_output /dev/nst0
>> or directly
>> dar -c - -R /home/skoda > /dev/nst0
>> Even when using dd and dar together:
>>
>> dar -c - -R /home/skoda | dd bs=256k of=/dev/nst0
>> dd if=/dev/nst0 bs=256k | dar -0 -l -    works,
>> as well as this way:
>> dar_split split_input /dev/nst0 | dar -0 -l -
>>
>> BUT using the link as you suggested:
>> sudo ln -s /dev/nst0 myback.1.dar
>>
>> skoda@lto:~$ dar -0 -l myback
>> gives an error:
>> myback.1.dar has a bad or corrupted header, please provide the
>> correct file. [return = YES | Esc = NO]
>>
>> I must also say that, as I am not root on the computer driving the
>> tape, it was quite cumbersome to achieve the ln -s of /dev/tape.
>> My admin gave me sudo on ln but not on rm, so removing the links must
>> be done carefully.
>> Anyway, the tape drive interface does not work as expected, so I am
>> still stuck using multiple slices - and, as I have tested, even the
>> slices made by dar_split.
>> Your approach does not work, and so the content of tapes holding
>> slices other than the first cannot be shown. In fact even the last
>> slice, which should have the header, does not work for listing etc...
>>
>> I am afraid that flexible use of dar with LTO tapes is not possible
>> (in fact dar_split works, but when more tapes are written everything
>> gets out of control - you cannot check the later tapes.
>> And waiting for reading them takes about 4 hours each).
>>
>> IMHO it would be nice to have dar operable on each slice, even if the
>> boundary file written across two slices is left out.
>>
>> I still do not see the difference between a tape segment written by
>> dar_split and one written by dar_xform (dar_split does not know the
>> splitting point in advance, while dar_xform counts the bytes).
>>
>> best regards,
>>
>> Petr Skoda
>>
>> ---------- Original e-mail ----------
>> From: Denis Corbin <dar...@fr...>
>> To: dar...@li...
>> Date: 19. 5. 2022 20:16:58
>> Subject: Re: [Dar-support] Multiple slices on LTO6 tape
>>
>> On 18/05/2022 at 13:20, Petr Skoda wrote:
>>> Dear Denis,
>>
>> Hi Petr,
>>
>>> I must say that I am still confused about the slices, the
>>> information you have written below, and all that I have read in the
>>> dar docs.
>>>
>>> I understand that the slice created by dar_split has a special
>>> format which allows it to be treated differently from slices made
>>> by dar_xform.
>>
>> dar_split does not provide any format. It is just used to cut a
>> single-slice dar backup over several tapes, and to stick these
>> fragments together so that dar gets back a single (big) slice.
>>
>>> But then I still do not know how to restore the content when I have
>>> individual slices on separate tapes. Suppose I have all the data
>>> for my subdirectory only on slice 3 (the last one). You say I need
>>> only the last slice.
>>
>> In your case you created a three-slice backup and dropped (or expect
>> to drop) each slice to a different tape. This is another way of doing
>> it than using dar_split, with some drawbacks and advantages:
>>
>> drawbacks:
>> - as this is a multi-sliced backup, dar expects a different file name
>> for each slice.
>> Thus you have to play with symlinks to point to the tape, and
>> this requires forcing dar to pause between slices (-p option) in
>> order to have the time to:
>> - rewind the tape
>> - change the tape
>> - remove / add a symlink pointing to the tape, having the name of
>> the next slice
>> - the backup process does not directly write to tape: you need to
>> store at least one slice at a time on local disk, copy it to tape,
>> then remove it from disk (or keep it if you have enough disk
>> storage), then continue with the next slice. Using dar_split, you
>> can directly send the backup to tape.
>>
>> advantages:
>> - you do not need dar_split
>> - if for some reason an I/O error occurs while writing a slice to
>> tape, you don't have to restart the whole backup: slices already on
>> tape are fine, you just need to retry writing the failed slice
>> (possibly to a different tape).
>>
>> For the rest, this is the same as what's in the FAQ: you still need
>> to read with the --sequential-read option, and you can use the -E
>> option to automate what's possible, instead of or in addition to the
>> -p option.
>>
>>> So I insert the tape with slice 3 and want to extract the catalogue
>>> for further usage. How will I do dar -C - something like
>>> dd if=/dev/nst0 bs=256k | dar -C mycat -A - seems not to work.
>>
>> Assuming a three-slice backup, this should work (I have no tape drive
>> to test it):
>>
>> ln -s /dev/nst0 backup.3.dar
>> dar -C backup_isolated_cat -A backup -z --sequential-read
>>
>> If you get a message about fadvise not being available on the device,
>> you need to compile dar with:
>> ./configure --disable-fadvise
>>
>>> I know that I must use the generic name of the slices (without the
>>> .3.dar) to work on the archive. But if it is read from a pipe after
>>> dd - how to do it?
>>
>> Using a symlink, as shown above. As you see, this is not very
>> comfortable to use that way (playing with symlinks in addition to
>> playing with tapes).
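The symlink naming trick above can be rehearsed without a drive. This is a file-based sketch in which a regular file stands in for /dev/nst0 (all names here are hypothetical); the commented dar line shows the intended real usage against the device:

```shell
#!/bin/sh
# File-based rehearsal of the symlink trick: dar looks for slice file
# names like backup.3.dar, so a symlink with that name is pointed at
# the tape device. Here a plain file stands in for /dev/nst0.
set -e
tmp=$(mktemp -d)
cd "$tmp"
: > fake_nst0                    # stand-in for /dev/nst0
ln -s fake_nst0 backup.3.dar     # the slice name dar expects
# dar -C backup_isolated_cat -A backup -z --sequential-read
readlink backup.3.dar            # shows the link target
cd /
rm -r "$tmp"
```

On the real system the only difference is that the link target is the non-rewinding tape device, and the link has to be recreated with the next slice's name after each tape change.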
>>
>> But it is better to convert this 3-slice backup to a single-slice
>> backup (using dar_xform). During the conversion, you can use
>> dar_split to send the resulting single-slice backup directly to
>> tape, and thus avoid storing the data twice on local disk
>> (single-slice backup plus three-slice backup). This is what I
>> proposed in my previous email.
>>
>>> Furthermore you have written in a doc that special control
>>> sequences are interspersed across the tape to allow the
>>> reconstruction, but the -0 (sequential) mode must be used... So why
>>> may I not do this to get the content of a particular tape with a
>>> slice?
>>
>> First because dar expects the backup to start with a backup header,
>> which is only contained at the beginning of the first slice. More
>> precisely, this information is also duplicated at the end of the
>> backup and is used when reading the backup without --sequential-read
>> (the so-called direct access mode).
>>
>> However dar has a feature that can let you recover most of a
>> corrupted backup (the -alax mode), but it is painful, as you have to
>> provide by hand the few pieces of information contained in that
>> header (which is here missing), based on the dar version you were
>> using for the backup. The archive format has evolved over time and
>> the current dar version is still able to read backups created with
>> version 1.0.0 more than twenty years ago. However it must know which
>> version (which format) the backup was created with, in order to read
>> it properly.
>>
>> So you can read the content of a slice taken from a multi-sliced
>> backup using both the --sequential-read and -alax options. But
>> that's painful and does not handle files that span two contiguous
>> slices (it is quite improbable that a slice end matches the end of a
>> saved file and that the next slice starts with a new file).
>>
>>> I suppose that the FAQ should still explain in detail the question:
>>>
>>> "I have split the data by xform (e.g.
>>> after netcat) to several slices. Each of them was written to a
>>> separate tape. How will I extract the backup?"
>>
>> It could add this, but I probably will not, because from my point of
>> view this is not the best way of using dar with tape. What you did is
>> pretty logical and you could not guess the other way (dar_split) due
>> to the lack of a FAQ about it (my bad). Don't you agree that using
>> dar_split instead of a multi-sliced backup only brings advantages?
>> If you explain to me some drawback of using dar_split in that
>> context, I will reconsider my point on that :)
>>
>> From my point of view, in your case, if you don't want to remake the
>> whole backup, convert it to a single-slice backup with dar_xform and
>> output it to tapes using dar_split.
>>
>> Assuming you have backup.1.dar, backup.2.dar and backup.3.dar
>> available, this is done this way:
>>
>> dar_xform backup - | dar_split split_output /dev/nst0
>>
>>> All of your answers are about dar_split, but you say it must be
>>> used to create the (special) slices. What should I do if I did not
>>> do that and just used dar_xform, or simply dar with the -s and -S
>>> options at creation time?
>>
>> not sure I understand your question
>>
>>> How to extract the catalog from such tapes?
>>
>> we have answered this question above
>>
>>> How to use sequential mode?
>>
>> You can use --sequential-read even with a multi-sliced backup. But
>> as already said, you need to mimic the file names using symlinks so
>> that dar finds the slices it expects.
>>
>>> Sorry for my ignorance, but I really have spent a lot of time
>>> reading various docs on the dar web site and mailing list, but
>>> still have a very weak understanding of what to do...
>>
>> No worries, documentation is always perfectible; your input is
>> valuable (see this new FAQ that was missing - I guess if you had had
>> it available from the beginning, things would have been simpler for
>> you, right?)
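The principle the dar_xform | dar_split pipeline relies on can be illustrated with plain files. In this sketch, split and cat stand in for dar_split (which additionally handles tape block sizes and prompting for media changes), and random data stands in for the single-slice archive produced by dar_xform:

```shell
#!/bin/sh
# File-based illustration of the dar_split principle: one single-slice
# stream is cut into fixed-size pieces (one per "tape"), and reading is
# simply the concatenation of those pieces in order. split/cat stand in
# for dar_split; /dev/urandom stands in for the dar archive stream.
set -e
tmp=$(mktemp -d)
cd "$tmp"
head -c 1048576 /dev/urandom > slice     # pretend single dar slice (1 MiB)
split -b 262144 slice tape_              # four 256 KiB "tapes"
cat tape_* > rejoined                    # what dar_split does on reading
cmp slice rejoined && echo "streams identical"
cd /
rm -r "$tmp"
```

This is why a stream written with dar_split can only be read back through dar_split (or an equivalent concatenation): each piece is a raw fragment, not a self-describing slice with its own header.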
>>
>>> So far it looks like I cannot do anything with multiple slices on
>>> tape except feed the drive with all the tapes in sequential mode.
>>
>> No, you can - just play with symlinks to the tape device as seen
>> previously.
>>
>>> I cannot even verify that a particular slice is correct (as -l does
>>> not work on any slice other than the first).
>>
>> --sequential-read is also available with a sliced backup, thus
>> suitable for tape as you did it, at the cost of symlink manipulation
>> to emulate file names.
>>
>>> Extracting the catalogue from the last one is also problematic.
>>> Is the only answer to use dar_split (so I have to start the backups
>>> from scratch)? I have already written ten or more tapes with slices
>>> from multiple backups (filesystems).
>>>
>>> Best regards, Petr Skoda
>>>
>>> ---------- Original e-mail ----------
>>> From: Denis Corbin <dar...@fr...>
>>> To: dar...@li...
>>> Date: 13. 5. 2022 18:38:45
>>> Subject: Re: [Dar-support] Multiple slices on LTO6 tape
>>>
>>>> I have saved each slice to one tape using dd if=datas.1.dar
>>>> of=/dev/nst0 bs=256k etc...
>>>> I still have the 3 slices on my staging disk (taking almost 7 TB).
>>>> But I am not sure how to list the content, check the consistency
>>>> and finally extract without needing such large space in the
>>>> future.
>>>
>>> in the mode you have been using, if you want to list the backup
>>> content, dar will only need the last slice. This will lead you to
>>> read the whole 3rd slice from tape.
>>>
>>> How should the command look (with dd and a pipe - I need dd because
>>> of the larger block)?
>>>
>>> Better, you can use instead an isolated catalogue of the backup on
>>> tape (something you can do on the fly or afterwards). An isolated
>>> catalogue is usually small and does not require to be backed up to
>>> tape, as you can recreate one from a backup at any time.
>>>
>>> example to create an isolated catalogue:
>>> dar -C isolated -A backup -z ...
>>>
>>> Yes, but then I still need to have all slices on the disk (called
>>> backup.1.dar, backup.2.dar etc.... - 10 TB or more)
>>>
>>> If you want to extract a single file from the backup, dar will ask
>>> for the last slice, then for the slice where the file to restore
>>> resides. But if you restore with the help of an isolated catalogue,
>>> dar will only need the slice where the file to restore is located.
>>>
>>> How will it get the right info if reading from tape with a slice
>>> other than the first? The extract will not accept the data from the
>>> tape (except the first)?
>>>
>>> I.e. when using dd if=/dev/nst0 bs=256k | dar -0 -x -g something,
>>> it worked on the tape with backup.1.dar
>>>
>>> example to restore with the help of a catalogue:
>>> dar -x backup -A isolated ...
>>>
>>> it is written many times in the docs - but here I need the name of
>>> the backup (so it must be on the disk, and in fact all slices)
>>>
>>>> I would like simply to restore only some files in the future,
>>>> getting an instruction which tape to use and then extracting,
>>>> ideally using some pipes combined with
>>>> dd if=/dev/nst0 bs=256k | dar something -x -
>>>> However dar requires all slices together - I am not able to use
>>>> only e.g. datas.2.dar to list only its contents.
>>>
>>> You could better use dar+dar_split instead, if this suits your
>>> needs. The advantage is that it will not need any extra storage to
>>> restore or list the backup, but will act a bit like tar, reading
>>> the tapes from the first toward the last, up to the point where the
>>> metadata and data of the file to restore are reached.
>>>
>>> ok - so dar_split is something different, which creates some header
>>> and catalogues for each slice/tape?
>>>
>>> Note that you cannot use dar_split at reading time if you have not
>>> used it at creation time. When using dar+dar_split, dar creates a
>>> single slice, a slice that dar_split (as you can guess from its
>>> name) splits over different tapes. Thus, at reading time, dar
>>> expects a single slice and not the concatenation of two or more
>>> slices, which is what dar_split does at reading time (concatenating
>>> the content of several tapes).
>>>
>>> so can I read from tape with dar_split (using -0 as well) or not?
>>>
>>> However you can convert a split backup to a single-slice backup
>>> using dar_xform and then use dar_split:
>>>
>>> dar_xform backup - | dar_split split_output /dev/tape
>>>
>>> note that if you had an isolated catalogue for 'backup', it stays a
>>> valid isolated catalogue for the single-slice backup generated by
>>> dar_xform that dar_split has written to several tapes.
>>>
>>> Would you mind writing concrete examples using /dev/nst0 (or
>>> /dev/tape) instead of the name of the backup?
>>> A combination with a pipe on dd?
>>>
>>>> When starting with dar I expected that in sequential mode, with
>>>> some flags (even -al does not work here), I would be able to scan
>>>> what is on a given slice and get something from it.
>>>
>>> you can get the file content per slice using the -Tslice option
>>> while listing a multi-sliced backup (or an isolated catalogue from
>>> this backup).
>>>
>>> In fact it did not help in reading the single slice. It only tells
>>> me (while having all slices on disk) on which slices the subdir
>>> pointed to by -g is written.
>>>
>>> But you lose this ability when using dar with dar_split, as from
>>> dar's standpoint there is only one slice.
>>>
>>>> But now, having more tapes, I am not even able to check what is on
>>>> a tape (if it is slice 2 I am not able simply to use
>>>> dd if=/dev/nst0 bs=256k | dar -0 -l - to see, just for
>>>> orientation, part of the listing of the content).
>>>
>>> there is not really a table of contents per slice; a sliced backup
>>> is still a coherent backup with a table of contents at the end of
>>> the slice set.
>>>
>>> I understand - but it seems also not to have a header recognized by
>>> dar (except the first)
>>>
>>> Dar is not expecting to have all slices available at any time; you
>>> can use the -p option to pause after each (n) created slice(s) and
>>> do what is needed, like move the produced slices to tape then
>>> remove them, before having dar continue its work. At reading time,
>>> dar will ask for the missing slice and pause; you can then obtain
>>> it from tape and let dar continue.
>>>
>>> Ok - but it means I need to feed all slices (tape by tape) until
>>> finding the right one, even if I know that my subdir is on a slice
>>> number n>1.
>>>
>>> with dar_split you do not need mbuffer if you just want to
>>> rate-limit the throughput (see its -r option)
>>>
>>> yes, but mbuffer prints the progress, the data rates, etc. And the
>>> buffer filling is IMHO important for preventing tape shoe shining.
>>>
>>> I do not want to slow down the feeding of the tape, which is
>>> probably what -r does - so in writing it is useful only if the tape
>>> drive cannot cope with the speed at which data is fed. But with an
>>> LTO drive on a SAS controller the writing speed is fast, and the
>>> tape waits for the feeding (and e.g. compressing) of the data, so a
>>> large buffer (several GB) is needed to keep the tape from stopping.
>>>
>>> In fact in my setup the hard disk is slower than the tape speed (as
>>> it is not on a dedicated SAS).
>>>
>>> But I will run some tests to verify this subjective feeling of
>>> mine.