Re: [Dar-support] Multiple slices on LTO6 tape
From: Denis C. <dar...@fr...> - 2022-05-31 17:41:24
A nice one! Thanks Ralf for your feedback ;^) Cheers, Denis

On 31/05/2022 at 07:12, Moll, Ralf wrote:
> Hi all tape users,
>
> maybe the use of LTFS [1] makes life easier for all of us. You can use
> your tape just like an external USB drive: copy, move, delete, etc.
> There are also OSS packages:
> - https://github.com/LinearTapeFileSystem/ltfs
> - https://avpres.net/LTO_LTFS/
> and user stories:
> - https://digitensions.home.blog/2019/01/15/technologic/
> - https://chiwbaka.com/2020/ltfs-guide-speed-and-tape-longevity/
>
> Right now I create dar slices of about 50 GB on SSD and copy them to
> tape. The catalog and directory listing are saved on the backup PC
> and, after that, on the tape too.
> The next step would be to compress and write directly to tape. The
> "problem" is that LTO8 can write at about 300 MB/s max and I want to
> avoid "shoe shining"... I have to test this:
> - https://community.spiceworks.com/topic/1972304-lto-ltfs-moving-multiple-files-may-be-causing-shoe-shining-should-i-zip-first
>
> LTO8 auto speed, from https://blog.archiware.com/blog/what-is-lto/:
> "Step #3: Auto Speed
>
> Early generations of tape suffered from so-called shoe shining, the
> stop-and-go of the drive upon the flow of data changes. Recent LTO
> generations have an auto speed mechanism built-in that lowers the
> streaming speed if the data flow slows so that the drive can still
> write at a constant speed. This is also called speed matching and
> ranges between 112-360MB/s with LTO-8."
>
> From https://yoyotta.com/help/LTO_FAQ.html:
> "Source media speed affecting tape write speed
>
> LTO drives perform speed matching, so they can run the tape slower to
> work with slower source drives. Theoretically the minimum speed needed
> is less than 100MB/s, however with a mixture of file types on a NAS
> with a 1Gb connection, you will probably not get a suitable sustained
> speed. The same limitation will apply to slower USB drives.
> With the higher speed of LTO-7/8/9 media you will need a faster
> connection to keep the job going. Otherwise the drive will run out of
> data and the tape will have to keep rewinding and start writing again.
> You will hear this "shoe shining" behaviour if you listen to the
> drive. When this happens the average speed will drop drastically. This
> won't damage the tape, however it will increase drive head wear.
> Also with tape compression enabled, then the source read speed will
> need to be a little higher with uncompressed files like ARRIRAW and
> DPX.
> In this case use YoYotta to create two sequential jobs. The first copy
> will be from NAS to fast RAID, then when complete YoYotta will index
> the new source folder and start a copy from RAID to tape. As described
> here. Alternatively use faster shared storage or a faster connection
> to the NAS."
>
> Hope this idea helps,
>
> cu, Ralf
>
> [1] https://en.wikipedia.org/wiki/Linear_Tape_File_System
>
> On Tue, 31 May 2022 at 02:11, Petr Skoda <sez...@se...> wrote:
>>
>> Dear Denis,
>>
>> after many experiments with different block sizes, writing the slices
>> either with dar_split or dd using your link idea below, I came to the
>> conclusion that things DO NOT work as you supposed (you admitted you
>> did not have a tape unit to test it).
>>
>> I have made this experiment (not yet creating multiple slices, in
>> fact):
>>
>> skoda@lto:~$ dar -c - -R /home/skoda > /dev/nst0
>>
>> The question is the size of the tape block for dar. By many
>> experiments I have found that 32 kB does not work but 64 kB is OK
>> (probably the default of Debian using mtio).
>> I have also compiled dar_split 2.7.5, as the version in Debian
>> (stable) does not have the -b and -r options.
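The pattern in the experiments above (32 kB fails, 64 kB works) can be turned into a quick sanity check. This is a hypothetical, file-only sketch assuming the rule really is "multiples of 64 KiB" - it does not touch a tape drive, and on real hardware the st driver and the drive itself have the final say:

```shell
# Hypothetical check, assuming (as the experiments above suggest) that
# usable tape block sizes are multiples of 64 KiB. The candidate values
# are the ones tried in this thread; no tape device is involved.
UNIT=65536                       # 64 KiB, the smallest size that worked
for BS in 32768 65536 131072 262144; do
    if [ $((BS % UNIT)) -eq 0 ]; then
        echo "$BS: ok (multiple of 64 KiB)"
    else
        echo "$BS: likely to fail (not a multiple of 64 KiB)"
    fi
done
```

The 262144 value (256 KiB) satisfies the check, which matches the later finding that it is the size commonly recommended for LTO.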
>>
>> From other experiments it seems the default for dar_split is 64 kB
>> (or 128 kB), namely:
>> dar -c - -R /home/skoda | dar_split split_output /dev/nst0
>> dar_split -b 65536 split_input /dev/nst0 | dar -0 -l -    works,
>> while
>> dar_split -b 32768 split_input /dev/nst0 | dar -0 -l -    gives the
>> error "Error reading data: Cannot allocate memory"
>>
>> So I have used the block size 262144 bytes, which is recommended for
>> LTO tapes in many places on the internet.
>> It seems that multiples of 64 kB are OK as block size.
>>
>> Whichever way the tape is read, it works when written by dar or
>> dar_split:
>> dar -c - -R /home/skoda | dar_split -b 262144 split_output /dev/nst0
>> or directly
>> dar -c - -R /home/skoda > /dev/nst0
>> Even when using dd and dar together:
>>
>> dar -c - -R /home/skoda | dd bs=256k of=/dev/nst0
>> dd if=/dev/nst0 bs=256k | dar -0 -l -    works,
>> as well as this way:
>> dar_split split_input /dev/nst0 | dar -0 -l -
>>
>> BUT using the link as you suggested:
>> sudo ln -s /dev/nst0 myback.1.dar
>>
>> skoda@lto:~$ dar -0 -l myback
>> gives an error:
>> myback.1.dar has a bad or corrupted header, please provide the
>> correct file. [return = YES | Esc = NO]
>>
>> I must also say that, as I am not root on the computer driving the
>> tape, it was quite cumbersome to achieve the ln -s of /dev/tape.
>> My admin gave me sudo on ln but not on rm, so removing the links must
>> be done carefully.
>> Anyway, the tape drive interface does not work as expected, so I am
>> still stuck using multiple slices - and, as I have tested, even the
>> slices made by dar_split.
>> Your approach does not work, and so the content of tapes holding
>> slices other than the first cannot be shown. In fact even the last
>> slice, which should have the header, does not work for listing etc...
>>
>> I am afraid that flexible use of dar with LTO tapes is not possible
>> (in fact dar_split works, but when more tapes are written everything
>> gets out of control - you cannot check the later tapes.
>> And waiting for reading them takes about 4 hours each).
>>
>> IMHO it would be nice to have dar operable on each slice, even if the
>> boundary file written across two slices is left out.
>>
>> I still do not see the difference between a tape segment written by
>> dar_split and one written by dar_xform (dar_split does not know the
>> splitting point in advance, while dar_xform counts the bytes).
>>
>> best regards,
>>
>> Petr Skoda
>>
>> ---------- Original e-mail ----------
>> From: Denis Corbin <dar...@fr...>
>> To: dar...@li...
>> Date: 19. 5. 2022 20:16:58
>> Subject: Re: [Dar-support] Multiple slices on LTO6 tape
>>
>> On 18/05/2022 at 13:20, Petr Skoda wrote:
>>> Dear Denis,
>>
>> Hi Petr,
>>
>>> I must say that I am still confused about the slices, the
>>> information you have written below, and all that I have read in the
>>> dar docs.
>>>
>>> I understand that the slice created by dar_split has a special
>>> format which allows it to be treated differently from slices made
>>> by dar_xform.
>>
>> dar_split does not provide any format. It is just used to cut a
>> single-slice dar backup over several tapes, and to stick these
>> fragments together so that dar gets back a single (big) slice.
>>
>>> But then I still do not know how to restore the content when I have
>>> individual slices on separate tapes. Suppose I have all the data
>>> for my subdirectory only on slice 3 (the last one). You say I need
>>> only the last slice.
>>
>> In your case you created a three-slice backup and dropped (or expect
>> to drop) each slice to a different tape. This is another way of doing
>> it than using dar_split, with some drawbacks and advantages:
>>
>> drawbacks:
>> - as this is a multi-sliced backup, dar expects a different file name
>> for each slice.
>> Thus you have to play with symlinks to point to the tape, and
>> this requires forcing dar to pause between slices (-p option) in
>> order to have the time to:
>> - rewind the tape
>> - change the tape
>> - remove / add a symlink pointing to the tape, having the name of
>> the next slice
>> - the backup process does not directly write to tape: you need to
>> store at least one slice at a time on local disk, copy it to tape,
>> then remove it from disk (or keep it if you have enough disk
>> storage), then continue with the next slice. Using dar_split, you
>> can directly send the backup to tape.
>>
>> advantages:
>> - you do not need dar_split
>> - if for some reason an I/O error occurs while writing a slice to
>> tape, you don't have to restart the whole backup: slices already on
>> tape are fine, you just need to retry writing the failed slice
>> (possibly to a different tape).
>>
>> For the rest, this is the same as what's in the FAQ: you still need
>> to read with the --sequential-read option, and you can use the -E
>> option to automate what's possible, instead of or in addition to the
>> -p option.
>>
>>> So I insert the tape with slice 3 and want to extract the catalogue
>>> for further usage. How will I do dar -C - something like
>>> dd if=/dev/nst0 bs=256k | dar -C mycat -A - seems not to work.
>>
>> Assuming a three-slice backup, this should work (I have no tape drive
>> to test it):
>>
>> ln -s /dev/nst0 backup.3.dar
>> dar -C backup_isolated_cat -A backup -z --sequential-read
>>
>> If you get a message about fadvise not being available on the device,
>> you need to compile dar with:
>> ./configure --disable-fadvise
>>
>>> I know that I must use the generic name of the slices (without the
>>> .3.dar) to work on the archive. But if it is read from a pipe after
>>> dd - how to do it?
>>
>> Using a symlink, as shown above. As you see, this is not very
>> comfortable to use that way (playing with symlinks in addition to
>> playing with tapes).
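The symlink naming trick above can be rehearsed without a drive. This is a file-based sketch in which a regular file stands in for /dev/nst0 (all names here are hypothetical); the commented dar line shows the intended real usage against the device:

```shell
#!/bin/sh
# File-based rehearsal of the symlink trick: dar looks for slice file
# names like backup.3.dar, so a symlink with that name is pointed at
# the tape device. Here a plain file stands in for /dev/nst0.
set -e
tmp=$(mktemp -d)
cd "$tmp"
: > fake_nst0                    # stand-in for /dev/nst0
ln -s fake_nst0 backup.3.dar     # the slice name dar expects
# dar -C backup_isolated_cat -A backup -z --sequential-read
readlink backup.3.dar            # shows the link target
cd /
rm -r "$tmp"
```

On the real system the only difference is that the link target is the non-rewinding tape device, and the link has to be recreated with the next slice's name after each tape change.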
>>
>> But it is better to convert this 3-slice backup to a single-slice
>> backup (using dar_xform). During the conversion, you can use
>> dar_split to send the resulting single-slice backup directly to
>> tape, and thus avoid storing the data twice on local disk
>> (single-slice backup plus three-slice backup). This is what I
>> proposed in my previous email.
>>
>>> Furthermore you have written in a doc that special control
>>> sequences are interspersed across the tape to allow the
>>> reconstruction, but the -0 (sequential) mode must be used... So why
>>> may I not do this to get the content of a particular tape with a
>>> slice?
>>
>> First because dar expects the backup to start with a backup header,
>> which is only contained at the beginning of the first slice. More
>> precisely, this information is also duplicated at the end of the
>> backup and is used when reading the backup without --sequential-read
>> (the so-called direct access mode).
>>
>> However dar has a feature that can let you recover most of a
>> corrupted backup (the -alax mode), but it is painful, as you have to
>> provide by hand the few pieces of information contained in that
>> header (which is here missing), based on the dar version you were
>> using for the backup. The archive format has evolved over time and
>> the current dar version is still able to read backups created with
>> version 1.0.0 more than twenty years ago. However it must know which
>> version (which format) the backup was created with, in order to read
>> it properly.
>>
>> So you can read the content of a slice taken from a multi-sliced
>> backup using both the --sequential-read and -alax options. But
>> that's painful and does not handle files that span two contiguous
>> slices (it is quite improbable that a slice end matches the end of a
>> saved file and that the next slice starts with a new file).
>>
>>> I suppose that the FAQ should still explain in detail the question:
>>>
>>> "I have split the data by xform (e.g.
>>> after netcat) to several slices. Each of them was written to a
>>> separate tape. How will I extract the backup?"
>>
>> It could add this, but I probably will not, because from my point of
>> view this is not the best way of using dar with tape. What you did is
>> pretty logical and you could not guess the other way (dar_split) due
>> to the lack of a FAQ about it (my bad). Don't you agree that using
>> dar_split instead of a multi-sliced backup only brings advantages?
>> If you explain to me some drawback of using dar_split in that
>> context, I will reconsider my point on that :)
>>
>> From my point of view, in your case, if you don't want to remake the
>> whole backup, convert it to a single-slice backup with dar_xform and
>> output it to tapes using dar_split.
>>
>> Assuming you have backup.1.dar, backup.2.dar and backup.3.dar
>> available, this is done this way:
>>
>> dar_xform backup - | dar_split split_output /dev/nst0
>>
>>> All of your answers are about dar_split, but you say it must be
>>> used to create the (special) slices. What should I do if I did not
>>> do that and just used dar_xform, or simply dar with the -s and -S
>>> options at creation time?
>>
>> not sure I understand your question
>>
>>> How to extract the catalog from such tapes?
>>
>> we have answered this question above
>>
>>> How to use sequential mode?
>>
>> You can use --sequential-read even with a multi-sliced backup. But
>> as already said, you need to mimic the file names using symlinks so
>> that dar finds the slices it expects.
>>
>>> Sorry for my ignorance, but I really have spent a lot of time
>>> reading various docs on the dar web site and mailing list, but
>>> still have a very weak understanding of what to do...
>>
>> No worries, documentation is always perfectible; your input is
>> valuable (see this new FAQ that was missing - I guess if you had had
>> it available from the beginning, things would have been simpler for
>> you, right?)
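The principle the dar_xform | dar_split pipeline relies on can be illustrated with plain files. In this sketch, split and cat stand in for dar_split (which additionally handles tape block sizes and prompting for media changes), and random data stands in for the single-slice archive produced by dar_xform:

```shell
#!/bin/sh
# File-based illustration of the dar_split principle: one single-slice
# stream is cut into fixed-size pieces (one per "tape"), and reading is
# simply the concatenation of those pieces in order. split/cat stand in
# for dar_split; /dev/urandom stands in for the dar archive stream.
set -e
tmp=$(mktemp -d)
cd "$tmp"
head -c 1048576 /dev/urandom > slice     # pretend single dar slice (1 MiB)
split -b 262144 slice tape_              # four 256 KiB "tapes"
cat tape_* > rejoined                    # what dar_split does on reading
cmp slice rejoined && echo "streams identical"
cd /
rm -r "$tmp"
```

This is why a stream written with dar_split can only be read back through dar_split (or an equivalent concatenation): each piece is a raw fragment, not a self-describing slice with its own header.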
>>
>>> So far it looks like I cannot do anything with multiple slices on
>>> tape except feed the drive with all the tapes in sequential mode.
>>
>> No, you can - just play with symlinks to the tape device as seen
>> previously.
>>
>>> I cannot even verify that a particular slice is correct (as -l does
>>> not work on any slice other than the first).
>>
>> --sequential-read is also available with a sliced backup, thus
>> suitable for tape as you did it, at the cost of symlink manipulation
>> to emulate file names.
>>
>>> Extracting the catalogue from the last one is also problematic.
>>> Is the only answer to use dar_split (so I have to start the backups
>>> from scratch)? I have already written ten or more tapes with slices
>>> from multiple backups (filesystems).
>>>
>>> Best regards, Petr Skoda
>>>
>>> ---------- Original e-mail ----------
>>> From: Denis Corbin <dar...@fr...>
>>> To: dar...@li...
>>> Date: 13. 5. 2022 18:38:45
>>> Subject: Re: [Dar-support] Multiple slices on LTO6 tape
>>>
>>>> I have saved each slice to one tape using dd if=datas.1.dar
>>>> of=/dev/nst0 bs=256k etc...
>>>> I still have the 3 slices on my staging disk (taking almost 7 TB).
>>>> But I am not sure how to list the content, check the consistency
>>>> and finally extract without needing such large space in the
>>>> future.
>>>
>>> in the mode you have been using, if you want to list the backup
>>> content, dar will only need the last slice. This will lead you to
>>> read the whole 3rd slice from tape.
>>>
>>> How should the command look (with dd and a pipe - I need dd because
>>> of the larger block)?
>>>
>>> Better, you can use instead an isolated catalogue of the backup on
>>> tape (something you can do on the fly or afterwards). An isolated
>>> catalogue is usually small and does not require to be backed up to
>>> tape, as you can recreate one from a backup at any time.
>>>
>>> example to create an isolated catalogue:
>>> dar -C isolated -A backup -z ...
>>>
>>> Yes, but then I still need to have all slices on the disk (called
>>> backup.1.dar, backup.2.dar etc.... - 10 TB or more)
>>>
>>> If you want to extract a single file from the backup, dar will ask
>>> for the last slice, then for the slice where the file to restore
>>> resides. But if you restore with the help of an isolated catalogue,
>>> dar will only need the slice where the file to restore is located.
>>>
>>> How will it get the right info if reading from tape with a slice
>>> other than the first? The extract will not accept the data from the
>>> tape (except the first)?
>>>
>>> I.e. when using dd if=/dev/nst0 bs=256k | dar -0 -x -g something,
>>> it worked on the tape with backup.1.dar
>>>
>>> example to restore with the help of a catalogue:
>>> dar -x backup -A isolated ...
>>>
>>> it is written many times in the docs - but here I need the name of
>>> the backup (so it must be on the disk, and in fact all slices)
>>>
>>>> I would like simply to restore only some files in the future,
>>>> getting an instruction which tape to use and then extracting,
>>>> ideally using some pipes combined with
>>>> dd if=/dev/nst0 bs=256k | dar something -x -
>>>> However dar requires all slices together - I am not able to use
>>>> only e.g. datas.2.dar to list only its contents.
>>>
>>> You could better use dar+dar_split instead, if this suits your
>>> needs. The advantage is that it will not need any extra storage to
>>> restore or list the backup, but will act a bit like tar, reading
>>> the tapes from the first toward the last, up to the point where the
>>> metadata and data of the file to restore are reached.
>>>
>>> ok - so dar_split is something different, which creates some header
>>> and catalogues for each slice/tape?
>>>
>>> Note that you cannot use dar_split at reading time if you have not
>>> used it at creation time. When using dar+dar_split, dar creates a
>>> single slice, a slice that dar_split (as you can guess from its
>>> name) splits over different tapes. Thus, at reading time, dar
>>> expects a single slice and not the concatenation of two or more
>>> slices, which is what dar_split does at reading time (concatenating
>>> the content of several tapes).
>>>
>>> so can I read from tape with dar_split (using -0 as well) or not?
>>>
>>> However you can convert a split backup to a single-slice backup
>>> using dar_xform and then use dar_split:
>>>
>>> dar_xform backup - | dar_split split_output /dev/tape
>>>
>>> note that if you had an isolated catalogue for 'backup', it stays a
>>> valid isolated catalogue for the single-slice backup generated by
>>> dar_xform that dar_split has written to several tapes.
>>>
>>> Would you mind writing concrete examples using /dev/nst0 (or
>>> /dev/tape) instead of the name of the backup?
>>> A combination with a pipe on dd?
>>>
>>>> When starting with dar I expected that in sequential mode, with
>>>> some flags (even -al does not work here), I would be able to scan
>>>> what is on a given slice and get something from it.
>>>
>>> you can get the file content per slice using the -Tslice option
>>> while listing a multi-sliced backup (or an isolated catalogue from
>>> this backup).
>>>
>>> In fact it did not help in reading the single slice. It only tells
>>> me (while having all slices on disk) on which slices the subdir
>>> pointed to by -g is written.
>>>
>>> But you lose this ability when using dar with dar_split, as from
>>> dar's standpoint there is only one slice.
>>>
>>>> But now, having more tapes, I am not even able to check what is on
>>>> a tape (if it is slice 2 I am not able simply to use
>>>> dd if=/dev/nst0 bs=256k | dar -0 -l - to see, just for
>>>> orientation, part of the listing of the content).
>>>
>>> there is not really a table of contents per slice; a sliced backup
>>> is still a coherent backup with a table of contents at the end of
>>> the slice set.
>>>
>>> I understand - but it seems also not to have a header recognized by
>>> dar (except the first)
>>>
>>> Dar is not expecting to have all slices available at any time; you
>>> can use the -p option to pause after each (n) created slice(s) and
>>> do what is needed, like move the produced slices to tape then
>>> remove them, before having dar continue its work. At reading time,
>>> dar will ask for the missing slice and pause; you can then obtain
>>> it from tape and let dar continue.
>>>
>>> Ok - but it means I need to feed all slices (tape by tape) until
>>> finding the right one, even if I know that my subdir is on a slice
>>> number n>1.
>>>
>>> with dar_split you do not need mbuffer if you just want to
>>> rate-limit the throughput (see its -r option)
>>>
>>> yes, but mbuffer prints the progress, the data rates, etc. And the
>>> buffer filling is IMHO important for preventing tape shoe shining.
>>>
>>> I do not want to slow down the feeding of the tape, which is
>>> probably what -r does - so in writing it is useful only if the tape
>>> drive cannot cope with the speed at which data is fed. But with an
>>> LTO drive on a SAS controller the writing speed is fast, and the
>>> tape waits for the feeding (and e.g. compressing) of the data, so a
>>> large buffer (several GB) is needed to keep the tape from stopping.
>>>
>>> In fact in my setup the hard disk is slower than the tape speed (as
>>> it is not on a dedicated SAS).
>>>
>>> But I will run some tests to verify this subjective feeling of
>>> mine.