Re: [Dar-support] dar without recursion
From: Denis C. <dar...@fr...> - 2022-05-21 19:27:33
On 21/05/2022 at 20:24, Per Jensen wrote:
> Hi,
Hi Per, Hi Jean-Baptiste,
>
> would something like: dar -c file-backup -R directory -I "*" -P "*"
> work ?
It will "work" (dar will not complain), but it will not solve the problem.
>
> where
>
> -I "*" selects all files
>
> -P "*" ignores all directories
The -P and -g options apply to all entries, directories *and* files,
unlike -I and -X, which apply only to files. The two sets are evaluated
independently:
- a directory only has to pass the filtering of the -P/-g/-[/-] options,
applied to its whole path, for its fate to be decided.
- a file must *in addition* satisfy the -X/-I filters, applied solely
to its filename.
So with your proposal you would get an empty backup: -P "*" filters out
everything, files and directories alike, and the -I option is never
applied to the plain files because they have already been filtered out.
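As an illustration, this two-stage evaluation can be mimicked with plain
shell globbing. This is a toy model, not dar's code, and shell `case`
patterns differ slightly from dar's globbing (here `*` also matches `/`),
but it shows why -P "*" empties the backup:

```shell
#!/bin/sh
# Toy model of dar's two-stage filtering (an illustration, not dar's code).
# PRUNE plays the role of the -P argument: matched against the whole
# relative path, for directories AND plain files alike.
# INCLUDE plays the role of the -I argument: matched against the filename
# only, and only for plain files that survived the -P stage.
PRUNE='*'
INCLUDE='*'

fate() {                          # fate <relative-path> <d|f>
  path=$1; type=$2
  case "$path" in
    $PRUNE) echo pruned; return ;;    # -P decides for every entry type
  esac
  if [ "$type" = f ]; then
    case "${path##*/}" in
      $INCLUDE) echo saved ;;         # -I only ever sees surviving files
      *) echo pruned ;;
    esac
  else
    echo saved
  fi
}

fate directory d                  # pruned: -P "*" matches the path
fate directory/file0000001 f      # pruned: the file never reaches the -I test
```

With PRUNE set to "directory/*/*" instead, top-level files would pass the
path stage and then be accepted by INCLUDE="*".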
>
> Regards Per
>
>
> On 21.05.2022 at 15.18, Jean-Baptiste Denis wrote:
>> Hello,
>>
>> I've got a directory with an awful lot of files beneath it, at a
>> single level. There are also a number of directories
>> that I don't know in advance.
>>
>> directory/
>> ├── dir00
>> ├── dir01
>> ├── dir02
>> ├── dir03
>> ├── dir04
>> ├── dir05
>> ├── dir06
>> ├── dir07
>> ├── dir08
>> ├── dir09
>> ├── dir10
>> ├── file0000000
>> ├── file0000001
>> ├── file0000002
>> ├── file0000003
>> ├── file0000004
>> [...]
>> ├── file0999997
>> ├── file0999998
>> ├── file0999999
>> └── file1000000
>>
>> I'd like to use dar on "directory" without considering all its
>> subdirectories. From the documentation, it is not clear
>> *to me* whether I can get dar to do this without some prework: a
>> first pass to spot the directories using find or equivalent, and a
>> second one running dar with those directories excluded?
1/ This is not perfect, but you can use a first approach (I assume
"directory/" is located at the root of the filesystem):
dar -c backup -R / -g directory -P "directory/*/*" ... other options...
This will still save the dir* entries themselves, but not their content.
2/ Else, if your filesystem supports EA, meaning it has "user_xattr"
among its mount options in /etc/fstab (you can also enable it live using
"mount -o remount,user_xattr /"), a simple approach is to add an
Extended Attribute to all the directories under /directory and use the
--exclude-by-ea option:
find /directory -type d -exec setfattr -n user.libdar_no_backup {} \;
dar -c backup -R / -g directory --exclude-by-ea ... other options...
3/ You can also do the same without EA by using the nodump flag (if the
filesystem supports it):
find /directory -type d -exec chattr +d {} \;
dar -c backup -R / -g directory --nodump ... other options...
4/ Last, if you do not want to, or cannot, touch the filesystem under
backup, you can list the directories to exclude and feed that list to dar:
find /directory -type d > /tmp/dirlist.txt
dar -c backup -R / -g directory --exclude-from-file /tmp/dirlist.txt
I have no better option so far.
>>
>> I'm hijacking my own thread with some side questions somehow related
>> to my initial question:
>>
>> 1. I'd like to have slices capped at a certain size OR containing at
>> most N files, whichever comes first. I don't know
>> if this would fit with dar internals. If it does, it could be
>> a nice new option.
It does not fit dar internals. You can define the size of a slice with
one-byte accuracy: dar will never make a slice larger than that, and
fills each slice with the data to back up. I don't see the use case for
your requirement, can you elaborate?
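For the record, the slice size cap is the -s option (e.g. -s 1G), and
the resulting slice count is simple ceiling arithmetic. The figures
below are made-up numbers, just to illustrate:

```shell
# Slices needed for a given amount of archive data (illustrative
# arithmetic with made-up figures): dar fills each -s sized slice
# completely before opening the next one.
total=2684354560          # 2.5 GiB of archive data
slice=1073741824          # -s 1G : one slice holds at most 1 GiB
count=$(( (total + slice - 1) / slice ))   # ceiling division
echo "$count slices"      # -> 3 slices (two full, one partial)
```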
>>
>> 2. Is there a way to handle each slice independently when extracting
>> (by running multiple dar commands, for example), or
>> does that not make sense? I could imagine dar slices stored on an NFS
>> server and multiple clients using dar (from an
>> external script) on different slices in parallel, potentially
>> leveraging a datacenter/HPC network and a
>> parallel/distributed filesystem.
Parallelism works better with independent data sets. I would thus
suggest making many independent single-sliced backups with dar and
reading them with as many concurrent dar commands. That said, a single
dar backup does not prevent several dar commands from reading it and
extracting data from it at the same time: dar does not use any temporary
files and never modifies a backup when restoring data from it.
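A concurrent restore from one backup could thus be sketched as follows.
The backup name, restore root and subtree names are placeholders, and
the run() wrapper makes this a dry run that only prints the commands;
replace its body with "$@" to actually invoke dar:

```shell
#!/bin/sh
# Dry-run sketch: extract disjoint subtrees of a single dar backup in
# parallel. "backup", "/restore" and the dirNN names are placeholders.
run() { echo "$@"; }    # print the command; use "$@" instead to execute it
for sub in dir00 dir01 dir02; do
  # each dar process only reads the archive, so they can share it safely
  run dar -x backup -R /restore -g "directory/$sub" &
done
wait                    # wait for all concurrent extractions to finish
```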
Note also that for extracting data from a dar backup there are FUSE
and AVFS clients [1], which leverage kernel VFS caching. And if you
"mount" a dar backup over NFS, the caching will be local to each NFS
client host, which may avoid network data transfers for data often
requested on a particular host.
[1] http://dar.linux.free.fr/doc/presentation.html#external_tools
(well, I don't know the current status of these projects; contact their
authors directly if you need to).
>>
>> Thank you !
>>
>> Jean-Baptiste
>>
>
Cheers,
Denis