Re: [Dar-support] dar without recursion
For full, incremental, compressed and encrypted backups or archives
From: Per J. <per...@gm...> - 2022-05-22 05:23:37
Hi Denis,
Thank you for the very good answer on file/directory filtering. I will
save it for future reference :-)
Best regards
Per
On 21.05.2022 at 21.27, Denis Corbin wrote:
> On 21/05/2022 at 20:24, Per Jensen wrote:
>> Hi,
> Hi Per, Hi Jean-Baptiste,
>
>> would something like: dar -c file-backup -R directory -I "*" -P "*"
>> work ?
> it will "work" (dar would not complain) but it will not solve the problem.
>
>> where
>>
>> -I "*" selects all files
>>
>> -P "*" ignores all directories
> The -P and -g options apply to all entries, directories *and* files,
> unlike -I and -X which apply only to plain files. The two filters are
> evaluated independently:
> - a directory only has to pass the -P/-g/-[/-] filtering, applied to
> its whole path, for its fate to be known.
> - a plain file has *in addition* to satisfy the -X/-I filters, which
> are applied solely to its filename.
>
> So with your proposal you would get an empty backup: -P "*" filters
> out everything, files and directories alike, and the -I option is
> never applied to the plain files since they have already been
> filtered out.
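> To make the two-stage evaluation concrete, here is a toy shell model
> of that filtering logic (an illustrative sketch using shell case
> globs, not dar's actual mask implementation; note in particular that
> a shell "*" also matches "/"):

```shell
# Toy model of dar's two-stage filtering (illustrative sketch only,
# not dar's real code). -P patterns here apply to the whole path of
# every entry; -I patterns apply only to the filename of plain files.
P_PATTERN='*'        # as in the proposed -P "*"
I_PATTERN='*'        # as in the proposed -I "*"

keeps() {  # keeps <path> <d|f> -> returns 0 if the entry is saved
    # stage 1: -P filter on the whole path (directories and files)
    case "$1" in
        $P_PATTERN) return 1 ;;   # excluded by -P
    esac
    # stage 2: -I filter on the filename, plain files only
    if [ "$2" = f ]; then
        case "${1##*/}" in
            $I_PATTERN) : ;;      # filename accepted by -I
            *) return 1 ;;        # filename rejected by -I
        esac
    fi
    return 0
}

keeps directory/dir00 d       && echo kept || echo excluded  # excluded
keeps directory/file0000000 f && echo kept || echo excluded  # excluded

# With the pattern from approach 1 below instead, top-level entries survive:
P_PATTERN='directory/*/*'
keeps directory/dir00 d       && echo kept || echo excluded  # kept
```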
>
>> Regards Per
>>
>>
>> On 21.05.2022 at 15.18, Jean-Baptiste Denis wrote:
>>> Hello,
>>>
>>> I've got a directory with an awful lot of files beneath it, at a
>>> single level. There are also a number of directories
>>> that I don't know in advance.
>>>
>>> directory/
>>> ├── dir00
>>> ├── dir01
>>> ├── dir02
>>> ├── dir03
>>> ├── dir04
>>> ├── dir05
>>> ├── dir06
>>> ├── dir07
>>> ├── dir08
>>> ├── dir09
>>> ├── dir10
>>> ├── file0000000
>>> ├── file0000001
>>> ├── file0000002
>>> ├── file0000003
>>> ├── file0000004
>>> [...]
>>> ├── file0999997
>>> ├── file0999998
>>> ├── file0999999
>>> └── file1000000
>>>
>>> I'd like to use dar on "directory" without considering all its
>>> subdirectories. From the documentation, it is not clear
>>> *to me* whether dar can do this without some prework: a first pass
>>> to spot the directories using find or equivalent, and a second
>>> pass running dar with those directories excluded?
> 1/ this is not perfect, but you can use this first approach (I assumed
> "directory/" was located at the root of the filesystem):
>
> dar -c backup -R / -g directory -P "directory/*/*" ... other options...
>
> this will still save the dir* directories themselves, but not their contents.
>
> 2/ else, if your filesystem supports EA, meaning it has "user_xattr"
> among its mount options in /etc/fstab (you can also enable it live using):
> mount -o remount,user_xattr /
>
> then, a simple thing would be to add an Extended Attribute to all
> directories of /directory and use the --exclude-by-ea option:
>
> find /directory -type d -exec setfattr -n user.libdar_no_backup {} \;
> dar -c backup -R / -g directory --exclude-by-ea ... other options...
>
> 3/ you can also do the same without EA by using the dump flag (if
> supported on the filesystem):
>
> find /directory -type d -exec chattr +d {} \;
> dar -c backup -R / -g directory --nodump ... other options...
>
> 4/ last, if you do not want to, or cannot, touch the filesystem under
> backup, you have to list the directories to exclude and provide that list to dar:
>
> find /directory -mindepth 1 -type d > /tmp/dirlist.txt
> dar -c backup -R / -g directory --exclude-from-file /tmp/dirlist.txt
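> A small runnable sketch of this list-building step (dar itself is
> not invoked here, and the tree and paths are made up for
> illustration). Note the -mindepth 1, without which "directory"
> itself would land in the exclusion list:

```shell
# Build an exclusion list containing only the subdirectories
# (illustrative tree; a real backup would use the actual path and
# then feed the resulting list to dar via --exclude-from-file).
tmp=$(mktemp -d)
mkdir -p "$tmp/directory/dir00" "$tmp/directory/dir01"
touch "$tmp/directory/file0000000" "$tmp/directory/file0000001"

# -mindepth 1 skips "directory" itself, -type d skips the plain files
find "$tmp/directory" -mindepth 1 -type d > "$tmp/dirlist.txt"

cat "$tmp/dirlist.txt"   # only the dir* entries, one per line
```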
>
> I have no better option so far.
>
>>> I'm hijacking my own thread with some side questions related somehow
>>> to my initial question:
>>>
>>> 1. I'd like to have slices up to a certain size OR containing at
>>> most N files, whichever comes first. I don't know
>>> if this would fit with dar internals. If it does, it could be
>>> a nice new option.
> it does not fit dar internals. You can define the slice size with
> one-byte accuracy, and dar will not make larger slices but will fill
> them up with the data to back up. I don't see the use case for your
> requirement, can you elaborate?
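> To illustrate the size-only bound (made-up figures, no dar run):
> since the slice size can be pinned to the byte with -s, the slice
> count follows from the total data size alone, and the number of
> files per slice is never a factor:

```shell
# Hypothetical figures: 1 GB of data, slices capped at exactly 100 MiB
# (as with: dar -c backup -s 104857600 ...). Ceiling division gives
# the resulting slice count.
total=1000000000
slice=104857600
slices=$(( (total + slice - 1) / slice ))
echo "$slices"   # → 10
```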
>
>>> 2. Is there a way to handle each slice independently when
>>> extracting (by running multiple dar commands, for example),
>>> or does that make no sense? I could imagine dar slices stored on
>>> an NFS server and multiple clients using dar (from an
>>> external script) on different slices in parallel, potentially
>>> leveraging a datacenter/HPC network and a
>>> parallel/distributed filesystem.
> parallelism works better with independent data sets. I would thus
> suggest making many independent single-sliced backups with dar and
> reading them with as many concurrent dar commands. That said, a
> single dar backup does not prevent many different dar commands from
> reading it and extracting data from it at the same time: dar does not
> use any temporary file and does not modify backups when restoring data from them.
>
> But note also that for extracting data from a dar backup, there are FUSE
> and AVFS clients [1]. This would leverage kernel VFS caching. And if you
> "mount" a dar backup over NFS, the caching will be local to each NFS
> client host, which may avoid network data transfer for data often
> requested on a particular host.
>
> [1] http://dar.linux.free.fr/doc/presentation.html#external_tools
> (well, I don't know the status of these projects; contact their
> authors directly if you need to).
>
>>> Thank you !
>>>
>>> Jean-Baptiste
>>>
> Cheers,
> Denis
>