Re: [Dar-support] Incremental backup of large amount of files is very slow
From: Denis C. <dar...@fr...> - 2022-02-25 17:54:59
On 24/02/2022 at 21:38, J. Roeleveld via Dar-support wrote:
> Hi all,

Hi Joost,

> Is there a way to speed up incremental backups of directories with large
> amounts of files?

First thing first: could you check that you have "Large dir. speed optimi." set to "YES" in the compile-time features shown by issuing 'dar -V'?

> Backing up my mail storage (1 file per email) takes a very long time. I'm not
> sure what it's doing, but as it takes a long time before it actually starts
> writing the backup file, I feel it's checking the content of every file
> against the catalogue.

Only metadata is used to decide whether or not to back up a given file. The file content is then read only if the decision to back it up has been taken.

When backing up large directories (and this is the reason why the large directory speed optimization was added some years ago), in addition to reading each directory's content and the metadata of each file only once and storing it in memory, as usually done, this data is also indexed in a sorted list for fast lookup (the lookup used when comparing each file's metadata to its previous status). Searching a sorted list has logarithmic complexity, so it is quite efficient (this search implementation relies on the standard C++ library). For the rest, the process has the same cost per file, whatever the number of files per directory.

Points to consider are thus:
- does your system struggle with disk I/O?
- does your system struggle with CPU load?
- does your system struggle with memory (and start swapping)?
- if backing up over the network, is network congestion occurring?

Depending on what you report, we can investigate in the appropriate direction.

> If this is the case, is there a way to have dar only compare the name (if it
> exists) and filesize?
This is already the case: the metadata is gathered in a single system call (from the stat() family) that returns the whole file's metadata structure at once: the file type, the file size if applicable, the dates (mtime, atime, ctime, birthtime if available), the permissions, and so on. Dar uses most of this to decide whether a file has changed or not (it does not use the atime, for example). Anyway, I cannot see how to do this with less than that, nor how to do it faster.

> I am willing to risk minor losses for most of the
> incrementals. I would be doing regular "masters" where this option would not
> be used.

Some features that have an impact on performance:
- compression: algorithm, compression level, block/stream mode, number of threads (see -z and -G options)
- ciphering: algorithm, number of threads used (see -K/-J and -G options)
- if disk I/O is not the problem, you can disable the lookup for sparse files (which, depending on the data under backup, can however save a lot of storage space, in particular at restoration time, something compression alone cannot do); see the --sparse-file-min-size option
- you could disable tape marks (see the -at option) to reduce CPU usage (if that is the point of contention), at the cost of not being able to repair a truncated backup, losing a redundancy level of information about the backup content (the catalogue), and losing the ability to read the backup sequentially (only direct access would be available, which is fast, but may not fit all needs, like storing backups on tapes).

> Many thanks,
>
> Joost

Cheers,
Denis