Re: [Dar-support] Incremental backup of large amount of files is very slow
From: Denis C. <dar...@fr...> - 2022-02-25 17:54:59
On 24/02/2022 at 21:38, J. Roeleveld via Dar-support wrote:
> Hi all,

Hi Joost,

> Is there a way to speed up incremental backups of directories with large
> amounts of files?

First thing first: could you check that you have "Large dir. speed optimi." set to "YES" in the compile-time features shown by issuing 'dar -V'?

> Backing up my mail storage (1 file per email) takes a very long time. I'm not
> sure what it's doing, but as it takes a long time before it actually starts
> writing the backup file, I feel it's checking the content of every file
> against the catalogue.

Only metadata is used to decide whether or not to back up a given file. The file content is then read only if the decision to back it up has been taken.

When backing up large directories (and this is the reason why the large directory speed optimization was added some years ago), in addition to reading each directory's content and the metadata of each file only once and storing it in memory, as usually done, this data is also indexed in a sorted list for fast lookup (the lookup used when comparing each file's metadata to its previous status). Searching a sorted list has logarithmic complexity, so it is quite efficient (this search implementation relies on the standard C++ library). For the rest, the process has the same cost per file, whatever the number of files per directory.

Points to consider are thus:
- does your system struggle with disk I/O?
- does your system struggle with CPU load?
- does your system struggle with memory (and start swapping)?
- if backing up over the network, is network congestion occurring?

Depending on what you report, we can investigate in the appropriate direction.

> If this is the case, is there a way to have dar only compare the name (if it
> exists) and filesize?
This is already the case: the metadata is gathered in a single system call (from the stat() family) that returns the whole file's metadata structure at once: the file type, the file size if applicable, the dates (mtime, atime, ctime, birthtime if available), the permissions, and so on. Dar uses most of this to decide whether a file has changed or not (it does not use the atime, for example). Anyway, I cannot see how to do this with less than that, nor how to do it faster.

> I am willing to risk minor losses for most of the
> incrementals. I would be doing regular "masters" where this option would not
> be used.

Some features that have an impact on performance:
- compression: algorithm, compression level, block/stream mode, number of threads (see -z and -G options)
- ciphering: algorithm, number of threads used (see -K/-J and -G options)
- if disk I/O is not the problem, you can disable the lookup for sparse files (which, depending on the data under backup, can however save a lot of storage space, in particular at restoration time, something compression alone cannot do); see the --sparse-file-min-size option
- you could disable tape marks (see the -at option) to reduce CPU usage (if that is the point of contention), at the cost of not being able to repair a truncated backup, losing a redundancy level of information about the backup content (the catalogue), and losing the ability to read the backup sequentially (only direct access would be available, which is fast, but may not fit all needs, like storing backups on tapes).

> Many thanks,
>
> Joost

Cheers,
Denis