Re: [Dar-support] dar without recursion
For full, incremental, compressed and encrypted backups or archives
From: Denis C. <dar...@fr...> - 2022-05-23 12:09:31
On 22/05/2022 at 22:12, Jean-Baptiste Denis wrote:
> Hi Per, Hi Denis,
>
> Per, thank you for the suggestion. I tried something approaching without luck before asking the list ;)
>
> Denis, thank you very much for the detailed explanation. Your first suggestion looks perfect to me. It's elegant
> and almost obvious after reading it ;) Do you consider this solution not perfect because of the (empty) directory
> creation, or something else? I can live with that.

Yes, exactly: you end up with empty directories. But if you can live with that, it is less imperfect... ;)

[...]

>> I don't see the use case of your requirement, can you develop?
>
> Here is the simplified story:
>
> - I've got two NAS (nas0 and nas1) that I access using NFSv3.
> - I need to back up nas0 onto nas1 (300 TBytes, 100+ million files), rebuild nas0 and restore its content from nas1.
> - nas0 is actively used until the downtime.
> - I need to minimize the downtime.
>
> My actual strategy is to run rsync multiple times until the scheduled downtime. I'm running multiple rsync
> processes in parallel from multiple servers with the help of fpart/fpsync (https://www.fpart.org/). fpsync creates
> sets of files (with a maximum total size or a maximum file number, whichever comes first). I do a last
> synchronization pass (using fpsync) during the downtime to catch up with the last modifications. I do what I have
> to do, and I restore using the same tools.
>
> It works reasonably well, and I've done multiple migrations using this strategy. But I've got a lot of small files
> (<1 KByte), sometimes millions of files at the same level in a single directory, sometimes nicely distributed in
> sub-directories. I've also got big files (sometimes with weird characters in the filenames ;)). Any tool using
> "posix" access will struggle in this situation. Splitting by number of files helps in such a case by preventing a
> single rsync process from handling 1 million 1K files (for a total of 1 GB).
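The fpart/fpsync splitting strategy quoted above could be sketched like this (the mount points, worker count and part limits below are illustrative assumptions, not values from this thread):

```shell
#!/bin/sh
# Sketch of the quoted fpsync strategy: split the source tree into parts
# bounded by both file count and total size, then sync the parts with
# several rsync workers in parallel. Paths and limits are assumptions.

SRC=/mnt/nas0/data          # NFS mount of the source NAS (assumed path)
DST=/mnt/nas1/backup        # NFS mount of the destination NAS (assumed path)

# -n 8   : up to 8 concurrent rsync workers
# -f N   : at most N files per part (caps the "millions of tiny files" case)
# -s N   : at most N bytes per part (caps the "few huge files" case)
fpsync -n 8 -f 100000 -s $((100 * 1024 * 1024 * 1024)) "$SRC/" "$DST/"

# Re-run as often as needed before the downtime; the final pass during the
# downtime only has to transfer what changed since the previous run.
```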
> With that said, using this method, I'm paying the small-file overhead twice: reading and writing those files
> during the backup from source to destination, and again during the restoration.
>
> dar looks like a tool I could use to cut this overhead in half: keeping the overhead during the read phase of the
> backup and the write phase of the restore. All other steps would deal with dar slices (writing "big" slices during
> backup and reading them during restore). dar also allows me to keep the iterative backup passes before the
> scheduled downtime.
>
> I'm just exploring ideas at the moment, and I do not have a definitive idea on how to make it work. Hence my
> questions about recursion, limiting the number of files per slice, and parallelism.

It seems your use of dar is more copy oriented (I mean the backup is used and destroyed right after creation) than
backup oriented (long-term storage in dar format). If so, and as you target dar for small files, you will gain some
CPU cycles without much penalty by disabling the sparse file feature (use "--sparse-file-min-size 0" for that).

You can also perform the backup and restoration without any intermediate storage requirement, using pipes through
ssh [1] or netcat [2] (for a lower CPU requirement).

[1] http://dar.linux.free.fr/doc/usage_notes.html#ssh
[2] http://dar.linux.free.fr/doc/usage_notes.html#netcat

Done that way, you should be able to saturate either the network bandwidth or the disk I/O, whichever is the most
limiting. dar will still compute CRCs and tape marks (in order to be able to read a dar backup from a pipe).

If you want to iterate the process incrementally, I suggest making on-fly isolated catalogues. That way you can
incrementally back up and restore over the network using locally stored extracted catalogues:

STEP 1, on nas0:

  dar -c - -R / -g directory -P "directory/*/*" --on-fly-isolate cat_full ...
  | ssh or netcat

and on nas1:

  dar -x - --sequential-read -R /some/where

STEP 2, a bit later on nas0:

  dar -c - -A cat_full -R / -g directory -P "directory/*/*" --on-fly-isolate cat_diff1 ... | ssh or netcat

and at the same time on nas1:

  dar -x - --sequential-read -R /some/where -w [or -r]

and so on.

> The devil is in the details of course, but your suggestion of a single-slice backup, maybe per directory, without
> recursion, looks like a good starting point to play with.

I suspect the bottleneck will be the network bandwidth (unless you have 10 Gbit/s or more); in that case, having
concurrent dar/backup processes on different data sets will not bring much more improvement than a single backup
made without intermediate storage, as described above.

My 2 cents.

> Thank you again for your initial answer !

You're welcome,

> Jean-Baptiste

Denis

[...]
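Putting the two steps above together, the incremental backup-over-pipe cycle might be scripted as follows. The hostname (nas1) and the restore path are placeholders; the dar options are the ones discussed above, with the suggested "--sparse-file-min-size 0" added:

```shell
#!/bin/sh
# Sketch of the incremental dar-over-ssh workflow described above.
# The hostname nas1 and the path /some/where are placeholder assumptions.

# STEP 1 -- full backup of "directory" (one level deep, no deeper recursion),
# streamed to nas1, keeping an isolated catalogue (cat_full) locally so the
# later differential backups need no access to the transferred archive:
dar -c - -R / -g directory -P "directory/*/*" \
    --sparse-file-min-size 0 --on-fly-isolate cat_full \
  | ssh nas1 'dar -x - --sequential-read -R /some/where'

# STEP 2 -- a bit later, a differential backup against the local catalogue;
# -w on the restore side allows overwriting files restored in STEP 1:
dar -c - -A cat_full -R / -g directory -P "directory/*/*" \
    --sparse-file-min-size 0 --on-fly-isolate cat_diff1 \
  | ssh nas1 'dar -x - --sequential-read -R /some/where -w'

# Repeat STEP 2 (with cat_diff1 as the new reference, producing cat_diff2,
# and so on) until the scheduled downtime, then run one last differential
# pass while nas0 is quiescent.
```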