Re: [Burp-users] Millions upon millions of tiny files
From: Myles Loosley-M. <my...@pr...> - 2021-07-09 06:24:11
Hi Gabriel,

Much appreciated for the additional hints; I'll definitely pore over them and see what impact they have!

That said, I think my original message was a little unclear. BURP is performing admirably in terms of read/write/IO, working out deltas, etc., and the tweaks are great to have, but the real issue isn't IO performance with BURP; it's the design issue of backups having a ~1:1 ratio of files on the host vs. files in the backups.

In our case, I'm nuking the rsync'd copy of our BURP data from the new filesystem and letting BURP start a new datastore from scratch. Because rsync kept crapping out, and took literally a day for each attempted pass of the existing datastore, we have no realistic way of knowing what was missed (so it's safer to start from scratch and age out the old copy once the new one is mature enough).

I was waiting on the following operation to complete before replying:

-----
# time rm -rf burp

real    1221m49.019s
user    6m22.922s
sys     247m57.203s
-----

That's pretty terrible, right? Until you consider that's somewhere to the tune of ~30 million files at that point (so ~450 deletes/sec). I'm not sure if that's fast or slow for a non-SSD filesystem, as I've never timed deletes into the millions, but I can tell you it's definitely a helluva lot slower than any operation dealing with larger/sequential files :).

I was ultimately just looking for an out: a method to ensure BURP didn't have such an insane file count to work around, since space is so much cheaper than IOPS after all. It's looking like that just isn't a thing, at least not at this point in time =\.
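For anyone checking the arithmetic, the delete rate above is just file count over wall-clock time. A quick Python sketch (the "~30 million" is an estimate; ~450/sec works out to a count closer to 33 million, so treat the rate as a ballpark figure):

```python
# Rough delete-rate check for the `time rm -rf burp` run above.
# The file count is an estimate, so the result is a ballpark, not a measurement.
wall_seconds = 1221 * 60 + 49   # "real" time: 1221m49s -> 73309 s
files = 33_000_000              # estimated file count at deletion time
rate = files / wall_seconds
print(f"~{rate:.0f} deletes/sec")  # prints ~450 deletes/sec
```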
Regards,
Myles Loosley-Millman

On 2021-07-07 9:31 a.m., gabrielknight2 via Burp-users wrote:
> Hi all,
>
> I hope this helps a little.
>
> We are also using Burp to back up Windows XP/7/10, macOS (not anymore) and GNU/Linux desktops,
> all with protocol 1 and with:
> working_dir_recovery_method = resume
>
> On the first, older server:
> Dell PowerEdge R510, Intel(R) Xeon(R) CPU X5650 @ 2.67GHz (6 cores/12 threads), 8 GB RAM,
> for 65 clients, 16 TB of data on the burp server.
>
> I have performance problems too with phase4 and a lot of small files on an XFS filesystem (on 12x Nearline SAS 7200 rpm 2 TB hard drives with a Dell PERC H700 1 GB cache RAID6).
> I don't know if the problem is more XFS or the small cache on the PERC RAID card.
> I tried to play with burp-server.conf parameters; perhaps it will help in your case, I don't know.
>
> I tried what is recommended in the burp documentation, and I also followed the advice at https://github.com/grke/burp/wiki/Performance-Tips :
>
> hardlinked_archive = 1
> librsync = 0
> max_children = 4 (and even less, 3 or 2, to avoid I/O concurrency)
>
> I also tried:
> manual_delete = /raid/trashcan/manual-delete
> and deleting this directory once per night (it seems one of my problems was losing a lot of time erasing tons of small files on XFS).
>
> I also tried:
> compression = zlib1
> ssl_compression = zlib1
> to speed up the compression part of each file and the transfer.
>
> I tried excluding already-compressed data so as not to lose time on it (but not too much, since if I understood correctly it checks the whole extension list for each file, so setting a lot of excludes could slow things down):
> exclude_comp = jpg
> exclude_comp = gif
> exclude_comp = png
> exclude_comp = mp3
> exclude_comp = mov
> exclude_comp = mkv
> exclude_comp = mp4
> exclude_comp = zip
> exclude_comp = 7z
> exclude_comp = gz
> exclude_comp = cab
>
> Finally, I chose to keep only the necessary backups to speed up the process:
> keep = 3
> keep = 2
>
> In the end, I had to space out the time between two backups so that each machine succeeds in backing itself up without causing imbalance (when the server is saturated, some machines often manage to back up while others never get an available slot):
> timer_arg = 40h
>
> The second server seems to work a lot faster:
> Dell PowerEdge R540, Intel(R) Xeon(R) Gold 5122 CPU @ 3.60GHz (4 cores/8 threads), 32 GB RAM.
>
> I kept the default configuration, and I don't have the problem of an overly long phase4 blocking the next backups, for 65 clients and 10 TB of client data on the burp server.
> This time I chose the EXT4 filesystem, but it is now on a PERC H740P RAID6 with 8 GB cache (with 14x Nearline SAS 7200 rpm 4 TB hard drives), so I guess that makes a difference too.
> So probably, to manage a lot of very small files, cache is useful, and ext4 is better suited for that too.
> I can't draw a firm conclusion, since the whole server is newer and better, and I could not try other combinations (what would it be with XFS or ZFS on the same server?).
>
> Anyway, for the ZFS question, I never tried it myself for Burp backup, but perhaps with the ZFS cache system activated on write-intensive SSDs and a lot of RAM for ZFS, it could do a better job (probably what we don't spend on a hardware RAID card we must pay in write-intensive SSDs and RAM):
> https://linuxhint.com/configuring-zfs-cache/
>
> I don't know if the processor and a huge amount of RAM even really count when using "classic" filesystems like ext4 and xfs?
>
> Probably, as hnsz2002 said, RAID10 would be even better too.
>
> I'm looking for feedback too, to share best practices about the infrastructure needed for Burp backup, so feel free to share solutions that work well for you :)
>
> Best regards
>
> Sent with ProtonMail Secure Email.
>
> _______________________________________________
> Burp-users mailing list
> Bur...@li...
> https://lists.sourceforge.net/lists/listinfo/burp-users
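Collected in one place, the tweaks Gabriel describes would look roughly like the following burp-server.conf fragment. This is only a sketch assembled from the settings quoted in his mail (his example values and path, not recommendations); tune for your own hardware:

-----
# Sketch of burp-server.conf (protocol 1) tweaks from Gabriel's mail
working_dir_recovery_method = resume
hardlinked_archive = 1
librsync = 0
max_children = 4          # or 3/2 to reduce I/O concurrency

# Defer deletion of old backups; empty this directory once per night
manual_delete = /raid/trashcan/manual-delete

# Lighter compression for files and transfer
compression = zlib1
ssl_compression = zlib1

# Skip recompressing already-compressed data (keep the list short)
exclude_comp = jpg
exclude_comp = gz
exclude_comp = zip

# Keep fewer backups; space out backup attempts
keep = 3
keep = 2
timer_arg = 40h
-----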