From: MtK - S. <mt...@sm...> - 2013-08-30 10:53:41

Hey,

After a few years of using BackupPC on local storage (internal HDDs), I've
moved the pool to an external location connected through NFS. Since moving
the entire pool/cpool/pc was going to take a few days (!), I decided to try
using the new location from scratch (without moving the backup history).

But now, backups tend not to finish: no error, they just take too long.

I have 10 PCs, all of them Linux. 6 of them have a large amount of data,
which would explain long backup times. But from the first backups I get this:

. Full backup took ~570 minutes on Aug. 25th
. Incr. backup took ~625 minutes on Aug. 26th

The next incremental backup never finished; on Aug. 29th I decided to stop
it. I rebooted the BackupPC machine and started another manual incremental
backup at 16:00. It is still running (meaning, 21 hours and counting).

There are three places I can think of where the problem could be:

1. The NFS/pool side.
2. The PC being backed up.
3. The BackupPC machine.

None of them shows high CPU/RAM usage or even I/O, so I really don't know
where the problem actually is.

Could anyone help, please?

MtK
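
(A quick way to tell whether a long-running dump is stuck or merely slow --
a sketch assuming a BackupPC 3.x install with the default process names;
the PID placeholder is yours to fill in:)

    # Find the running dump and any rsync children
    pgrep -af 'BackupPC_dump|rsync'

    # Attach to the dump process and watch its system calls: steady
    # read()/write() traffic means it is still working; a call blocked
    # for minutes (often on the NFS mount) means it is stuck
    strace -f -p <PID-of-BackupPC_dump>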
From: Michael S. <ms...@ch...> - 2013-08-30 13:59:31

> Hey,
>
> After a few years of using BackupPC on local storage (internal HDDs), I've
> moved the pool to an external location connected through NFS.

I'm sure this has been mentioned before, but connecting the pool in this way
isn't such a great idea, due to potential slowness and reliability issues.

> Since moving the entire pool/cpool/pc was going to take a few days (!), I
> decided to try using the new location from scratch (without moving the
> backup history).
>
> But now, backups tend not to finish: no error, they just take too long.
>
> I have 10 PCs, all of them Linux. 6 of them have a large amount of data,
> which would explain long backup times. But from the first backups I get
> this:
>
> . Full backup took ~570 minutes on Aug. 25th
> . Incr. backup took ~625 minutes on Aug. 26th

You don't say how much data this is, what transport you're using, or what
kind of connection there is between the systems. This *could* be
impressively fast for what you're working with. For now, all I can say is,
don't expect miracles out of NFS.

> The next incremental backup never finished; on Aug. 29th I decided to stop
> it. I rebooted the BackupPC machine and started another manual incremental
> backup at 16:00. It is still running (meaning, 21 hours and counting).
>
> There are three places I can think of where the problem could be:
>
> 1. The NFS/pool side.
> 2. The PC being backed up.
> 3. The BackupPC machine.

Or the transport, or some options you've chosen along the way. With what
you've provided, #1 seems really likely, though.

> None of them shows high CPU/RAM usage or even I/O, so I really don't know
> where the problem actually is.
>
> Could anyone help, please?
>
> MtK
From: MtK - S. <mt...@sm...> - 2013-08-30 14:10:28

> You don't say how much data this is, what transport you're using, or what
> kind of connection there is between the systems. This *could* be
> impressively fast for what you're working with. For now, all I can say is,
> don't expect miracles out of NFS.

The pool is 270.29GB, comprising 2,451,613 files and 4369 directories (as of
8/30 08:23) -- that's the entire pool.

The PCs are on an internal network connected with a 1Gbps switch. The backup
method is rsync.
From: Michael S. <ms...@ch...> - 2013-08-30 14:40:30

>> You don't say how much data this is, what transport you're using, or what
>> kind of connection there is between the systems. This *could* be
>> impressively fast for what you're working with. For now, all I can say
>> is, don't expect miracles out of NFS.
>
> The pool is 270.29GB, comprising 2,451,613 files and 4369 directories (as
> of 8/30 08:23) -- that's the entire pool.
>
> The PCs are on an internal network connected with a 1Gbps switch. The
> backup method is rsync.

Neither of those things raises red flags, but NFS certainly does. Unless
your PCs are particularly slow or your drives are particularly slow, I think
you can pretty much settle on "hilariously slow pool storage" as the first
place to look.
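
(For scale: 270GB in ~570 minutes averages out to roughly 8 MB/s. A crude
way to sanity-check the pool storage itself, independent of BackupPC -- a
sketch, assuming /var/lib/backuppc is the NFS-mounted pool:)

    # Sequential write throughput, bypassing the page cache
    dd if=/dev/zero of=/var/lib/backuppc/ddtest bs=1M count=1024 oflag=direct

    # Sequential read back, then clean up
    dd if=/var/lib/backuppc/ddtest of=/dev/null bs=1M iflag=direct
    rm /var/lib/backuppc/ddtest

A BackupPC pool is mostly small, random I/O, so a good sequential number
here doesn't rule the storage out -- but a bad one settles the question.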
From: MtK - S. <mt...@sm...> - 2013-08-30 14:48:19

I had the exact same behavior on the local 2-drive mirror, with the
difference that I could do only a single backup at a time, since the I/O
would make the system unresponsive (now I can easily do 3-4 at the same
time).

Also, just to point out: df/ls/du/find/etc. on the external machine (where
the NFS is) take hours (!), so maybe it's not NFS itself but the
directory/file structure.
From: Les M. <les...@gm...> - 2013-08-30 15:41:35

On Fri, Aug 30, 2013 at 9:47 AM, MtK - SmartMtK <mt...@sm...> wrote:
> I had the exact same behavior on the local 2-drive mirror, with the
> difference that I could do only a single backup at a time, since the I/O
> would make the system unresponsive (now I can easily do 3-4 at the same
> time).
>
> Also, just to point out: df/ls/du/find/etc. on the external machine (where
> the NFS is) take hours (!), so maybe it's not NFS itself but the
> directory/file structure.

Note that until you have backed up a file with the --checksum-seed=32761
option set, the server must read/uncompress everything for the rsync
comparison. Also, if you have large files with changes, the server does a
lot of filesystem work to copy the existing file and merge the changes,
even if the changes are small.

--
Les Mikesell
les...@gm...
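
(For anyone following along, the option goes into the rsync argument lists
in config.pl. A minimal sketch against the stock BackupPC 3.x defaults --
check your own config.pl for the full argument list:)

    $Conf{RsyncArgs} = [
        '--numeric-ids', '--perms', '--owner', '--group', '-D',
        '--links', '--hard-links', '--times', '--block-size=2048',
        '--recursive',
        # Cached checksums: needs rsync >= 2.6.3 on the clients, and only
        # pays off from the second backup of each file onward
        '--checksum-seed=32761',
    ];

The same flag is normally added to $Conf{RsyncRestoreArgs} as well.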
From: MtK - S. <mt...@sm...> - 2013-08-30 15:46:20

I have the same behavior with and without --checksum-seed=32761 (which I
added recently), and no, most of the files are small; the large files are
either the same or completely new.
From: Michael S. <ms...@ch...> - 2013-08-30 15:55:43

> I had the exact same behavior on the local 2-drive mirror, with the
> difference that I could do only a single backup at a time, since the I/O
> would make the system unresponsive (now I can easily do 3-4 at the same
> time).

Having the system do less work would naturally make it more responsive, so
this is logical. This doesn't mean it's faster, obviously -- and note that
while 3-4 at the same time may not be noticeably bogging down the local
system, since the I/O grunt work is handled by the NFS server, it's
possible that it's entirely saturated.

> Also, just to point out: df/ls/du/find/etc. on the external machine (where
> the NFS is) take hours (!), so maybe it's not NFS itself but the
> directory/file structure.

This seems particularly important!
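
(One way to see whether the NFS server really is saturated is to watch its
disks while a backup runs -- a sketch; iostat comes from the sysstat
package, and sdX stands in for whatever devices back the pool:)

    # On the NFS server, refresh extended stats every 5 seconds;
    # %util near 100 and a climbing await on the pool's sdX devices
    # mean the storage, not the network, is the bottleneck
    iostat -x 5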
From: MtK - S. <mt...@sm...> - 2013-08-30 16:09:24

>> Also, just to point out: df/ls/du/find/etc. on the external machine
>> (where the NFS is) take hours (!), so maybe it's not NFS itself but the
>> directory/file structure.
>
> This seems particularly important!

Again, it's the same behavior as when the pool was local, and that's why I
couldn't copy it to the new location.

The new pool is on a ZFS filesystem, and df does work:

df -h = 310G

df -i
     Inodes    IUsed      IFree IUse%
 8989178480  5777551 8983400929    1%

What can I do with it, or check on it, to pin-point the issue?
From: Les M. <les...@gm...> - 2013-08-30 17:12:47

On Fri, Aug 30, 2013 at 11:08 AM, MtK - SmartMtK <mt...@sm...> wrote:
>>> Also, just to point out: df/ls/du/find/etc. on the external machine
>>> (where the NFS is) take hours (!), so maybe it's not NFS itself but the
>>> directory/file structure.
>
>> This seems particularly important!
>
> Again, it's the same behavior as when the pool was local, and that's why I
> couldn't copy it to the new location.
>
> The new pool is on a ZFS filesystem, and df does work:
>
> df -h = 310G
>
> df -i
>      Inodes    IUsed      IFree IUse%
>  8989178480  5777551 8983400929    1%
>
> What can I do with it, or check on it, to pin-point the issue?

Would it be possible to run backuppc on the same host with the filesystem
and avoid NFS completely?

--
Les Mikesell
les...@gm...
From: MtK - S. <mt...@sm...> - 2013-08-30 17:28:12

I guess it's possible; I'd just rather not do this until:

1. I see that the pool works OK (so that du/find/etc. run faster than they
   do now), and
2. I completely pin-point NFS as the bottleneck.
From: Les M. <les...@gm...> - 2013-08-30 17:39:01

On Fri, Aug 30, 2013 at 12:27 PM, MtK - SmartMtK <mt...@sm...> wrote:
> I guess it's possible; I'd just rather not do this until:
> 1. I see that the pool works OK (so that du/find/etc. run faster than they
>    do now), and
> 2. I completely pin-point NFS as the bottleneck.

Do you still have your old filesystem, so you can compare the times to walk
it with something like 'find . -ctime -1 >/dev/null' (make it read the
inode contents as well as the directories)?

Are you using any of the exotic ZFS options (compression, dedup, maybe even
snapshots)? I don't have any experience with ZFS, but I think it needs a
substantial amount of RAM.

--
Les Mikesell
les...@gm...
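
(Spelled out, the comparison might look like this -- a sketch; the paths
and the "tank/backuppc" dataset name are placeholders for the real mounts:)

    # Walk the old, local pool
    time find /var/lib/backuppc.orig -ctime -1 > /dev/null

    # Walk the new, NFS-mounted pool
    time find /var/lib/backuppc -ctime -1 > /dev/null

    # On the ZFS host: check which non-default options are in play
    zfs get compression,dedup,atime tank/backuppc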
From: MtK - S. <mt...@sm...> - 2013-08-30 17:47:02

> Do you still have your old filesystem, so you can compare the times to
> walk it with something like 'find . -ctime -1 >/dev/null' (make it read
> the inode contents as well as the directories)?

The old filesystem is still in /var/lib/backuppc.orig, so yes -- but again,
it'll take a few days (!)

> Are you using any of the exotic ZFS options (compression, dedup, maybe
> even snapshots)?

Just compression.

> I don't have any experience with ZFS, but I think it needs a substantial
> amount of RAM.

I have 16GB; it's enough for this task.

Just to be clear again: both ZFS and the ext3 I had locally before behave
the same way.
From: Les M. <les...@gm...> - 2013-08-30 18:09:34

On Fri, Aug 30, 2013 at 12:46 PM, MtK - SmartMtK <mt...@sm...> wrote:
>
> Just to be clear again: both ZFS and the ext3 I had locally before behave
> the same way.

It's a lot of files that aren't organized in any particular way. So assume
the head is going to move for most directory/inode/data accesses, and
multiply by the time the disk takes to seek.

--
Les Mikesell
les...@gm...
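
(To put rough numbers on that -- back-of-the-envelope, assuming ~10 ms per
random access on a 7200 rpm drive and the file count quoted earlier:)

    2,451,613 files x ~3 accesses each (directory, inode, data) x ~10 ms
      ~= 73,500 seconds ~= 20 hours

for a single full walk of the pool, before NFS round-trips add anything on
top. Seek-bound workloads like this are why the file count matters far more
than the 270GB total.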
From: MtK - S. <mt...@sm...> - 2013-08-30 19:09:14

> It's a lot of files that aren't organized in any particular way. So assume
> the head is going to move for most directory/inode/data accesses, and
> multiply by the time the disk takes to seek.

How is this different from having a local disk?
From: Les M. <les...@gm...> - 2013-08-30 20:08:54

On Fri, Aug 30, 2013 at 1:15 PM, MtK - SmartMtK <mt...@sm...> wrote:
>
>> It's a lot of files that aren't organized in any particular way. So
>> assume the head is going to move for most directory/inode/data accesses,
>> and multiply by the time the disk takes to seek.
>
> How is this different from having a local disk?

I thought you were saying that it wasn't different - that is, also slow
locally. The main difference with NFS would be if you have the sync option
enabled - and then only for writes. Then, unlike local disks, the writer
waits for the system to acknowledge that the block is committed to disk for
each write. The rsize and wsize options might make a bit of difference too.

--
Les Mikesell
les...@gm...
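
(You can see what the mount actually negotiated instead of guessing -- a
sketch; nfsstat ships with nfs-utils on most distros:)

    # On the BackupPC machine: shows rsize, wsize, protocol version, and
    # the other effective options for each NFS mount
    nfsstat -m

    # On the NFS server: 'sync' vs 'async' is set per export
    cat /etc/exports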
From: <bac...@ko...> - 2013-08-30 19:21:33

MtK - SmartMtK wrote at about 19:08:38 +0300 on Friday, August 30, 2013:
>>> Also, just to point out: df/ls/du/find/etc. on the external machine
>>> (where the NFS is) take hours (!), so maybe it's not NFS itself but the
>>> directory/file structure.
>
>> This seems particularly important!
>
> Again, it's the same behavior as when the pool was local, and that's why I
> couldn't copy it to the new location.
>
> The new pool is on a ZFS filesystem, and df does work:
>
> df -h = 310G
>
> df -i
>      Inodes    IUsed      IFree IUse%
>  8989178480  5777551 8983400929    1%

Why are we assuming that taking even hours to do a 'du' or a 'find' on a
large filesystem with millions of small files scattered all over is
necessarily a 'bug'? If you get the same behavior on ZFS and ext3, both
locally and with NFS, wouldn't the most logical explanation be that the
problem is due to some combination of the large number of scattered files
along with perhaps slow-ish disks and hardware?
From: <bac...@ko...> - 2013-08-30 19:31:06

MtK - SmartMtK wrote at about 21:15:01 +0300 on Friday, August 30, 2013:
>> It's a lot of files that aren't organized in any particular way. So
>> assume the head is going to move for most directory/inode/data accesses,
>> and multiply by the time the disk takes to seek.
>
> How is this different from having a local disk?

I'm confused; before, you seemed to imply that it was equally slow locally
(you had written "again, it's the same behavior as when the pool was local,
and that's why I couldn't copy it to the new location").

If it's only slow when you are doing du/find over NFS, then it's obviously
NFS (btw, I'm not sure why you find 'ls' and 'df' to be slow; they are a
very different beast than the recursions required for 'du' and 'find'). If
not, then it's the number/scattering of your files (plus/minus slow disks
and hardware).

If it's NFS, then there are ways to speed it up. But NFS will always be
intrinsically slow, since it is a file system on top of a file system, plus
it is (typically) remotely mounted. Tuning can still buy you some speed (by
a severalfold factor, in my experience), using options like async, noatime,
and nodiratime (with async giving a substantial performance improvement).
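
(Concretely, that tuning might look like the following -- a sketch with
placeholder host and path names; nfsserver:/export/backuppc is purely
illustrative:)

    # On the NFS server, in /etc/exports: async acknowledges writes before
    # they reach disk (faster, at some crash-safety cost)
    /export/backuppc  backuppchost(rw,async,no_subtree_check)

    # On the BackupPC machine, in /etc/fstab: skip atime updates and use
    # large transfer sizes
    nfsserver:/export/backuppc  /var/lib/backuppc  nfs  rw,noatime,nodiratime,rsize=32768,wsize=32768  0  0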