The du -hs /backup/pool /backup/cpool /backup/pc/* has finished. Basically, one host was taking up 6.9 TB of data, with 2.8 TB in the cpool directory, and most of the other hosts averaged about a GB each.

That one host is our file server, which I happen to know has a 2 TB volume (1.3 TB currently used) holding our main fileshare.

I looked through the error logs for this PC, focusing on the backups with the most errors, and found thousands of lines like this:

Unable to read 8388608 bytes from /var/lib/BackupPC//pc/myfileserver/new//ffileshare/RStmp got=0, seekPosn=1501757440 (0,512,147872,1499463680,2422719488)
I didn't see any of the "BackupPC_link got error -4" errors. So now I'm running this command:

du -hs /backup/pool /backup/cpool /backup/pc/myfileserver/* 

to see which backups are doing the most damage. I'll report back once that finishes.
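In the meantime, I'm also trying a more direct check for broken pooling. If I understand the pooling mechanism correctly, any file that made it into the pool should have a hardlink count of at least 2, so a count of 1 means the file was never pooled. A rough sketch (the backup path here is hypothetical; substitute a real backup number):

```shell
# Files linked into the BackupPC pool have hardlink count >= 2;
# a link count of 1 means the file was never pooled.
# BACKUP_DIR is hypothetical -- substitute one real backup directory.
BACKUP_DIR=${BACKUP_DIR:-/backup/pc/myfileserver/100}
find "$BACKUP_DIR" -type f -links 1 | wc -l
```

A large count for a given backup should mean that backup missed pooling entirely.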

Thanks for all your help!


Regards,
Craig


On Wed, Oct 30, 2013 at 10:24 PM, Holger Parplies <wbppc@parplies.de> wrote:
Hi,

Adam Goryachev wrote on 2013-10-31 09:04:48 +1100 [Re: [BackupPC-users] Disk space used far higher than reported pool size]:
> On 31/10/13 07:51, Holger Parplies wrote:
> > [...]
> > Aside from that, I would think it might be worth the effort of determining
> > whether all hosts are affected or not (though I can't really see why there
> > should be a difference between hosts). If some aren't, you could at least
> > keep their history.
> I suspect at least some hosts OR some backups are correct, or else OP
> wouldn't have anything in the pool.

as I understand it, the backups from before the change from smb to rsyncd are
linked into the pool. Since the change, some or all are not. Whether the
change of XferMethod has anything to do with the problem or whether it
coincidentally happened at about the same point in time remains to be seen.
I still suspect the link to $topDir as cause, and BackupPC_link is independent
of the XferMethod used (so a change in XferMethod shouldn't have any influence).

> [...] you might want to look at one individual host like this:
> du -sm /backup/pool /backup/cpool /backup/pc/host1/*
>
> This should be a *lot* quicker than the previous du command, and also
> should show minimal disk usage for each backup for host1. It is quicker
> because you are only looking at the set of files for the pool, plus one
> host.

Just keep in mind that *incrementals* might be small even if not linked to
pool files.

Oh, and there is still another method that is *orders of magnitude* faster:
look into the log file(s), or even at the *size* of the log files. If it
happens every day, for each host, it shouldn't be hard to find. You can even
write a Perl one-liner to show you which hosts it happens for (give me a
sample log line and I will).
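Something along these lines would do it (untested sketch, not the promised one-liner; it assumes the .../pc/<host>/new/... path layout from the sample errors above, and that the errors end up in the per-host LOG files):

```shell
# Extract the host name from "Unable to read ... /pc/<host>/new/..." errors
# and count occurrences per host. Log file location is an assumption;
# adjust the glob to wherever the errors actually land.
grep -h 'Unable to read' /var/lib/BackupPC/pc/*/LOG* 2>/dev/null \
  | sed -n 's|.*/pc/\([^/]*\)/new/.*|\1|p' | sort | uniq -c | sort -rn
```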

If the log files show nothing, we're back to finding the problem, but I doubt
that. You can't "break pooling" by copying, as was suggested. Yes, you get
independent copies of files, and they might stay independent, but changed
files should get pooled again, and your file system usage wouldn't continue
growing in such a way as it seems to be. If pooling is currently "broken",
there's a reason for that, and there should be log messages indicating
problems.

> PS, at this stage, you may want to look at the recent thread regarding
> disk caches, and caching directory entries instead of file contents. It
> might help with all the directory based searches you are doing to find
> the problem. Long term you may (or not) want to keep the settings.

Yes, but remember that for a similarly sized pool it used up about 32 GB of
96 GB available memory. If you can do your investigation on a reasonably idle
system (i.e. not running backups, without long pauses), you should get all the
benefits of caching your amount of memory allows without any tuning. And even
tuning won't let you hold 32 GB of file system metadata in 4 GB of memory :-).
It all depends on file count and hardware memory configuration.
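For reference, the knob that thread was talking about is vm.vfs_cache_pressure: lowering it below the default of 100 biases the kernel toward keeping dentry/inode (directory metadata) caches over file contents. A sketch, not a recommendation (requires root, and the right value depends entirely on your workload):

```shell
# Make the kernel reclaim dentry/inode caches less aggressively
# (default is 100; lower = keep more directory metadata cached).
sysctl -w vm.vfs_cache_pressure=50
# To persist across reboots, add to /etc/sysctl.conf:
#   vm.vfs_cache_pressure = 50
```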

Regards,
Holger

_______________________________________________
BackupPC-users mailing list
BackupPC-users@lists.sourceforge.net
List:    https://lists.sourceforge.net/lists/listinfo/backuppc-users
Wiki:    http://backuppc.wiki.sourceforge.net
Project: http://backuppc.sourceforge.net/