From: Mantis B. T. <no...@bu...> - 2010-02-23 02:36:49
A NOTE has been added to this issue.
======================================================================
http://bugs.bacula.org/view.php?id=1511
======================================================================
Reported By:                craigmiskell
Assigned To:                ebollengier
======================================================================
Project:                    bacula
Issue ID:                   1511
Category:                   Director
Reproducibility:            always
Severity:                   crash
Priority:                   normal
Status:                     acknowledged
======================================================================
Date Submitted:             2010-02-22 00:38 UTC
Last Modified:              2010-02-23 02:27 UTC
======================================================================
Summary:                    Crash on running .bvfs_update
Description:
The director crashes every time I run .bvfs_update on a single large job.
This job has 6M+ files in it, and I think 2M+ directories. It does manage to
add a good number of records to pathhierarchy every time (usually 50K,
sometimes more), before crashing, but only completes when run on small jobs.
======================================================================

----------------------------------------------------------------------
(0005132) craigmiskell (reporter) - 2010-02-22 00:40
http://bugs.bacula.org/view.php?id=1511#c5132
----------------------------------------------------------------------
Happy to run patched versions etc, but other than assuming it's a problem
with the hp pointer being not-null, but still incorrect, I'm at a loss to
come up with a solution without spending an inordinate amount of time
figuring out the hash table implementation.

----------------------------------------------------------------------
(0005133) craigmiskell (reporter) - 2010-02-22 01:50
http://bugs.bacula.org/view.php?id=1511#c5133
----------------------------------------------------------------------
Hmmm, having just run a bunch of the updates in a row (just trying to get the
pathhierarchy table populated), the number of rows it manages each time is
very consistent. Over the last five runs, each run added:

49991
49987
49989
49989
49984

Am I just hitting some internal limit on the size of the hash tables?

----------------------------------------------------------------------
(0005134) ebollengier (administrator) - 2010-02-22 08:37
http://bugs.bacula.org/view.php?id=1511#c5134
----------------------------------------------------------------------
I see that you are using Debian Etch. On this version, unfortunately, we
can't trust the backtrace output...

Can you try to reproduce the problem on Debian 5.0, for example?

You can also send me an extract of your database (mainly the Path table, and
the File table for this jobid only). You should be able to use COPY to do
that:
http://www.postgresql.org/docs/current/interactive/sql-copy.html

Thanks

----------------------------------------------------------------------
(0005135) craigmiskell (reporter) - 2010-02-22 19:33
http://bugs.bacula.org/view.php?id=1511#c5135
----------------------------------------------------------------------
Sorry, I don't have any Lenny installs available for testing. Would Ubuntu
8.10 be acceptable?

Couple of further comments:
1) The debug output (run at debug_level 1000 :)) does correspond with the
backtrace, showing the crash is at least highly likely to be within that
method.
2) Could you give me a brief idea of what's wrong with Etch that means you
can't trust the backtrace? We have some local oddities which might mitigate
the issue.
3) Even compressed, the partial file table is 166MB, and the full path table
is 61MB.
Would you like them attached to this bug report, or is there a better place
to send them (FTP site?). The directory names are not overly sensitive
information, but I would still prefer a somewhat private medium.

----------------------------------------------------------------------
(0005136) ebollengier (administrator) - 2010-02-22 20:02
http://bugs.bacula.org/view.php?id=1511#c5136
----------------------------------------------------------------------
Etch uses an old glibc that has a nasty bug that corrupts the call stack;
Ubuntu 8.10 doesn't have this problem.
http://sources.redhat.com/ml/libc-hacker/2006-09/msg00003.html

Perhaps the backtrace is valid (it looks like it), but perhaps not, and I
can't spend time diagnosing this kind of problem on Etch or Red Hat 4, for
example. In other words, we don't support old Red Hat or old Debian systems.

Anyway, this afternoon, I was doing performance tests for a customer, and I
reproduced your problem with random data. It looks like it comes from the
htable code, which is a bit strange because this code is well tested. I will
look ASAP.

----------------------------------------------------------------------
(0005137) craigmiskell (reporter) - 2010-02-22 22:38
http://bugs.bacula.org/view.php?id=1511#c5137
----------------------------------------------------------------------
Great, thanks. Let me know if you need any testing done or further info to
help reproduce. I can get a patched build up and running fairly quickly.

----------------------------------------------------------------------
(0005138) craigmiskell (reporter) - 2010-02-23 02:27
http://bugs.bacula.org/view.php?id=1511#c5138
----------------------------------------------------------------------
An additional data point: I have been able to run the .bvfs_update on a
fairly large job from a different client (although not quite as large as the
problem job).
The only thing that springs to mind about the problem client/job is that
that server uses a metric butt-load of hardlinks for an internal application
backup process. I wouldn't expect hardlinked files to affect the path
hierarchy calculations, so I just mention it in case it connects at all with
anything else you see.

Issue History
Date Modified    Username       Field                    Change
======================================================================
2010-02-22 00:38 craigmiskell   New Issue
2010-02-22 00:38 craigmiskell   File Added: bvfc-crash-debug.traceback
2010-02-22 00:39 craigmiskell   File Added: bvfs-crash-debug.output
2010-02-22 00:40 craigmiskell   Note Added: 0005132
2010-02-22 01:50 craigmiskell   Note Added: 0005133
2010-02-22 08:37 ebollengier    Note Added: 0005134
2010-02-22 08:37 ebollengier    Assigned To               => ebollengier
2010-02-22 08:37 ebollengier    Status                   new => feedback
2010-02-22 19:33 craigmiskell   Note Added: 0005135
2010-02-22 20:02 ebollengier    Note Added: 0005136
2010-02-22 20:02 ebollengier    Status                   feedback => acknowledged
2010-02-22 22:38 craigmiskell   Note Added: 0005137
2010-02-23 02:27 craigmiskell   Note Added: 0005138
======================================================================
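
Editorial sketch, for readers following note 0005138: the expectation that hardlinked files should not affect the path hierarchy can be illustrated with a small model. This is not Bacula's code. The function names `parent_dir` and `build_path_hierarchy` are hypothetical, paths are handled as plain strings rather than database PathIds, and the `seen` set merely stands in for the in-memory hash table discussed in the thread. Under those assumptions, only the set of distinct directories matters, not how many files or links each directory holds.

```python
def parent_dir(path):
    """Return the parent of a directory path, with a trailing slash.

    The root "/" and paths without any parent component map to "".
    """
    stripped = path.rstrip("/")
    if "/" not in stripped:
        return ""
    head = stripped.rsplit("/", 1)[0]
    return head + "/"


def build_path_hierarchy(paths):
    """Return the set of (child, parent) directory pairs implied by `paths`.

    `seen` is a stand-in for the hash table (an assumption of this sketch):
    once a directory has been handled, the upward walk stops, so repeated
    directories add no work and no rows.
    """
    seen = set()
    pairs = set()
    for path in paths:
        current = path
        while current and current not in seen:
            seen.add(current)
            parent = parent_dir(current)
            if parent:
                pairs.add((current, parent))
            current = parent
    return pairs


if __name__ == "__main__":
    # The same directory listed twice (as it would be for many files or
    # hard links in one directory) yields the same pairs as listing it once.
    for child, parent in sorted(build_path_hierarchy(
            ["/home/a/", "/home/a/", "/home/b/"])):
        print(child, "->", parent)
```

In this model the cost grows with the number of distinct directories, which matches the reporter's intuition that the 2M+ directories, rather than the hardlinks, are the relevant scale factor. Whether that holds for the actual crash is not established in the thread.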