From: Mantis B. T. <no...@bu...> - 2013-05-10 08:54:45
The following issue requires your FEEDBACK.
======================================================================
http://bugs.bacula.org/view.php?id=1990
======================================================================
Reported By:                hbrown
Assigned To:
======================================================================
Project:                    bacula
Issue ID:                   1990
Category:                   Director
Reproducibility:            unable to reproduce
Severity:                   minor
Priority:                   normal
Status:                     feedback
======================================================================
Date Submitted:             2013-02-22 20:58 GMT
Last Modified:              2013-05-10 09:54 BST
======================================================================
Summary:                    5.2.13 crashes when more than one Path
Description:
I've just upgraded from 5.2.12 to 5.2.13, and have found that the Bacula
director crashes if there is more than one Path. I've got a test job that
reproduces this crash perfectly in 5.2.13, and with 5.2.12 it just keeps
on going without problems.

First off, here's the error I see at the time of the crash:

22-Feb 09:47 cbs-01-dir JobId 227346: Warning: sql_create.c:590 More than one Path!: 2 for path: /var/cfengine/.git/objects/37/
22-Feb 09:47 cbs-01-dir JobId 227346: Error: sql_create.c:596 error fetching row:
22-Feb 09:47 cbs-01-dir: ERROR in sql_create.c:600 Failed ASSERT: ar->PathId

I've checked the database, and sure enough there *are* multiple rows
returned for this particular path:

mysql> SELECT * FROM Path WHERE Path='/var/cfengine/.git/objects/37/';
+---------+--------------------------------+
| PathId  | Path                           |
+---------+--------------------------------+
| 5594948 | /var/cfengine/.git/objects/37/ |
| 5594949 | /var/cfengine/.git/objects/37/ |
+---------+--------------------------------+
2 rows in set (0.00 sec)

However, I have more than a few mentions of the "More than one Path" in
logs from previous jobs.
Here's a sample of one, using Bacula 5.2.12, from December 2012, of the
same fileset on the same host:

29-Dec 02:15 cbs-01-sd JobId 209631: Sending spooled attrs to the Director. Despooling 1,940,283 bytes ...
[snip]
29-Dec 02:18 cbs-01-dir JobId 209631: Warning: sql_create.c:1016 More than one Filename! 2 for file: a37007161975c06dd4c0a37b73b07bbdf29bc6
29-Dec 02:18 cbs-01-dir JobId 209631: Warning: sql_create.c:1016 More than one Filename! 2 for file: f54f5cedb52bf7dcf7fe1fe4b2b797274bae67
29-Dec 02:18 cbs-01-dir JobId 209631: Warning: sql_create.c:590 More than one Path!: 2 for path: /var/cfengine/.git/objects/37/
[snip]
  Build OS:               x86_64-unknown-linux-gnu redhat
  JobId:                  209631
  Job:                    noc-var.2012-12-29_02.05.02_35
  Backup Level:           Differential, since=2012-12-01 02:09:22
  Client:                 "noc-fd" 3.0.1 (30Apr09) x86_64-redhat-linux-gnu,redhat,
  FileSet:                "var" 2009-07-09 12:17:34
  Pool:                   "Daily" (From Run pool override)
  Catalog:                "MyCatalog" (From Client resource)
  Storage:                "tape" (From Job resource)
  Scheduled time:         29-Dec-2012 02:05:02
  Start time:             29-Dec-2012 02:06:03
  End time:               29-Dec-2012 02:19:06
  Elapsed time:           13 mins 3 secs
  Priority:               10
  FD Files Written:       6,248
  SD Files Written:       6,248
  FD Bytes Written:       2,145,282,121 (2.145 GB)
  SD Bytes Written:       2,146,271,788 (2.146 GB)
  Rate:                   2739.8 KB/s
  Software Compression:   None
  VSS:                    no
  Encryption:             no
  Accurate:               no
  Volume name(s):         000143
  Volume Session Id:      3082
  Volume Session Time:    1355945436
  Last Volume Bytes:      611,754,200,064 (611.7 GB)
  Non-fatal FD errors:    8
  SD Errors:              0
  FD termination status:  OK
  SD termination status:  OK
  Termination:            Backup OK -- with warnings

(There were considerably more warnings about similar paths and filenames;
let me know if you'd like the whole thing.)

I've managed to get a backtrace (attached); I had to run it manually (run
gdb as bacula, apply commands from gdb_backtrace by hand), so let me know
if I've left anything out.
I've had a quick look at sql_create.c; I can't see any difference between
the two files, so I assume the problem is elsewhere. I've switched back to
5.2.12 and have successfully run this job, so it appears that *something*
has broken in 5.2.13.

The failure appears to be line 595:

593    /* Even if there are multiple paths, take the first one */
594    if (num_rows >= 1) {
595       if ((row = sql_fetch_row(mdb)) == NULL) {
596          Mmsg1(&mdb->errmsg, _("error fetching row: %s\n"), sql_strerror(mdb));
597          Jmsg(jcr, M_ERROR, 0, "%s", mdb->errmsg);

I don't know why fetching the first row would fail.

I can test patches or switch back to 5.2.13 if needed. Thanks very much
for your help, and let me know if you need anything else from me.

Steps to Reproduce:
1. Have more than one entry for a particular Path in the Path table.
2. Run a job that includes that path in the files that are being backed up.
3. Crash.

Additional Information:
The "Product Version" field in Mantis did not include 5.2.13.
======================================================================

----------------------------------------------------------------------
 (0006659) ebollengier (administrator) - 2013-04-18 08:54
 http://bugs.bacula.org/view.php?id=1990#c6659
----------------------------------------------------------------------
Sorry you have problems with Bacula; however, you are facing a very
strange situation.

Bacula ensures that only one Job can update the Path and the Filename at
the same time, and always inserts one copy of each filename/path, so, for
me, it is impossible to get two filenames with the same value unless you
changed something or played with the database by hand.

I deal with billions of files and TB-sized catalogs, and I have never seen
this problem before. Something looks wrong with your MySQL catalog
(perhaps some MyISAM problem).

You must fix your catalog first; it is corrupted. dbcheck might have an
option to fix this situation, or you will have to do it yourself.

----------------------------------------------------------------------
 (0006702) kern (administrator) - 2013-05-10 09:54
 http://bugs.bacula.org/view.php?id=1990#c6702
----------------------------------------------------------------------
I believe that the failure is due to the ASSERT() at line 600 of
sql_create.c. I am not 100% sure the ASSERT should be there, but it has
been there since 2009.

The difference you are seeing between the Bacula versions is very likely
because, in the failing one, you have DEBUG turned on in
<bacula>/src/version.h. Turn it off and rebuild, and Bacula will not
crash.

Better yet, I recommend that you fix your database, as it is broken. I
think dbcheck will clean it up, but I haven't looked at that program for a
long time.

Also, if you are using MySQL, you might wish to switch to PostgreSQL,
which is much more "ACID" than MySQL and also performs better for large
volumes, though at first PostgreSQL "seems" hard to understand compared to
MySQL.

Please confirm my analysis.

Issue History
Date Modified    Username       Field                    Change
======================================================================
2013-02-22 20:58 hbrown         New Issue
2013-02-22 21:00 hbrown         File Added: rt_1503.gdb
2013-04-18 08:54 ebollengier    Note Added: 0006659
2013-04-18 08:55 ebollengier    Severity                 crash => minor
2013-04-18 08:55 ebollengier    Reproducibility          always => unable to reproduce
2013-05-10 09:54 kern           Note Added: 0006702
2013-05-10 09:54 kern           Status                   new => feedback
======================================================================
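
Both notes advise repairing the catalog before anything else. As a first
step it may help to see how many Path values are actually stored more than
once. The following is only a minimal sketch under stated assumptions: it
uses an in-memory SQLite database as a stand-in for the reporter's MySQL
catalog, models only the two Path columns shown in the report, and adds
one made-up unique row for contrast. The helper name find_duplicate_paths
is hypothetical; this is not what dbcheck does internally.

```python
import sqlite3

# Stand-in catalog: in-memory SQLite instead of the reporter's MySQL server.
# Only the two columns visible in the report are modelled.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Path (PathId INTEGER PRIMARY KEY, Path TEXT NOT NULL)")
conn.executemany(
    "INSERT INTO Path (PathId, Path) VALUES (?, ?)",
    [
        (5594948, "/var/cfengine/.git/objects/37/"),  # from the report
        (5594949, "/var/cfengine/.git/objects/37/"),  # from the report
        (5594950, "/var/cfengine/.git/objects/38/"),  # hypothetical unique row
    ],
)


def find_duplicate_paths(connection):
    """Return (path, copies, lowest PathId) for each Path stored more than once."""
    query = """
        SELECT Path, COUNT(*) AS copies, MIN(PathId) AS first_id
        FROM Path
        GROUP BY Path
        HAVING COUNT(*) > 1
        ORDER BY Path
    """
    return connection.execute(query).fetchall()


duplicates = find_duplicate_paths(conn)
for path, copies, first_id in duplicates:
    print(f"{path}: {copies} rows, lowest PathId {first_id}")
```

The query is read-only on purpose. Other catalog tables presumably refer
to either PathId, so removing the extra row by hand without repointing
those references could make things worse; the actual repair is best left
to dbcheck or a carefully reviewed manual fix, as the notes suggest.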