From: Volker S. <vo...@vo...> - 2005-07-16 10:02:43
On Fr, 15 Jul 2005, Volker Sauer wrote:
> On Fr, 15 Jul 2005, Arno Lehmann wrote:
>
> > >I'll upgrade to 1.36.3 and see what happens. Maybe "Fix deadlock in
> > >multiple simultaneous jobs." (from ReleaseNotes) could be the right one.
> > >I already setup this site with 1.36.3 FileFormat because I knew it's
> > >going to be required!
>
> > I had the same problem of a locking DIR, which worked ok after a
> > restart, and I could never find a reason (partly because I never
> > investigated with gdb, but that's beyond my skills and as long as I
> > could restart my backups rather easily that was ok).
> > With 1.36.3 this problem vanished.
> > Until yesterday.
>
> Yes, the same with me. I upgraded to 1.36.3 and the problem occured
> again, yesterday.
> Now I setup "trace on" and "setdebug 100" for dir and sd and I'm waiting
> for the problem to occur again!

Last night, the director locked up again (see the attached traces).
The job "paris-home.archived" had finished. The jobs "paris-home.guest"
and "paris-home.staff.1" are stuck on the holding disk, because the
director locked up just as the job "bali-rootfs" started. Nothing was
spooled from bali-rootfs; the director seems to have hung immediately.

Btw: the director is set to Maximum Concurrent Jobs = 6. The clients are
normally set to Maximum Concurrent Jobs = 1, except for the host paris,
where it is 2. The storage daemon is set to Maximum Concurrent Jobs = 20.

An interesting thing is: again it's the job bali-rootfs that causes the
director to lock up. I'll exclude this job for a few days and see if the
director still locks up. Plus, I'll set the debug level to 200.

I've attached backup-dir.conmsg and bacula.trace (level 100).
I don't see anything unusual in bacula.trace.
Btw: the first part of bacula.trace covers the jobs of the night before
last. They finished without problems. The trace of last night seems to
start around line 518.
At the end of the file, in line 722, I tried to connect with bconsole.
The connection timed out with no entry in the logfile.

I cleared the kernel ring buffer yesterday, so if any hardware or bus
problems had occurred, there should be an error in it. There's only:

---------------
nfs warning: mount version older than kernel
nfs warning: mount version older than kernel
APIC error on CPU1: 02(02)
APIC error on CPU0: 02(02)
nfs warning: mount version older than kernel
nfs warning: mount version older than kernel
nfs warning: mount version older than kernel
nfs warning: mount version older than kernel
nfs warning: mount version older than kernel
nfs warning: mount version older than kernel
nfs warning: mount version older than kernel
nfs warning: mount version older than kernel
nfs warning: mount version older than kernel
nfs warning: mount version older than kernel
nfs warning: mount version older than kernel
nfs warning: mount version older than kernel
----------------

That's all.

I hope you can see something in the logs that I missed!

-- 
Volker Sauer * Alexanderstrasse 39/217 * 64283 Darmstadt
Telefon: 06151-154260 * Mobil: 0179-6901475 * ICQ#98164307
mailto:vo...@vo... * http://www.volker-sauer.de
PGPKey-Fingerprint: DB2611C7B12E0B2739992E4F7E354E4D5DD5D0E0
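
For reference, the concurrency limits described in the message might look
roughly like the fragment below. This is only a sketch: the resource names
(backup-dir, paris-fd, bali-fd, backup-sd) are hypothetical, all other
required directives are omitted, and it assumes the per-client limits are
set in the Director's Client resources, which the message does not state.

```
# bacula-dir.conf (sketch; resource names are hypothetical)
Director {
  Name = backup-dir
  Maximum Concurrent Jobs = 6     # director-wide limit from the message
  # ... other required directives omitted
}

Client {
  Name = paris-fd
  Maximum Concurrent Jobs = 2     # the one host allowed two jobs at once
  # ... other required directives omitted
}

Client {
  Name = bali-fd
  Maximum Concurrent Jobs = 1     # usual per-client setting
  # ... other required directives omitted
}

# bacula-sd.conf (sketch)
Storage {
  Name = backup-sd
  Maximum Concurrent Jobs = 20    # storage daemon limit from the message
  # ... other required directives omitted
}
```

With these values the effective number of simultaneous jobs is bounded by
the smallest applicable limit, so at most six jobs run at once overall and
at most two of them against paris.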