From: Martin S. <ma...@li...> - 2010-03-31 18:43:52
|
>>>>> On Wed, 31 Mar 2010 11:29:21 +1300, Craig Miskell said: > > Martin Simmons wrote: > >>>>>> On Tue, 30 Mar 2010 12:09:47 +1300, Craig Miskell said: > >> Hi, > >> I have a TL-2000 tape changer with two drives. I have two differential jobs which start at the same time, on different > >> filesystems but the same client (also happens to be the same server as the director/storage daemon). This is bacula 5.0.0 > >> > >> Both jobs interleave to the same tape (this is good IMO, although I don't know which knob I tweaked to make that happen > >> :)), but one of the jobs first seems to claim a second tape, purge/recycle it, log the following rows in jobmedia: > >> jobmediaid | jobid | mediaid | firstindex | lastindex | startfile | endfile | startblock | endblock | volindex > >> ------------+-------+---------+------------+-----------+-----------+---------+------------+----------+---------- > >> 12953 | 38 | 7 | 0 | 0 | 0 | 0 | 0 | 0 | 1 > >> 12956 | 38 | 8 | 1 | 37 | 0 | 0 | 1 | 14325 | 2 > >> 12957 | 38 | 8 | 37 | 50 | 1 | 1 | 1 | 15499 | 3 > >> .... > >> > >> where mediaid 7 is the tape it just recycled and didn't use, and mediaid 8 is the tape it's sharing with the other job. > >> > >> I'm happy for it to interleave, but I'm curious as to why it touches that first tape (7) but then decides not to use it. > >> It's effectively recycling/nuking a backup that doesn't need to be, requiring two tapes where one will do. Anyone > >> got any idea what's going on? Happy to file a bug if it is one, but don't want to false report. > > > > I suggest trying it in bacula 5.0.1 first. > Is that because you have some concrete knowledge of a fix that is in 5.0.1, or just a blind "upgrade to the latest" > suggestion? It's a non-trivial amount of work for something that might just be speculation. A bit of both. There was some concurrency issue like this, but I can't remember which version it was. In general, it is best to upgrade before reporting a bug. > > If that still fails, what does it report in the log say when it decides not to > > use the first tape? > Well, still on 5.0.0 for now, in case this sheds any light on the matter, here's the logs (latest at the top) from the > start of the jobs (at 21:00:00 up to where it starts actually reading/writing data. 000006L4 is the volume that gets > recycled but not used, 000013L4 is the one that is used. I can't quite figure out why it doesn't write anything to > 6L4; it's loaded and ready nearly 2 minutes before 13L4 is loaded, so it should have had something written to 5L4 by then. > > 948 | 41 | 2010-03-30 21:06:06 | ausv01-fd JobId 41: /tmp is a different filesystem. Will not descend from / into /tmp > 947 | 41 | 2010-03-30 21:06:06 | ausv01-dir JobId 41: Max Volume jobs exceeded. Marking Volume "000013L4" as Used. > 946 | 41 | 2010-03-30 21:06:06 | ausv01-sd JobId 41: Recycled volume "000013L4" on device "Drive-1" (/dev/nst0), all previous data lost. > 945 | 41 | 2010-03-30 21:06:03 | ausv01-sd JobId 41: 3305 Autochanger "load slot 12, drive 0", status is OK. > 944 | 41 | 2010-03-30 21:05:27 | ausv01-sd JobId 41: 3304 Issuing autochanger "load slot 12, drive 0" command. > 943 | 41 | 2010-03-30 21:04:42 | ausv01-sd JobId 41: 3307 Issuing autochanger "unload slot 5, drive 0" command. > 942 | 41 | 2010-03-30 21:04:42 | ausv01-dir JobId 41: Recycled volume "000013L4" > 941 | 41 | 2010-03-30 21:04:42 | ausv01-dir JobId 41: All records pruned from Volume "000013L4"; marking it "Purged" > 940 | 41 | 2010-03-30 21:04:42 | ausv01-dir JobId 41: There are no more Jobs associated with Volume "000013L4". Marking it purged. > 939 | 40 | 2010-03-30 21:04:09 | ausv01-dir JobId 40: Max Volume jobs exceeded. Marking Volume "000006L4" as Used. > 938 | 40 | 2010-03-30 21:04:09 | ausv01-sd JobId 40: Recycled volume "000006L4" on device "Drive-1" (/dev/nst0), all previous data lost. > 937 | 40 | 2010-03-30 21:04:07 | ausv01-sd JobId 40: 3305 Autochanger "load slot 5, drive 0", status is OK. > 936 | 40 | 2010-03-30 21:03:33 | ausv01-sd JobId 40: 3304 Issuing autochanger "load slot 5, drive 0" command. > 935 | 40 | 2010-03-30 21:01:53 | ausv01-sd JobId 40: 3307 Issuing autochanger "unload slot 1, drive 0" command. > 934 | 41 | 2010-03-30 21:00:14 | ausv01-dir JobId 41: Sending Accurate information. > 933 | 41 | 2010-03-30 21:00:13 | ausv01-dir JobId 41: Using Device "Drive-1" > 932 | 41 | 2010-03-30 21:00:08 | ausv01-dir JobId 41: Start Backup JobId 41, Job=ausv01-other.2010-03-30_21.00.00_29 > 931 | 40 | 2010-03-30 21:00:00 | ausv01-dir JobId 40: Sending Accurate information. > 930 | 40 | 2010-03-30 21:00:00 | ausv01-dir JobId 40: Using Device "Drive-1" > 929 | 40 | 2010-03-30 21:00:00 | ausv01-dir JobId 40: Recycled volume "000006L4" > 928 | 40 | 2010-03-30 21:00:00 | ausv01-dir JobId 40: All records pruned from Volume "000006L4"; marking it "Purged" > 927 | 40 | 2010-03-30 21:00:00 | ausv01-dir JobId 40: There are no more Jobs associated with Volume "000006L4". Marking it purged. > 926 | 40 | 2010-03-30 21:00:00 | ausv01-dir JobId 40: Start Backup JobId 40, Job=ausv01-gd01.2010-03-30_21.00.00_28 > > I'm going to try to work around this by setting Maximum Concurrent Jobs = 1 on each tape drive in the changer, and see > what works. If I avoid the interleaving, I should avoid the unnecessary recycling. It's also possible that I'd get > some results by starting one of the jobs 15 minutes later than the other, by which time it'll already have a tape ready > and writing for the first job, so it'll use that one. > > I'd still be keen on some sort of understanding of what's going on in the first place though; my lack of understanding > disturbs me. What "Max Volume jobs" is set for 000006L4? It seems to think this has been exceeded immediately after it was recycled. __Martin |