From: Kern S. <ke...@si...> - 2004-04-30 21:44:58
Hello,

On Fri, 2004-04-30 at 21:50, Mike Acar wrote:
> Kern Sibbald <ke...@si...> wrote:
> > Hello Mike,
> >
> > Can you tell me what OS you are using?
>
> SuSE 9.0:
>
> mike@hexagram:~> uname -a
> Linux hexagram 2.4.21-202-athlon #1 Fri Apr 2 21:22:14 UTC 2004 i686 athlon i386 GNU/Linux
> mike@hexagram:~> rpm -q glibc
> glibc-2.3.2-88
>
> All of the available patches are installed.
>
> > I ask because I've just written 3G from /usr to tape using two
> > simultaneous jobs. I'm sure the blocks are interleaved. Do select *
> > from JobMedia where JobId=nn on your two Jobs and if the start/end
> > blocks overlap on the jobs, the blocks are interleaved.
>
> mysql> select StartBlock, EndBlock from JobMedia where JobID in (77,78);
> +------------+----------+
> | StartBlock | EndBlock |
> +------------+----------+
> |          1 |     3110 |
> |          1 |     3810 |
> +------------+----------+
>
> > I get absolutely no errors in a subsequent Verify job.
>
> Hmm. So there's a difference somewhere :-/ I wonder if this error would
> pop up with a file backup if my test filesets were larger. I'll have to
> try on Monday...
>
> My director and SD are both 1.34.2, though the clients are 1.34.0 and
> 1.34.1. I wouldn't think this is relevant...

I'd be very surprised.

> Were your tests with your current development version?

Yes, but the only changes from 1.34.2 are in FileSet handling.

>
> > Bacula assumes that write() is atomic. If that is not the case, there
> > will be problems.
> >
> > If you are running on FreeBSD, we may either have another pthreads bug
> > or some subtle incompatibility.
>
> Linux all the way, baby. On the director, client, and SD, at least, and
> I would think that only the SD is really relevant here.
>
> For what it's worth, I grepped all of the occurrences of 'checksum=' out
> of the storage daemon's debug output, and they're all unique. When I run
> scanblocks on that volume, I get:
>
> btape: btape Error: block.c:313 Volume data error!
> Block checksum mismatch in block 181: calc=f664c45b blk=99466bb6
> Error reading block. ERR=block.c:313 Volume data error! Block checksum mismatch in block 181: calc=f664c45b blk=99466bb6
>
> When I check the daemon's debug output, I find:
>
> hexagram-sd: block.c:655 write_block: wrote block 301 bytes=64512
> hexagram-sd: record.c:202 write_record_to_block() FI=1 SessId=2 Strm=SPARSE-DATA len=32768
> rem=64488 remainder=32552
> hexagram-sd: append.c:201 write_record FI=1 SessId=2 Strm=SPARSE-DATA len=32768
> hexagram-sd: record.c:202 write_record_to_block() FI=1 SessId=1 Strm=SPARSE-DATA len=32768
> rem=33572 remainder=0
> hexagram-sd: append.c:201 write_record FI=1 SessId=1 Strm=SPARSE-DATA len=32768
> hexagram-sd: record.c:202 write_record_to_block() FI=1 SessId=2 Strm=SPARSE-DATA len=32768
> rem=31924 remainder=0
> hexagram-sd: append.c:184 !write_record_to_block data_len=32768 rem=856
> hexagram-sd: block.c:186 ser_block_header: block_len=64512
> hexagram-sd: block.c:200 ser_bloc_header: checksum=99466bb6
> hexagram-sd: block.c:655 write_block: wrote block 302 bytes=64512
>
> So there's a block with that checksum written to tape, but apparently at
> an entirely different place (though I'm assuming btape and the sd will
> both number blocks the same way).

No, they are unlikely to number the blocks the same. I've had problems
with block numbering.

>
> The checksum btape calculated doesn't exist in the debug output at all.

I am beginning to suspect that you have hardware problems. I'll do a
btape scan on my 6GB tape and see what I get.

Best regards,

Kern
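
For reference, the overlap test described above (two jobs whose start/end blocks overlap have interleaved blocks) can be sketched in a few lines. This is only an illustration, not Bacula code: the helper name `ranges_overlap` is hypothetical, the ranges are treated as inclusive, and it assumes both JobMedia rows refer to the same volume and the same tape file, which the query output above does not show.

```python
def ranges_overlap(a, b):
    """Return True if two inclusive (start, end) block ranges share a block.

    Hypothetical helper for illustration; not part of Bacula.
    """
    a_start, a_end = a
    b_start, b_end = b
    # Two inclusive ranges overlap when each one starts no later
    # than the other one ends.
    return a_start <= b_end and b_start <= a_end


# StartBlock/EndBlock pairs taken from the two JobMedia rows quoted above.
# Assumption: both rows are on the same volume and tape file.
row_one = (1, 3110)
row_two = (1, 3810)

print(ranges_overlap(row_one, row_two))
```

With the two rows from the query, the ranges 1..3110 and 1..3810 share blocks, so under the stated assumptions the check agrees that the jobs were written interleaved.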