From: Peter C. <p.j...@go...> - 2011-11-17 16:46:26
On Thu, Nov 17, 2011 at 4:31 PM, Peter Cock <p.j...@go...> wrote:
> On Thu, Nov 17, 2011 at 4:14 PM, Peter Cock <p.j...@go...> wrote:
>> On Thu, Nov 17, 2011 at 3:16 PM, Sendu Bala <sb...@sa...> wrote:
>>> Actually, in this case the warning should be ignored. Using samtools
>>> view -u gives uncompressed bam, and these do not have the EOF marker.
>>
>> Are you sure about that? Which version of samtools were you using?
>>
>> I've just tried with the current code in Heng Li's github repository, and
>> the samtools SVN, and it seems to be producing uncompressed BAM
>> with the 28 byte empty BGZF block as an EOF marker.
>
> Sorry, not looking closely enough at the hexdump. Currently there does
> seem to be an empty BGZF block, but because it is using a gzip
> compression level of zero it doesn't seem to match the 28 bytes
> expected as the EOF marker, rather it looks like different block,
>
> $ ~/repositories/samtools-git/samtools view -u ex1_header.bam |
> hexdump -C | tail
> 0006f8b0 31 31 34 5f 32 36 3a 37 3a 33 37 3a 37 39 3a 35 |114_26:7:37:79:5|
> 0006f8c0 38 31 00 30 02 00 00 88 88 88 88 88 88 88 88 88 |81.0............|
> 0006f8d0 88 88 82 18 42 21 41 11 10 12 0b 0b 0b 1c 1c 1c |....B!A.........|
> 0006f8e0 15 1c 1c 1c 1b 1c 1c 1c 1b 1a 1c 1c 1c 1c 1c 0c |................|
> 0006f8f0 1c 1c 1c 1c 1c 1c 1c 1c 1c 1c 1c 1c 4d 46 43 12 |............MFC.|
> 0006f900 41 71 43 1b 4e 4d 43 02 55 51 43 17 48 30 43 00 |AqC.NMC.UQC.H0C.|
> 0006f910 48 31 43 01 01 00 00 ff ff c7 03 eb 17 6e f9 00 |H1C..........n..|
> 0006f920 00 1f 8b 08 04 00 00 00 00 00 ff 06 00 42 43 02 |.............BC.|
> 0006f930 00 1e 00 01 00 00 ff ff 00 00 00 00 00 00 00 00 |................|
> 0006f940
>
> We're getting this (31 bytes):
>
> 1f 8b 08 04 00 00 00 00
> 00 ff 06 00 42 43 02 00
> 1e 00 01 00 00 ff ff 00
> 00 00 00 00 00 00 00
>
>
> We want this (28 bytes):
>
> 1f 8b 08 04 00 00 00 00
> 00 ff 06 00 42 43 02 00
> 1b 00 03 00 00 00 00
> 00 00 00 00 00
>
> Or, "\x1f\x8b\x08\x04\x00\x00\x00\x00\x00\xff\x06\x00BC\x02\x00\x1b\x00\x03\x00\x00\x00\x00\x00\x00\x00\x00\x00"
> or, "\037\213\010\4\0\0\0\0\0\377\6\0\102\103\2\0\033\0\3\0\0\0\0\0\0\0\0\0"
>
> With my patch,
>
> $ ~/repositories/samtools-git/samtools view -u ex1_header.bam |
> hexdump -C | tail
> [bgzf_close] compression level: 0
> [bgzf_close] Forcing empty EOF block
> 0006f8b0 31 31 34 5f 32 36 3a 37 3a 33 37 3a 37 39 3a 35 |114_26:7:37:79:5|
> 0006f8c0 38 31 00 30 02 00 00 88 88 88 88 88 88 88 88 88 |81.0............|
> 0006f8d0 88 88 82 18 42 21 41 11 10 12 0b 0b 0b 1c 1c 1c |....B!A.........|
> 0006f8e0 15 1c 1c 1c 1b 1c 1c 1c 1b 1a 1c 1c 1c 1c 1c 0c |................|
> 0006f8f0 1c 1c 1c 1c 1c 1c 1c 1c 1c 1c 1c 1c 4d 46 43 12 |............MFC.|
> 0006f900 41 71 43 1b 4e 4d 43 02 55 51 43 17 48 30 43 00 |AqC.NMC.UQC.H0C.|
> 0006f910 48 31 43 01 01 00 00 ff ff c7 03 eb 17 6e f9 00 |H1C..........n..|
> 0006f920 00 1f 8b 08 04 00 00 00 00 00 ff 06 00 42 43 02 |.............BC.|
> 0006f930 00 1b 00 03 00 00 00 00 00 00 00 00 00 |.............|
> 0006f93d
>
> I'll remove the debugging to stderr, and submit a github pull request.
>
> Peter

Patch here:
https://github.com/peterjc/samtools/tree/u-eof

Pull request here:
https://github.com/lh3/samtools/pull/7

Before the patch,

$ ~/repositories/samtools-git/samtools view -u ex1_header.bam | samtools sort - test

So using a pipe works fine, but using a file:

$ ~/repositories/samtools-git/samtools view -u ex1_header.bam > test_old.bam
$ samtools sort test_old.bam test
[bam_header_read] EOF marker is absent. The input is probably truncated.

With the patch,

$ ~/repositories/samtools-git/samtools view -u ex1_header.bam > test_new.bam
$ samtools sort test_new.bam test
(no errors - good)

Further testing welcome, for instance is it possible via the samtools
command line interface to select other compression levels? They too may
generate different empty BGZF blocks, in which case my patch could be
modified to always write the 28 bytes explicitly.

Regards,

Peter
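
For anyone wanting to test other compression levels without rebuilding samtools, here is a minimal Python sketch. It is not part of the patch, and the helper names (empty_bgzf_block, has_bgzf_eof) are made up for illustration. It assumes an empty BGZF block is the fixed 16-byte gzip header with the "BC" subfield, a 2-byte BSIZE, the raw deflate of an empty string, then CRC32 and ISIZE, which is the layout visible in the dumps above. With zlib level 0 the deflate payload should be the 5-byte stored block (giving the 31-byte variant), while the default level should give the 2-byte payload of the 28-byte marker.

```python
import os
import struct
import zlib

# The 28-byte empty BGZF block quoted in the message above.
BGZF_EOF = (
    b"\x1f\x8b\x08\x04\x00\x00\x00\x00\x00\xff\x06\x00BC\x02\x00"
    b"\x1b\x00\x03\x00\x00\x00\x00\x00\x00\x00\x00\x00"
)


def empty_bgzf_block(level):
    """Build an empty BGZF block using raw deflate at the given zlib level.

    Hypothetical helper: it mirrors the block layout seen in the hexdumps,
    it is not the samtools implementation.
    """
    compressor = zlib.compressobj(level, zlib.DEFLATED, -15)  # -15 = raw deflate
    payload = compressor.compress(b"") + compressor.flush()
    # Fixed gzip header including the 6-byte "BC" extra subfield (16 bytes).
    header = b"\x1f\x8b\x08\x04\x00\x00\x00\x00\x00\xff\x06\x00BC\x02\x00"
    total = len(header) + 2 + len(payload) + 8  # BSIZE field, CRC32, ISIZE
    bsize = struct.pack("<H", total - 1)  # BSIZE stores block size minus one
    trailer = struct.pack("<II", zlib.crc32(b"") & 0xFFFFFFFF, 0)
    return header + bsize + payload + trailer


def has_bgzf_eof(path):
    """Return True if the file ends with the 28-byte EOF marker."""
    if os.path.getsize(path) < len(BGZF_EOF):
        return False
    with open(path, "rb") as handle:
        handle.seek(-len(BGZF_EOF), os.SEEK_END)
        return handle.read() == BGZF_EOF


if __name__ == "__main__":
    for level in (0, 1, 6, 9):
        block = empty_bgzf_block(level)
        print(level, len(block), block == BGZF_EOF)
```

Running has_bgzf_eof on test_old.bam and test_new.bam from the commands above would be one way to confirm the difference without going through samtools sort. Writing the 28 bytes as a constant, as suggested for the patch, sidesteps the dependence on compression level entirely.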