From: Dan G. <da...@st...> - 2015-05-18 14:14:46
|
Hi Tom, I’m afraid I don’t see anything that points to another reference sequence in the original bam file. This BAM file downloaded from EBI has SQ tags but with only the SN and LN fields and no AS, M5 or UR fields for identifying it further. Other header tags don’t help either. However, is this generally known to be a problem - i.e. does the CRAM reference sequence need to match the reference used for alignment? Surely a robust reference-based compression such as CRAM should not care and would instead merely store additional differences to the provided compression reference sequence instead? - Dan On 18 May 2015, at 2:51 pm, Thomas W. Blackwell <tb...@um...> wrote: > Dan - > > Is it possible that you are compressing GRCh37 alignments against the GRCh38 genome reference sequence ? Look for "hg19" in the original .bam header block. > > I've noticed that not all of the previous samtools warnings and error messages are provided yet by the new htslib versions of samtools. > > - tom blackwell - > > On Sun, 17 May 2015, Dan Greenfield wrote: > >> Dear all, >> >> I encountered a problem converting a BAM file to CRAM and back again. This problem is exhibited on both samtools 1.1 and the latest download for samtools 1.2, each time reproduced from the first step. >> >> Steps to reproduce this: >> 1. download original BAM from EBI: ftp://ftp.sra.ebi.ac.uk/vol1/ERA172/ERA172924/bam/NA12878_S1.bam >> 2. convert to CRAM: >> samtools view -C -o NA12878_S1.cram -T hg38.fa NA12878_S1.bam >> 3. attempt to convert back to BAM: >> samtools view -b -o NA12878_S1.cram.bam -T hg38.fa NA12878_S1.cram >> >> Results from step 3: (no errors/warnings encountered in previous steps) >> >> Slice ends beyond reference end. >> Slice ends beyond reference end. >> ERROR: md5sum reference mismatch for ref 1 pos 248900092..248956422 >> CRAM: 1ca5fd5ffe82936260309c85fc9b473b >> Ref : b47b43c987dbf1af96ca6d59061401c8 >> Failure to decode slice >> [E::hts_close] Failed to decode sequence. >> samtools: error closing "NA12878_S1.cram": -1 >> >> Here are the file sizes (note the cram file is actually bigger than the original bam): >> NA12878_S1.bam 121,691,186,161 >> NA12878_S1.cram 125,386,599,695 >> NA12878_S1.cram.bam 11,109,478,067 >> >> Regards, >> Dan >> >> |