#1 Sam to Bam conversion failed

Status: open
Owner: nobody
Labels: None
Priority: 5
Updated: 2013-04-19
Created: 2013-04-19
Creator: Robert Eveleigh
Private: No

I was looking for an alternative aligner for SOLiD data and ran a test on some of our in-house data. Using version 1.3.2 of Subread, I was able to successfully generate a color-space index for my reference genome and generate a SAM file. However, general practice is to convert the SAM file to an indexed BAM file to save space.

Command used for the alignment:
./subread-align -T 12 -Q -i human_hg19 -r 4169_01.pair1.fastq -R 4169_01.pair2.fastq -o 4169_01.sam

The SAM-to-BAM conversion was unsuccessful for a number of reasons:

1. The SAM header lacks the RG (read group) tag. See http://samtools.sourceforge.net/SAM1.pdf for details. Most aligners give you the option to add a user-defined RG tag.
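As a stopgap, a read group can be added to the SAM text after alignment. The following is a minimal sketch, not part of Subread: the helper name `add_read_group` and the example ID and sample values are hypothetical. It assumes the SAM specification's `@RG` header line (with `ID` and `SM` fields) and the per-record `RG:Z:` optional tag, and it does not check for a read group that is already present.

```python
def add_read_group(sam_lines, rg_id, sample):
    """Yield SAM lines with an @RG header line and an RG:Z tag on every record.

    Hypothetical helper for illustration only.
    """
    rg_header = f"@RG\tID:{rg_id}\tSM:{sample}"
    header_written = False
    for line in sam_lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        if line.startswith("@"):
            yield line  # existing header lines pass through unchanged
            continue
        if not header_written:
            # First alignment record reached: close the header with @RG.
            yield rg_header
            header_written = True
        # Tag the record with the read group ID.
        yield f"{line}\tRG:Z:{rg_id}"
    if not header_written:
        # File contained header lines only.
        yield rg_header


if __name__ == "__main__":
    example = [
        "@SQ\tSN:chr1\tLN:1000\n",
        "r1\t0\tchr1\t1\t60\t4M\t*\t0\t0\tACGT\tIIII\n",
    ]
    for out_line in add_read_group(example, "grp1", "4169_01"):
        print(out_line)
```

An option in the aligner itself would still be preferable, since it avoids a second pass over the file.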

2. I also ran Picard's ValidateSamFile (http://picard.sourceforge.net/) on the generated SAM file, and it identified numerous problems with the construction of the file: the problem stated in 1 plus a number of others. For example:

ERROR: Read groups is empty
ERROR: Record 1, Read name 4169-01:1_39_719, Read length does not match quals length
ERROR: Record 2, Read name 4169-01:1_39_719, MAPQ should be < 256.
ERROR: Record 2, Read name 4169-01:1_39_719, Read length does not match quals length
ERROR: Record 3, Read name 4169-01:1_39_1845, Read length does not match quals length
ERROR: Record 4, Read name 4169-01:1_39_1845, Read length does not match quals length
ERROR: Record 3, Read name 4169-01:1_39_1845, Mate alignment does not match alignment start of mate
ERROR: Record 3, Read name 4169-01:1_39_1845, Mate negative strand flag does not match read negative strand flag of mate
ERROR: Record 3, Read name 4169-01:1_39_1845, Mate reference index (MRNM) does not match reference index of mate
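To make these errors concrete, here is a rough sketch of the kind of checks involved. It is not Picard's implementation and is no substitute for ValidateSamFile; the function names are hypothetical. It assumes the standard 11 mandatory tab-separated SAM fields, a MAPQ range of 0 to 255, and `=` in RNEXT meaning "same reference as RNAME". Unmapped reads and other special cases are ignored.

```python
def parse_record(line):
    """Split one SAM alignment line into the fields checked below."""
    f = line.rstrip("\n").split("\t")
    return {
        "qname": f[0], "flag": int(f[1]), "rname": f[2], "pos": int(f[3]),
        "mapq": int(f[4]), "rnext": f[6], "pnext": int(f[7]),
        "seq": f[9], "qual": f[10],
    }


def check_record(rec):
    """Return a list of problems found in a single record."""
    problems = []
    if not 0 <= rec["mapq"] <= 255:
        problems.append("MAPQ out of range")
    # "*" means the field is absent, so lengths are only compared if both exist.
    if rec["seq"] != "*" and rec["qual"] != "*" and len(rec["seq"]) != len(rec["qual"]):
        problems.append("SEQ and QUAL lengths differ")
    return problems


def check_mates(a, b):
    """Return a list of problems in what record a claims about its mate b."""
    problems = []
    a_rnext = a["rname"] if a["rnext"] == "=" else a["rnext"]
    if a_rnext != b["rname"]:
        problems.append("mate reference mismatch")
    if a["pnext"] != b["pos"]:
        problems.append("mate position mismatch")
    # Flag 0x20 (mate reverse strand) on a should mirror 0x10 (reverse strand) on b.
    if bool(a["flag"] & 0x20) != bool(b["flag"] & 0x10):
        problems.append("mate strand flag mismatch")
    return problems


if __name__ == "__main__":
    bad = parse_record("r2\t0\tchr1\t1\t256\t4M\t*\t0\t0\tACGT\tIII")
    print(check_record(bad))
```

Running `check_mates` in both directions on a pair (a against b, then b against a) covers the mate fields of both records.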

I will test this on Illumina data to see whether this is a SOLiD-specific problem or a more general one.

Discussion