|
From: Alec W. <al...@br...> - 2011-11-15 14:59:55
|
Hi Shaun, Without turning the validation stringency down to lenient, Picard would not allow an unmapped read to have the reverse strand flag. For the most part, Picard tools are designed to work with input that passes strict validation, and we don't put a lot of effort into handling cases that fail strict validation. SamToFastq assumes that a read with the reverse strand flag set has been reverse-complemented, which I believe conforms to the spec. By default, SamToFastq will undo the reverse complementing in order to give you back the closest approximation to the original reads that it can. You can pass RE_REVERSE=false in order to suppress this behavior, but it will be suppressed regardless of whether the read is mapped or not. We run into many issues with aligners putting strange stuff in output SAM files. Our approach is to take only the alignment info from the aligner, and get everything else from the original data. MergeBamAlignment does just that, and many problems can be avoided by post-processing BWA output with this program. -Alec On 11/15/11 6:25 AM, SHAUN WEBB wrote: > Hi, I have a question about the picard tool SamToFastq. I have a bam > file containing paired end reads and I have attempted to convert this > back to fastq using the following code: > > java -Xmx10g -jar SamToFastq.jar VALIDATION_STRINGENCY=LENIENT > INPUT=$i FASTQ=$i.fq SECOND_END_FASTQ=$i.2.fq > > The reads have been mapped with bwa and I have used lenient validation > as Picard was complaining about some unmapped reads having a mapping > quality>0. I'm not sure why bwa has given these reads a mapping > quality, I suspect it is to do with the other pair but there are only > ~200 out of ~200,000,000 so not that bothered about them. > > The main question I have is regarding reverse complement of sequence > when converting to FastQ. I have the following paired end reads: > > HS18_6811:3:1304:5092:136893 1145 chr1 3009956 25 75M > = 3009956 0 > CTCTTCCGATCTTTTTCCATTTTCCATTTTCTTTGATTGTTGTGCCGATGTTCTCTATGGAATCTTCTGCACCTG > > ?ED@IIICKMBIK@KGGCJLBGHILKIMFIJGH<IJJKHJFIHLHLKIJHKOIAGFHJKL@DABJDJJIIIIFBC > X0:i:1 X1:i:0 MD:Z:0G1A2A0G68 RG:Z:1 XG:i:0 AM:i:0 NM:i:4 > SM:i:25 XM:i:4 XO:i:0 XT:A:U > > > HS18_6811:3:1304:5092:136893 181 chr1 3009956 0 * > = 3009956 0 > CTCTTCCGATCTCAGGTGCAGAAGATTCCATAGAGAACATCGGCACAACAATCAAAGAAAATGGAAAATGGAAAA > > CFD@DIDKNHJ@ILDEJHEEFEFHGJFDICCGL=DG5CI:FIHJGCJCILIE@LKKEFICGHGIEHIDFFF?*59 > RG:Z:1 > > > The first is mapped to the reverse strand and flagged correctly. The > second is unmapped but bwa gives it the reverse strand flag as this is > where it's pair is mapped. In the output FastQ files both sequences > are reverse complemented. I am unsure whether the unmapped read > sequence in the bam file is the original read sequence or is rc'd when > the bam file is written? Does the picard tool just check for a reverse > flag or does it look to see if the read is mapped also? > > Thanks for any help > > Shaun Webb > |