Re: [Bio-bwa-help] Reg. "Hard clipping operator not at start or end of CIGAR" error for reads align
Status: Beta
Brought to you by:
lh3lh3
From: Alec W. <al...@br...> - 2014-01-23 18:07:07
|
I realized I may have forgotten to close the loop with the bwa list. The bug in MergeBamAlignment has been fixed as of release 1.106. Alignments with hard-clipped should now be handled properly. -Alec On Nov 5, 2013, at 12:10 PM, jco...@tg... wrote: > Thanks Alec! Is there a workaround in the meantime? Will samtools merge do the trick? > > Thanks again, > > Jason > > __________ > Sent from my pocket supercomputer using the tiniest of keyboards > > > > -------- Original message -------- > From: Alec Wysoker <al...@br...> > Date: 11/05/2013 9:35 AM (GMT-07:00) > To: Shanker Swaminathan <SSw...@tg...> > Cc: samtools help <sam...@li...>,bio...@li... > Subject: Re: [Bio-bwa-help] Reg. "Hard clipping operator not at start or end of CIGAR" error for reads aligned using bwa mem > > > Hi Shanker, > > This is a bug in MergeBamAlignment that I just learned about yesterday. bwa mem can hard-clip reads even if the -H flag is used, and MergeBamAlignment is not handling them properly. I will let you know when we fix this. > > -Alec > > On Oct 17, 2013, at 8:51 PM, <SSw...@tg...> wrote: > >> Hello >> >> I have aligned a canine exome interleaved fastq (fastq generated from the corresponding BAM using htslib) using bwa Version: 0.7.5a-r405 with the following command: >> bwa mem -C -M -p -t 16 Canis_familiaris.CanFam3.1.73.dna.toplevel.fa SAMPLE.fq.gz > SAMPLE.sam >> >> I then added the RG info for the aligned using RG tag info from the BAM file using Picard's MergeBamAlignment tool. >> >> On running Picard's ValidateSamFile tool on this aligned file with RG info, I get the following summary: >> ## HISTOGRAM java.lang.String >> Error Type Count >> ERROR:INVALID_CIGAR 497409 >> ERROR:MATES_ARE_SAME_END >> 41994 >> ERROR:MISMATCH_FLAG_MATE_NEG_STRAND >> 7994 >> ERROR:MISMATCH_FLAG_MATE_UNMAPPED >> 2348 >> >> It appears that the "ERROR:INVALID_CIGAR" for all the reads with the error is like this: >> ERROR: Record 1300, Read name HWI-ST388-W7D:5:1107:5369:113470#ACAGTG, Hard clipping operator not at start or end of CIGAR >> ERROR: Record 1998, Read name HWI-ST388-W7D:5:1303:19317:94590#ACAGTG, Hard clipping operator not at start or end of CIGAR >> ERROR: Record 2123, Read name HWI-ST388-W7D:5:1302:15107:54745#ACAGTG, Hard clipping operator not at start or end of CIGAR >> ERROR: Record 2305, Read name HWI-ST388-W7D:5:2303:4070:95826#ACAGTG, Hard clipping operator not at start or end of CIGAR >> ERROR: Record 2306, Read name HWI-ST388-W7D:5:1108:1388:115094#ACAGTG, Hard clipping operator not at start or end of CIGAR >> and so on… >> >> I extracted the first 5 reads having this error: >> 1300:HWI-ST388-W7D:5:1107:5369:113470#ACAGTG >> 393 1 >> 5304 0 >> 37M53H53S = >> 5304 0 >> ACTTCTGAGAGCTGAGCTCACCCTCAGTCCCTCACAGGGAAGCATGCCTTGAAGCCAGCTGTCAACTAGCTGCATTCTTATGGCAGGGGG >> HHJJJJJEIIJIBGJJGIJJIJJIJJIFHJJJIJJJJJJIIJIFIGGJJIJJJIJJJHHHHHFFFFFFEECEEDDEDEECDCDDDDDD## >> SA:Z:JH374003.1,292,+,30S60M,0,0; >> RG:Z:AF23_aln_09-04-12 >> NM:i:0 UQ:i:0 >> AS:i:37 XS:i:37 >> 1998:HWI-ST388-W7D:5:1303:19317:94590#ACAGTG >> 393 1 >> 7239 0 >> 33M57H57S = >> 7239 0 >> AGCAGACATCTGCCTCCCATGAATACACACTTCATGGAGCAGTTCTATGCCTTTAGAAGCGAAGGGTACCTGCAGGCACCATGTGGAGGT >> FHIJIJIJJJJJJJJJJJJJJJJJJJJJIIIJJIGJJJIFHIGIJJJJJJJIJJJIJJJHHHFFDD;ACDCDDDDDDDDDDDDEDDCDD# >> SA:Z:JH373713.1,6749,+,29S61M,0,0; >> RG:Z:AF23_aln_09-04-12 >> NM:i:0 UQ:i:0 >> AS:i:33 XS:i:33 >> 2123:HWI-ST388-W7D:5:1302:15107:54745#ACAGTG >> 393 1 >> 11619 0 >> 40M50H50S = >> 11619 0 >> TTATTTATGATAGTCACCGAGAGAGAGAGAGAGACAGAGAAAGGCAGAGACACAAGCAGAGGAAAAAGCAGGCTTTCTGTGGGGAGCCCG >> HGJJJJIJJJJIGHIIJJJJIIJIIIJGIIGHEIIIIHHHHHFFFFDDDEDDDDDDDDDDDD@BDDDDDCDDBDDDDCDCCDDDDDBDBD >> SA:Z:5,71343042,+,24S66M,38,0; >> RG:Z:AF23_aln_09-04-12 >> NM:i:1 UQ:i:40 >> AS:i:35 XS:i:35 >> 2305:HWI-ST388-W7D:5:2303:4070:95826#ACAGTG >> 393 1 >> 13436 0 >> 34M56H56S = >> 13436 0 >> AACAGAACTTGAGGAATTGCTCCAGCCCTTCTGGGAGGTGACTGAGGGTCAGCCTGGGTGGGAGGGGACAGGGGGTCAGCCCAGGGGGGG >> FHGIJGJHGIEGGGIJJJJ<HGGIGEIJJIEI@GH?GH?BFEGIEGEE(@;CCHIGHE.=?A?@BBD8;<@B?@@008<3<02?B&5>BD >> SA:Z:AAEX03023161.1,5386,+,30S60M,0,1; >> RG:Z:AF23_aln_09-04-12 >> NM:i:0 UQ:i:0 >> AS:i:34 XS:i:34 >> 2306:HWI-ST388-W7D:5:1108:1388:115094#ACAGTG >> 393 1 >> 13457 0 >> 33M57H57S = >> 13457 0 >> CCAGCCCTTCTGGAGGTACCTGAGACCAGCCCTGTGTTGGGCTCCATGCACCATGGAGCCAACTTAAAAAACAAAACAAAAACACTACTT >> HHIHEHHHIEHIGBFHCGGHGIGHJJJJGGGGIIJIIJJJG;C=F@CG7=EGHEEHAHEBF9AAA>>@;A:;B5??BD<88<&2<ACCC@ >> SA:Z:2,8824647,+,27S63M,60,0; >> RG:Z:AF23_aln_09-04-12 >> NM:i:0 UQ:i:0 >> AS:i:33 XS:i:33 >> >> I would be grateful if you could explain what does hard clipping in the middle of sequence mean and why there could be many reads with this problem. Is there a problem with the alignment or do I have to change any options? I am unable to use the file downstream in GATK because of these errors. How would you suggest I resolve this problem? >> >> Thank you very much for your help. I look forward to your early reply. >> >> Yours sincerely >> Shanker Swaminathan >> ------------------------------------------------------------------------------ >> November Webinars for C, C++, Fortran Developers >> Accelerate application performance with scalable programming models. Explore >> techniques for threading, error checking, porting, and tuning. Get the most >> from the latest Intel processors and coprocessors. See abstracts and register >> http://pubads.g.doubleclick.net/gampad/clk?id=60136231&iu=/4140/ostg.clktrk_______________________________________________ >> Bio-bwa-help mailing list >> Bio...@li... >> https://lists.sourceforge.net/lists/listinfo/bio-bwa-help > |