Re: [svtoolkit-help] Genome STRiP - mismatched read pair insert size error
From: Bob H. <han...@br...> - 2013-10-09 17:41:06
Hi,

Yes, this problem comes up occasionally - I don't know whether it is indel realignment or some other tool (but a corner case in indel realignment is a good guess). I find that I see it a few times in any largish data set that comes from the bwa/GATK pipeline.

You can work around this problem by changing a configuration parameter select.validateReadPairs:false. You can do this by changing the config file, or in newer versions of Genome STRiP the preferred way is to leave the standard config file unmodified and put -P select.validateReadPairs:false on the command line.

This flag controls several consistency checks. I can't recommend turning this off by default, since if you have bad input data you really do want to find out about it.

Best,
-Bob

On 10/9/13 12:30 PM, Anne-Katrin Emde wrote:
> Hello!
>
> I am running GenomeSTRiP on 50 genomes (bwa-aligned, GATK indel
> realigned and quality recalibrated) and I am getting the following error:
>
> ##### ERROR MESSAGE: Mismatched read pair insert sizes for sample
> 6837: [ {HS2000-910_287:1:1309:11124:98404 97 17 22252630
> 0 100M = 22251548 -983
> CTTTGAAGATTTCGTTGGAAACGGGATAATCTTCACAGAAAAGCTAAACAGAAGCATTCTCAGAAACTTCTTTGTGATGTTTGCTTTCAACTCACAGAGT
> >@??>>?=>=??=6=?==???<6==>>>?>=??=>;><=>??==?>??<>=>9==>=?=><=>>>?;>><>@?>=?>=<>>?><=?><=5;?=><>>>>=
> X0:i:5 X1:i:0
> XA:Z:17,+22255009,100M,0;17,+22259766,100M,0;17,+22247875,100M,0;17,+22245496,100M,0;
> BD:Z:NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
> MD:Z:100 RG:Z:6837 XG:i:0
> BI:Z:NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
> AM:i:0 NM:i:0 SM:i:0 XM:i:0 XO:i:0 MQ:i:47 XT:A:R},
> {HS2000-910_287:1:1309:11124:98404 145 17 22251548
> 47 2M1D98M = 22252630 981
> TTTGAGAGAGAAGCTTTGAAACACTCTTTTTCTAGAATCTGCAAGTGGACATTGGGAGGGCTGTGAGGTTTGTGGTGGAAAAGGAAATATCTCCACATAA
> @@>=?=?=?=>?<>=>?>?==>;=><>>>>><=><?=><=<=?>;======>>==<>====>;>=?=;??>;>=;===????==??=>=>=>>=<===@7
> X0:i:1 X1:i:0 OC:Z:100M
> BD:Z:NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
> RG:Z:6837 XG:i:0
> BI:Z:NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
> AM:i:0 NM:i:4 SM:i:37 XM:i:4 XO:i:0 OP:i:22251549 XT:A:U} ]
> ##### ERROR
> ------------------------------------------------------------------------------------------
>
> So the insert size of the left and right mate don't agree. We think
> that indel realignment might cause the mismatch in insert sizes. Has
> this error message been reported before on bwa-aligned reads (+ GATK
> realignment, recalibration)? Is there a way to prevent GenomeSTRiP
> from crashing at those instances, e.g. printing a warning instead of
> an error? It only happens very rarely - on 50 high-coverage whole
> genomes the error was reported 4 times.
>
> Thanks,
>
> Anne-Katrin
>
>
> --------------------------------------
> Anne-Katrin Emde, Ph.D.
> Bioinformatics Scientist
> New York Genome Center
> ak...@ny...
> (917)-951-0167
> --------------------------------------
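The two records in the error above carry insert size (TLEN) values of -983 and 981; for a consistent pair the two values would normally be exact negatives of each other. Before switching off the check, it may help to see which pairs are affected. The following is a minimal Python sketch, assuming the reads are available as SAM text (for example, the output of samtools view). The function name find_mismatched_pairs is hypothetical and is not part of Genome STRiP, and the test used here (the two TLEN values must sum to zero) is only one plausible reading of what the validation does, not its actual implementation.

```python
from collections import defaultdict


def find_mismatched_pairs(sam_lines):
    """Return {read_name: [tlen, tlen]} for pairs whose TLEN values do not cancel.

    Hypothetical helper: assumes standard SAM text records with at least
    the 11 mandatory fields.
    """
    tlens = defaultdict(list)
    for line in sam_lines:
        # Skip blank lines and SAM header lines (headers start with '@').
        if not line.strip() or line.startswith("@"):
            continue
        fields = line.split()
        if len(fields) < 11:
            continue
        name, flag, tlen = fields[0], int(fields[1]), int(fields[8])
        # Keep only paired reads (0x1) that are primary alignments,
        # i.e. neither secondary (0x100) nor supplementary (0x800).
        if not flag & 0x1 or flag & 0x900:
            continue
        tlens[name].append(tlen)
    # A consistent pair has TLEN values that are negatives of each other.
    return {
        name: values
        for name, values in tlens.items()
        if len(values) == 2 and values[0] + values[1] != 0
    }
```

Called on the lines of a SAM stream (for instance sys.stdin fed from samtools view over the region of interest), the function returns the read names and the two disagreeing TLEN values, which makes it easier to judge whether the mismatches are isolated realignment artifacts or a sign of a broader problem with the input.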