Re: [svtoolkit-help] svtoolkit-help Digest, Vol 23, Issue 1
From: Bob H. <han...@br...> - 2013-05-29 04:12:39
Hi, John,

I put in a fix (hopefully) into release 1.04.1162 that we posted this evening.
Let me know if that fixes the problem.

-Bob

On 5/28/13 4:36 PM, John Broxholme wrote:
> Sorry, I missed the details.
>
> Initial command line:
>   java -cp /usr/local/genetics/svtoolkit_1.04.1068/lib/SVToolkit.jar:/usr/local/genetics/svtoolkit_1.04.1068/lib/gatk/GenomeAnalysisTK.jar:/usr/local/genetics/svtoolkit_1.04.1068/lib/gatk/Queue.jar -Xmx4g \
>   org.broadinstitute.sting.queue.QCommandLine \
>   -S /usr/local/genetics/svtoolkit_1.04.1068/qscript/SVPreprocess.q \
>   -S /usr/local/genetics/svtoolkit_1.04.1068/qscript/SVQScript.q \
>   -gatk /usr/local/genetics/svtoolkit_1.04.1068/lib/gatk/GenomeAnalysisTK.jar \
>   -cp /usr/local/genetics/svtoolkit_1.04.1068/lib/SVToolkit.jar:/usr/local/genetics/svtoolkit_1.04.1068/lib/gatk/GenomeAnalysisTK.jar:/usr/local/genetics/svtoolkit_1.04.1068/lib/gatk/Queue.jar \
>   -configFile conf/hs37d5.conf \
>   -tempDir /well/htseq/ILLUMINA-WGS/SV-Freeze5/OUTPUT3/tmpdir \
>   -R /users/johnb/SV/Genomes/hs37d5.fa \
>   -reduceInsertSizeDistributions \
>   -genomeMaskFile /users/johnb/SV/Genomes/hs37d5.mask.fa \
>   -genderMapFile OUTPUT3/gender.map \
>   -runDirectory OUTPUT3/AS_CLL_156GL \
>   -md OUTPUT3/AS_CLL_156GL/metadata \
>   -computeGCProfiles \
>   -copyNumberMaskFile /users/johnb/SV/Genomes/hs37d5.cn2mask.fa \
>   -jobLogDir OUTPUT3/AS_CLL_156GL/logs \
>   -I AS_CLL_156GL.bam
>
> Contents of OUTPUT3/AS_CLL_156GL/logs/SVPreprocess-11.out:
>
> INFO  12:54:38,270 HelpFormatter - ----------------------------------------------------------
> INFO  12:54:38,272 HelpFormatter - Program Name: org.broadinstitute.sv.apps.ComputeGCProfiles
> INFO  12:54:38,275 HelpFormatter - Program Args: -I /gpfs1/well/htseq/ILLUMINA-WGS/SV-Freeze5/AS_CLL_156GL.bam -O /gpfs1/well/htseq/ILLUMINA-WGS/SV-Freeze5/OUTPUT3/AS_CLL_156GL/metadata/gcprofile/AS_CLL_156GL.bam.gcprof.zip -R /users/johnb/SV/Genomes/hs37d5.fa -md OUTPUT3/AS_CLL_156GL/metadata -referenceProfile OUTPUT3/AS_CLL_156GL/metadata/gcprofile/reference.gcprof.zip -genomeMaskFile /users/johnb/SV/Genomes/hs37d5.mask.fa -copyNumberMaskFile /users/johnb/SV/Genomes/hs37d5.cn2mask.fa -configFile conf/hs37d5.conf
> INFO  12:54:38,276 HelpFormatter - Date/Time: 2013/05/23 12:54:38
> INFO  12:54:38,276 HelpFormatter - ----------------------------------------------------------
> INFO  12:54:38,276 HelpFormatter - ----------------------------------------------------------
> INFO  12:54:38,298 ComputeGCProfiles - Opening reference sequence ...
> INFO  12:54:38,299 ComputeGCProfiles - Opened reference sequence.
> INFO  12:54:38,299 ComputeGCProfiles - Opening genome mask ...
> INFO  12:54:38,299 ComputeGCProfiles - Opened genome mask.
> INFO  12:54:38,300 ComputeGCProfiles - Opening copy number mask ...
> INFO  12:54:38,300 ComputeGCProfiles - Opened copy number mask.
> INFO  12:54:38,300 ComputeGCProfiles - Initializing algorithm ...
> #INFO: ReadCountAlgorithm: detected metadata version 1, forcing legacy behavior
> INFO  12:54:38,338 ComputeGCProfiles - Algorithm initialized.
> INFO  12:54:38,338 ComputeGCProfiles - Opening reference GC profile ...
> INFO  12:54:38,365 ComputeGCProfiles - Opened reference GC profile.
> INFO  12:54:38,366 ComputeGCProfiles - Processing input file org.broadinstitute.sv.dataset.SAMFileLocation@986cff74 ...
> Exception in thread "main" java.lang.RuntimeException: Invalid sequence position: 17:81195230
>     at org.broadinstitute.sv.commandline.CommandLineProgram.execute(CommandLineProgram.java:46)
>     at org.broadinstitute.sting.commandline.CommandLineProgram.start(CommandLineProgram.java:237)
>     at org.broadinstitute.sting.commandline.CommandLineProgram.start(CommandLineProgram.java:147)
>     at org.broadinstitute.sv.commandline.CommandLineProgram.run(CommandLineProgram.java:24)
>     at org.broadinstitute.sv.apps.ComputeGCProfiles.main(ComputeGCProfiles.java:120)
> Caused by: java.lang.IllegalArgumentException: Invalid sequence position: 17:81195230
>     at org.broadinstitute.sv.mask.GenomeMaskFastaFile.getMaskBit(GenomeMaskFastaFile.java:80)
>     at org.broadinstitute.sv.metadata.gc.GCProfileAlgorithm.processRecord(GCProfileAlgorithm.java:107)
>     at org.broadinstitute.sv.metadata.gc.GCProfileAlgorithm.processSAMFile(GCProfileAlgorithm.java:124)
>     at org.broadinstitute.sv.apps.ComputeGCProfiles.run(ComputeGCProfiles.java:181)
>     at org.broadinstitute.sv.commandline.CommandLineProgram.execute(CommandLineProgram.java:38)
>     ... 4 more
>
> On Tue, May 28, 2013 at 9:24 PM, <svt...@li...> wrote:
>
> > Send svtoolkit-help mailing list submissions to
> >     svt...@li...
> >
> > To subscribe or unsubscribe via the World Wide Web, visit
> >     https://lists.sourceforge.net/lists/listinfo/svtoolkit-help
> > or, via email, send a message with subject or body 'help' to
> >     svt...@li...
> >
> > You can reach the person managing the list at
> >     svt...@li...
> >
> > When replying, please edit your Subject line so it is more specific
> > than "Re: Contents of svtoolkit-help digest..."
> >
> > Today's Topics:
> >
> >   1. "Invalid sequence position" (John Broxholme)
> >   2. Re: "Invalid sequence position" (Bob Handsaker)
> >   3. Re: "Invalid sequence position" (John Broxholme)
> >   4. Re: "Invalid sequence position" (Bob Handsaker)
> >
> > ----------------------------------------------------------------------
> >
> > Message: 1
> > Date: Thu, 23 May 2013 15:21:57 +0100
> > From: John Broxholme <jo...@we...>
> > Subject: [svtoolkit-help] "Invalid sequence position"
> > To: svt...@li...
> > Cc: Linda Hughes <li...@we...>
> > Message-ID: <CADGtfyC7wyVAqJfRO3SAwEPVuhGQtjrGJMrZxu3B=Xjv...@ma...>
> > Content-Type: text/plain; charset="utf-8"
> >
> > Pre-processing of one (of 270+) deep BAM files has failed with:
> >
> > ...
> > INFO  12:54:38,366 ComputeGCProfiles - Processing input file org.broadinstitute.sv.dataset.SAMFileLocation@986cff74 ...
> > Exception in thread "main" java.lang.RuntimeException: Invalid sequence position: 17:81195230
> > ...
> >
> > Where would this have come from?  The pipeline has been the same to
> > prepare all 270+ of the (25x deep) BAMs, and this is the only failure.
> > Any suggestions on what might be wrong and how to fix it will be most
> > welcome!
> >
> > Thanks
> > John
> >
> > --
> > John Broxholme
> > Wellcome Trust Centre for Human Genetics
> > Roosevelt Drive, Oxford, OX3 7BN, UK
> >
> > ------------------------------
> >
> > Message: 2
> > Date: Thu, 23 May 2013 10:35:31 -0400
> > From: Bob Handsaker <han...@br...>
> > Subject: Re: [svtoolkit-help] "Invalid sequence position"
> > To: svt...@li...
> > Message-ID: <519...@br...>
> > Content-Type: text/plain; charset="iso-8859-1"
> >
> > Including the stack trace would be most helpful.
> > If you are using an hg19-based reference, then chr17 is only
> > 81195210 long.
> > Does ValidateSamFile like this bam?
> > This may be related to the bwa idiosyncrasy of occasionally leaving
> > nominally-invalid POS fields in unmapped records.
> > If so, I will try to fix it if you can send me the command line and
> > stack trace.
> > -Bob
> >
> > ------------------------------
> >
> > Message: 3
> > Date: Tue, 28 May 2013 15:37:18 +0100
> > From: John Broxholme <jo...@we...>
> > Subject: Re: [svtoolkit-help] "Invalid sequence position"
> > To: svt...@li...
> > Cc: Linda Hughes <li...@we...>
> > Message-ID: <CAD...@ma...>
> > Content-Type: text/plain; charset="utf-8"
> >
> > Hi Bob,
> >
> > Thanks for the quick response.  Yes, it is NCBI build37 (actually the
> > 1000G reference with decoy).  I can fill in a bit more detail on this.
> > The error I sent earlier was using svtoolkit build 1.04.1068 (which I
> > couldn't use at first, since a dependency on some new(?) R package
> > required an upgrade of R).  I first saw this using our production
> > version, build 1.04.857:
> >
> > ...
> > INFO  11:41:09,579 ComputeGCProfiles - Opened reference GC profile.
> > INFO  11:41:09,579 ComputeGCProfiles - Processing input file /gpfs1/well/htseq/ILLUMINA-WGS/SV-Freeze5/AS_CLL_156GL.bam ...
> > Exception in thread "main" java.lang.RuntimeException: Invalid sequence position: 17:81195216
> >     at org.broadinstitute.sv.commandline.CommandLineProgram.execute(CommandLineProgram.java:40)
> >     at org.broadinstitute.sting.commandline.CommandLineProgram.start(CommandLineProgram.java:221)
> >     at org.broadinstitute.sv.commandline.CommandLineProgram.run(CommandLineProgram.java:23)
> >     at org.broadinstitute.sv.apps.ComputeGCProfiles.main(ComputeGCProfiles.java:104)
> > Caused by: java.lang.IllegalArgumentException: Invalid sequence position: 17:81195216
> > ...
> >
> > I note that the invalid coordinate reported differs - 81195216 vs
> > 81195230, although this is the same BAM.
> >
> > Anyhow, I have since updated picard-tools to 1.92 and run 'picard
> > ValidateSamFile' on the offending BAM.
> > The only error (in 10k errors) I see is "Mate not found for paired
> > read" (I see many of these), which I assume has been caused by
> > deduping (I used picard for this).
> > Meanwhile I have removed reads mapping to the last 100bp of chr17
> > from that BAM, which is an ugly fix but it allows me to progress a bit.
> > And I have to resolve something else causing crashes with the current
> > (1.04.1068) version, which would be the subject of another thread...
> >
> > John
> >
> > --
> > John Broxholme
> > Wellcome Trust Centre for Human Genetics
> > Roosevelt Drive, Oxford, OX3 7BN, UK
> > Tel: (+44 1865) 287611 FAX: 287697
> >
> > ------------------------------
> >
> > Message: 4
> > Date: Tue, 28 May 2013 16:23:48 -0400
> > From: Bob Handsaker <han...@br...>
> > Subject: Re: [svtoolkit-help] "Invalid sequence position"
> > To: svt...@li...
> > Message-ID: <51A...@br...>
> > Content-Type: text/plain; charset="iso-8859-1"
> >
> > This doesn't look like the entire stack trace.  Did you truncate it?
> > I'm happy to try to help, but you need to send me the full stack trace
> > and you should send the full command line as well.
> > -Bob
> >
> > ------------------------------
> >
> > End of svtoolkit-help Digest, Vol 23, Issue 1
> > *********************************************
>
> --
> John Broxholme
> Wellcome Trust Centre for Human Genetics
> Roosevelt Drive, Oxford, OX3 7BN, UK
> Tel: (+44 1865) 287611 FAX: 287697
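
For anyone hitting the same error: the thread points at records whose POS lies beyond the end of the reference sequence (chr17 is given above as 81195210 long, while the exception reports 17:81195230). Below is a minimal Python sketch of one way to look for such records, not part of SVToolkit and not necessarily what the fix in 1.04.1162 does. It assumes SAM text with a header as input (for example, the output of `samtools view -h`), and the function name `find_out_of_range_records` is hypothetical. Whether this particular BAM actually contained such records is not confirmed in the thread.

```python
from typing import Dict, Iterable, Iterator, Tuple


def find_out_of_range_records(
    sam_lines: Iterable[str],
) -> Iterator[Tuple[str, str, int, int]]:
    """Yield (read name, sequence name, POS, sequence length) for every
    record whose POS is greater than the length declared in the @SQ header.

    Assumption: input is SAM text including its header lines.
    """
    seq_lengths: Dict[str, int] = {}
    for line in sam_lines:
        line = line.rstrip("\n")
        if not line:
            continue
        if line.startswith("@"):
            # Header line: collect sequence lengths from @SQ entries.
            if line.startswith("@SQ"):
                tags = dict(
                    field.split(":", 1)
                    for field in line.split("\t")[1:]
                    if ":" in field
                )
                if "SN" in tags and "LN" in tags:
                    seq_lengths[tags["SN"]] = int(tags["LN"])
            continue
        cols = line.split("\t")
        if len(cols) < 4:
            continue  # not a well-formed alignment line; skip it
        qname, rname, pos = cols[0], cols[2], int(cols[3])
        length = seq_lengths.get(rname)
        # POS is 1-based, so any POS above the sequence length is invalid.
        if length is not None and pos > length:
            yield qname, rname, pos, length
```

Feeding the function the lines of a SAM stream (e.g. `sys.stdin`) lists the offending reads, which is a gentler check than deleting everything in the last 100bp of chr17. Unmapped reads are not excluded here on purpose, since the thread suggests those are the records most likely to carry an out-of-range POS.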