Re: [svtoolkit-help] Genome strip - Unrecognized sequence: 1:0-0
From: Bob H. <han...@br...> - 2011-10-28 13:50:06
Queue is a very nice job management library written in Scala by Khalid Shakir, part of the GATK. It's kind of like "make" in that you define a tree of jobs with dependencies, and Queue then runs these as parallel jobs when possible, based on the dependency graph.

Queue supports LSF and I believe also SGE (although I haven't used this myself) and can also just run jobs synchronously in subshells. If you use -bsub and LSF, for example, then the jobs will be run in parallel when possible.

This is all process-level parallelism, not thread-level. Genome STRiP is written in Java and is single threaded, so the number of OS threads will depend on the JVM configuration. It's designed to exploit process-level parallelism.

-Bob

On 10/24/11 7:14 PM, Ashish Kumar wrote:
>
> Thanks Bob.
>
> On the parallelisation issue, could you please clarify more.
>
> 1. When --windowSize is, e.g., 3 Mb, does each window run in parallel,
> or is that only for chunking, so that we could optionally run the
> windows separately and join the outputs later? Is it the same for
> SVGenotyper's -parallelJobs option?
>
> 2. On various runs, I've noticed that the optimal number of cores for
> the program on a multi-core architecture is 2. Is this correct, or can
> the program use more cores on an 8-core node with different settings?
>
> Best,
>
> Ashish.
>
> *From:* Bob Handsaker [mailto:han...@br...]
> *Sent:* 21 October 2011 16:49
> *To:* Ashish Kumar
> *Cc:* svt...@li...
> *Subject:* Re: [svtoolkit-help] Genome strip - Unrecognized sequence: 1:0-0
>
> Yes, you can use -L to process just a specific interval (or can pass a
> file with a list of intervals - the file extension must be .list).
> Note that the SVDiscovery queue script also does parallelization
> internally, based on the -windowSize parameter.
> For example, here are typical parameters to process the genome (or
> just the intervals selected with -L) in 3Mb windows for events between
> 100bp and 100Kb:
>
> -windowSize 3000000
> -windowPadding 100000
> -minimumSize 100
> -maximumSize 100000
>
> You can invoke the queue script without '-run' to preview the chunking.
>
> -Bob
>
> On 10/21/11 11:30 AM, Ashish Kumar wrote:
>
> Hi Bob,
>
> On the same issue, if we want to use the -L option, would it be safe
> to presume that we can chunk up the chromosomes?
>
> So, say something like "-L 20:1250000-2500000" would be a valid
> option, assuming that this sequence exists in my reference genome?
>
> Thanks,
>
> Ashish
>
> *From:* Bob Handsaker [mailto:han...@br...]
> *Sent:* 06 October 2011 14:40
> *To:* svt...@li...
> *Subject:* Re: [svtoolkit-help] Genome strip - Unrecognized sequence: 1:0-0
>
> This is because the example script uses "-L 1" (only process
> chromosome 1) to make it faster,
> but you likely don't have a sequence named "1" in your reference genome.
> To process the whole genome, simply remove the -L argument.
> -Bob
>
> On 10/5/11 2:50 PM, Axel Ericsson wrote:
>
> Hi
>
> I get the following error message when I run Genome STRiP. I have
> forwarded the modified shell script; I hope you can point me in the
> right direction.
>
> Best regards Axel
>
> INFO 14:37:52,013 QScriptManager - Compiling 2 QScripts
> INFO 14:37:59,172 QScriptManager - Compilation complete
> INFO 14:38:02,654 HelpFormatter - ---------------------------------------------------------
> INFO 14:38:02,655 HelpFormatter - Program Name: org.broadinstitute.sting.queue.QCommandLine
> INFO 14:38:02,655 HelpFormatter - Program Args: -S
> /seq/vsag/axel/tools/Genome_strip/svtoolkit/qscript/SVDiscovery.q -S
> /seq/vsag/axel/tools/Genome_strip/svtoolkit/qscript/SVQScript.q -gatk
> /seq/vsag/axel/tools/Genome_strip/svtoolkit/lib/gatk/GenomeAnalysisTK.jar
> -cp
> /seq/vsag/axel/tools/Genome_strip/svtoolkit/lib/SVToolkit.jar:/seq/vsag/axel/tools/Genome_strip/svtoolkit/lib/gatk/GenomeAnalysisTK.jar:/seq/vsag/axel/tools/Genome_strip/svtoolkit/lib/gatk/Queue.jar
> -configFile conf/genstrip_HCSMA_parameters.txt -tempDir ./tmpdir -R
> /seq/references/Canis_lupus_familiaris_assembly2/v0/Canis_lupus_familiaris_assembly2.fasta
> -genomeMaskFile
> /seq/vsag/hyunji/CCD/capture/genomestrip/canFam2_1_index/work/Canis_lupus_familiaris_assembly2.mask.fasta
> -genderMapFile data/HCSMA_gender.map -runDirectory HCSMA -md
> HCSMA/metadata -jobLogDir HCSMA/logs -L 1 -minimumSize 100
> -maximumSize 1000000 -I
> /seq/vsag/axel/bamfiles/HCSMCRealignment.HCSMA_B90_Homo_1.clean.dedup.recal.bam
> -O HCSMA.discovery.vcf -run
> INFO 14:38:02,656 HelpFormatter - Date/Time: 2011/10/05 14:38:02
> INFO 14:38:02,656 HelpFormatter - ---------------------------------------------------------
> INFO 14:38:02,657 HelpFormatter - ---------------------------------------------------------
> INFO 14:38:02,661 QCommandLine - Scripting SVDiscovery
> ##### ERROR ------------------------------------------------------------------------------------------
> ##### ERROR stack trace
> java.lang.IllegalArgumentException: *Unrecognized sequence: 1:0-0*
> at org.broadinstitute.sv.queue.ComputeDiscoveryPartitions.computePartitions(ComputeDiscoveryPartitions.java:96)
> at org.broadinstitute.sv.qscript.SVQScript.computeDiscoveryPartitions(SVQScript.q:132)
> at SVDiscovery.script(SVDiscovery.q:19)
> at org.broadinstitute.sting.queue.QCommandLine$$anonfun$execute$1.apply(QCommandLine.scala:46)
> at org.broadinstitute.sting.queue.QCommandLine$$anonfun$execute$1.apply(QCommandLine.scala:43)
> at scala.collection.Iterator$class.foreach(Iterator.scala:631)
> at scala.collection.JavaConversions$JIteratorWrapper.foreach(JavaConversions.scala:549)
> at scala.collection.IterableLike$class.foreach(IterableLike.scala:79)
> at scala.collection.JavaConversions$JListWrapper.foreach(JavaConversions.scala:596)
> at org.broadinstitute.sting.queue.QCommandLine.execute(QCommandLine.scala:43)
> at org.broadinstitute.sting.commandline.CommandLineProgram.start(CommandLineProgram.java:239)
> at org.broadinstitute.sting.queue.QCommandLine$.main(QCommandLine.scala:117)
> at org.broadinstitute.sting.queue.QCommandLine.main(QCommandLine.scala)
> ##### ERROR ------------------------------------------------------------------------------------------
> ##### ERROR A GATK RUNTIME ERROR has occurred (version 1.0.5039M):
> ##### ERROR
> ##### ERROR Please visit to wiki to see if this is a known problem
> ##### ERROR If not, please post the error, with stack trace, to the GATK forum
> ##### ERROR Visit our wiki for extensive documentation http://www.broadinstitute.org/gsa/wiki
> ##### ERROR Visit our forum to view answers to commonly asked questions http://getsatisfaction.com/gsa
> ##### ERROR
> ##### ERROR MESSAGE: Unrecognized sequence: 1:0-0
> ##### ERROR ------------------------------------------------------------------------------------------
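
The root cause discussed in this thread is an interval passed with -L that names a sequence ("1") that is absent from the reference FASTA. A quick way to catch this before launching a run might be to compare the sequence name in the interval against the FASTA header names. The sketch below is only an illustration and is not part of SVToolkit or the GATK: the helper names (fasta_sequence_names, interval_sequence, check_interval) and the example headers are hypothetical, and it assumes the usual FASTA convention that a sequence name is the text after '>' up to the first whitespace.

```python
import re


def fasta_sequence_names(fasta_lines):
    """Collect sequence names from FASTA header lines.

    Assumes the common convention that the name is the first
    whitespace-delimited token after the leading '>'.
    """
    names = []
    for line in fasta_lines:
        if line.startswith(">"):
            fields = line[1:].split()
            if fields:
                names.append(fields[0])
    return names


def interval_sequence(interval):
    """Return the sequence name from 'name:start-end', or the whole
    string when no coordinates are given (e.g. '-L 1')."""
    match = re.fullmatch(r"(.+):\d+-\d+", interval)
    return match.group(1) if match else interval


def check_interval(interval, names):
    """True if the interval's sequence name occurs in the reference."""
    return interval_sequence(interval) in names


if __name__ == "__main__":
    # Hypothetical headers; a real check would iterate over the lines of
    # the reference FASTA file that is passed to -R.
    reference = [">chr1 example description\n", "ACGT\n", ">chr20\n", "GGCC\n"]
    names = fasta_sequence_names(reference)
    for interval in ("1", "chr1", "chr20:1250000-2500000"):
        status = "ok" if check_interval(interval, names) else "not in reference"
        print(f"-L {interval}: {status}")
```

With headers like the hypothetical ones above, "-L 1" would be flagged while "-L chr1" would pass, which matches the explanation given in the thread: either use a name that exists in the reference or drop -L to process the whole genome.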