Re: [svtoolkit-help] Genome STRiP / cannot determine library identifier
From: Bob H. <han...@br...> - 2011-02-16 14:40:21
Hi, Mingfu,

I'm going to copy this to the support mailing list too.

I suspect the problem is that your @RG (read group) headers do not contain an LB (library) tag. Either that or the RG tags are missing from the reads. As a result, Genome STRiP can't group the reads into libraries.

Is it possible for you to reheader your bam files?

-Bob

On 2/15/11 11:32 PM, Mingfu Zhu wrote:
> Hi Bob,
>
> I input a file with the paths of the bam files (along with a list of genders).
> It seems to be running now. The screen log says
>
> Web site: http://www.broadinstitute.org/gsa/wiki/index.php/Genome_STRiP
> INFO 23:05:10,551 QScriptManager - Compiling 2 QScripts
> INFO 23:05:15,793 QScriptManager - Compilation complete
> INFO 23:05:18,426 HelpFormatter -
> ---------------------------------------------------------
> INFO 23:05:18,426 HelpFormatter - Program Name:
> org.broadinstitute.sting.queue.QCommandLine
> INFO 23:05:18,427 HelpFormatter - Program Args: -S
> /nfs/seqsata07/Mingfu/svtoolkit/qscript/SVPreprocess.q -S
> /nfs/seqsata07/Mingfu/svtoolkit/qscript/SVQScript.q -gatk
> /nfs/seqsata07/Mingfu/svtoolkit/lib/gatk/GenomeAnalysisTK.jar -cp
> /nfs/seqsata07/Mingfu/svtoolkit/lib/SVToolkit.jar:/nfs/seqsata07/Mingfu/svtoolkit/lib/gatk/GenomeAnalysisTK.jar:/nfs/seqsata07/Mingfu/svtoolkit/lib/gatk/Queue.jar
> -configFile conf/genstrip_installtest_parameters.txt -tempDir
> /nfs/seqsata07/Mingfu/svtoolkit/tmpdir -R
> /nfs/seqsata07/Mingfu/svtoolkit/data/human_ref_36_50.fa -genomeMaskFile
> /nfs/seqsata07/Mingfu/svtoolkit/data/Homo_sapiens_assembly18.mask.101.fasta
> -genderMapFile /nfs/seqsata07/Mingfu/svtoolkit/data/sample.gender
> -runDirectory
> /nfs/seqsata07/Mingfu/svtoolkit/test1 -md
> /nfs/seqsata07/Mingfu/svtoolkit/test1/metadata -jobLogDir
> /nfs/seqsata07/Mingfu/svtoolkit/test1/logs -I
> /nfs/seqsata07/Mingfu/svtoolkit/data/sample.list -run
> INFO 23:05:18,427 HelpFormatter - Date/Time: 2011/02/15 23:05:18
> INFO 23:05:18,427 HelpFormatter -
> ---------------------------------------------------------
> INFO 23:05:18,427 HelpFormatter -
> ---------------------------------------------------------
> INFO 23:05:18,428 QCommandLine - Scripting SVPreprocess
> INFO 23:05:18,517 QCommandLine - Added 7 functions
> INFO 23:05:18,518 QGraph - Generating graph.
> INFO 23:05:18,541 QGraph - Running jobs.
> INFO 23:05:18,604 ShellJobRunner - Starting: java -Xmx4g
> -Djava.io.tmpdir=/nfs/seqsata07/Mingfu/svtoolkit/tmpdir -cp
> /nfs/seqsata07/Mingfu/svtoolkit/lib/SVToolkit.jar:/nfs/seqsata07/Mingfu/svtoolkit/lib/gatk/GenomeAnalysisTK.jar:/nfs/seqsata07/Mingfu/svtoolkit/lib/gatk/Queue.jar
> org.broadinstitute.sting.gatk.CommandLineGATK -T
> ComputeInsertSizeDistributions
> -R /nfs/seqsata07/Mingfu/svtoolkit/data/human_ref_36_50.fa -I
> /nfs/seqsata07/Mingfu/svtoolkit/data/sample.list -O
> /nfs/seqsata07/Mingfu/svtoolkit/test1/metadata/isd/sample.list.hist.bin -md
> /nfs/seqsata07/Mingfu/svtoolkit/test1/metadata -createEmpty
> INFO 23:05:18,604 ShellJobRunner - Output written to
> /nfs/seqsata07/Mingfu/svtoolkit/test1/logs/Q-23740@sva-1.out
>
> Then I checked the output file Q-23740@sva-1.out. It says
>
> INFO 23:05:22,562 HelpFormatter -
> ---------------------------------------------------------------------------
> INFO 23:05:22,564 HelpFormatter - The Genome Analysis Toolkit (GATK)
> v1.0.5039M, Compiled 2011/01/20 22:58:34
> INFO 23:05:22,564 HelpFormatter - Copyright (c) 2010 The Broad Institute
> INFO 23:05:22,565 HelpFormatter - Please view our documentation at
> http://www.broadinstitute.org/gsa/wiki
> INFO 23:05:22,565 HelpFormatter - For support, please view our
> support site at
> http://getsatisfaction.com/gsa
> INFO 23:05:22,565 HelpFormatter - Program Args: -T
> ComputeInsertSizeDistributions -R
> /nfs/seqsata07/Mingfu/svtoolkit/data/human_ref_36_50.fa -I
> /nfs/seqsata07/Mingfu/svtoolkit/data/sample.list -O
> /nfs/seqsata07/Mingfu/svtoolkit/test1/metadata/isd/sample.list.hist.bin -md
> /nfs/seqsata07/Mingfu/svtoolkit/test1/metadata
> -createEmpty
> INFO 23:05:22,565 HelpFormatter - Date/Time: 2011/02/15 23:05:22
> INFO 23:05:22,565 HelpFormatter -
> ---------------------------------------------------------------------------
> INFO 23:05:22,565 HelpFormatter -
> ---------------------------------------------------------------------------
> INFO 23:05:22,570 GenomeAnalysisEngine - Strictness is SILENT
> Error: Cannot determine library identifier for read ERR001698.1039047
> INFO 23:05:25,706 TraversalEngine - [INITIALIZATION COMPLETE; TRAVERSAL
> STARTING]
> INFO 23:05:25,707 TraversalEngine - Location processed.reads
> runtime
> per.1M.reads completed total.runtime remaining
> Error: Cannot determine library identifier for read ERR001699.2594249
> Error: Cannot determine library identifier for read ERR001699.7228308
> Error: Cannot determine library identifier for read ERR001705.1973559
> Error: Cannot determine library identifier for read ERR001705.5344781
> Error: Cannot determine library identifier for read ERR001706.368556
> Error: Cannot determine library identifier for read ERR001706.880536
> Error: Cannot determine library identifier for read ERR001706.5437519
> Error: Cannot determine library identifier for read ERR001710.3090891
> Error: Cannot determine library identifier for read ERR001710.4326898
> Error: Cannot determine library identifier for read ERR001712.4452569
>
> Is this expected? I killed a job earlier because it generated 100G of
> such logs.
> Given 20 samples with 40X coverage each, how many hours do you
> estimate it
> takes? Can it run on a cluster in some way?
>
> Thanks,
> Mingfu
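
For anyone hitting the same error, one way to confirm the diagnosis above before reheadering is to look for @RG header lines that have no LB tag. The following is only a rough sketch, not part of Genome STRiP: the function name `read_groups_missing_lb` and the sample header are made up for illustration, and it assumes you have already dumped the header to text (for example with `samtools view -H your.bam`). It does not cover the second possibility Bob mentions, reads that carry no RG tag at all.

```python
def read_groups_missing_lb(header_text):
    """Return the IDs of @RG header lines that carry no LB (library) tag."""
    missing = []
    for line in header_text.splitlines():
        if not line.startswith("@RG"):
            continue
        # After the record type, each field is TAG:VALUE, tab-separated.
        fields = {}
        for field in line.split("\t")[1:]:
            tag, _, value = field.partition(":")
            fields[tag] = value
        # Treat an absent or empty LB value as missing.
        if not fields.get("LB"):
            missing.append(fields.get("ID", "<no ID>"))
    return missing


# Made-up header for illustration only; not taken from the poster's files.
EXAMPLE_HEADER = "\n".join([
    "@HD\tVN:1.0\tSO:coordinate",
    "@RG\tID:rg1\tSM:sampleA\tLB:libA",
    "@RG\tID:rg2\tSM:sampleA",
])

if __name__ == "__main__":
    # Lists the read groups that would need an LB tag added.
    print(read_groups_missing_lb(EXAMPLE_HEADER))
```

If the returned list is non-empty, those read groups are the ones that would need an LB tag added when the bam files are reheadered.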