[svtoolkit-help] Genome Strip 2 using wrong bin size
From: Wakeling, M. <M.W...@ex...> - 2018-01-30 19:52:26
Hi.
I'm trying to run the LCNV detection pipeline from Genome Strip 2 on a set of 237 whole-genome samples. I have completed the SVPreprocess and GenerateDepthProfiles stages and am now trying to run the LCNVDiscoveryPipeline stage. When I run Queue, it reports that it has created 19993 jobs, but all of the jobs then fail. For instance, the first job is:
INFO 08:31:37,845 FunctionEdge - Starting: 'Rscript' '/gpfs/ts0/home/mw501/Research_Project-MRC147594/genome_sequencing/genome_strip2/svtoolkit/R/lcnv/lcnv_scan.R' '--profileFile' 'profiles_10000/profile_seq_1_728.dat.gz' '--targetSample' 'WG0007' '--maxDepth' '60' '--ploidyMapFile' '/gpfs/ts0/home/mw501/Research_Project-MRC147594/resources/hg19/Homo_sapiens_assembly19/Homo_sapiens_assembly19.ploidymap.txt' '--genderMapFile' 'output_metadata_directory/sample_gender.report.txt' '--outputFile' 'lcnv_output/seq_1/1_WG0007.dat'
This fails because there is no file named profiles_10000/profile_seq_1_728.dat.gz, although there is a file named profiles_10000/profile_seq_1_10000.dat.gz. I'm not sure where the 728 is coming from.
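To confirm which bin sizes the profile files on disk actually carry, the numeric suffix can be pulled out of each file name. This is only a rough sketch: it assumes the naming pattern profile_seq_<sequence>_<binSize>.dat.gz seen in the file names above, and the helper names profile_bin_size and list_bin_sizes are hypothetical, not part of Genome Strip.

```shell
#!/bin/sh
# Hypothetical helper: print the bin-size suffix of a profile file name,
# assuming the pattern profile_seq_<sequence>_<binSize>.dat.gz.
profile_bin_size() {
    name=$(basename "$1" .dat.gz)   # e.g. profile_seq_1_10000
    printf '%s\n' "${name##*_}"     # keep the text after the last underscore
}

# Hypothetical helper: list the distinct bin sizes found in a profiles directory.
list_bin_sizes() {
    for f in "$1"/profile_seq_*.dat.gz; do
        [ -e "$f" ] || continue     # skip the literal glob when nothing matches
        profile_bin_size "$f"
    done | sort -u
}
```

Running list_bin_sizes profiles_10000 prints each distinct suffix once, which shows whether any file with a 728 suffix exists alongside the 10000 ones.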
The arguments for the three stages are:
export SV_DIR=~/Research_Project-MRC147594/genome_sequencing/genome_strip2/svtoolkit
classpath="${SV_DIR}/lib/SVToolkit.jar:${SV_DIR}/lib/gatk/GenomeAnalysisTK.jar:${SV_DIR}/lib/gatk/Queue.jar"
java -Xmx4g -cp ${classpath} \
org.broadinstitute.gatk.queue.QCommandLine \
-S ${SV_DIR}/qscript/SVPreprocess.q \
-S ${SV_DIR}/qscript/SVQScript.q \
-cp ${classpath} \
-gatk ${SV_DIR}/lib/gatk/GenomeAnalysisTK.jar \
-configFile ${SV_DIR}/conf/genstrip_parameters.txt \
-R ~/Research_Project-MRC147594/resources/grch37/human_g1k_v37.fasta \
-I bams.list \
-md output_metadata_directory \
-bamFilesAreDisjoint true \
-copyNumberMaskFile ~/Research_Project-MRC147594/resources/hg19/Homo_sapiens_assembly19/Homo_sapiens_assembly19.gcmask.fasta \
-genderMaskBedFile ~/Research_Project-MRC147594/resources/hg19/Homo_sapiens_assembly19/Homo_sapiens_assembly19.gendermask.bed \
-genomeMaskFile ~/Research_Project-MRC147594/resources/hg19/Homo_sapiens_assembly19/Homo_sapiens_assembly19.svmask.fasta \
-ploidyMapFile ~/Research_Project-MRC147594/resources/hg19/Homo_sapiens_assembly19/Homo_sapiens_assembly19.ploidymap.txt \
-readDepthMaskFile ~/Research_Project-MRC147594/resources/hg19/Homo_sapiens_assembly19/Homo_sapiens_assembly19.rdmask.bed \
-jobLogDir logDir \
-jobRunner ParallelShell \
-maxConcurrentRun 16 \
-run
export SV_DIR=~/Research_Project-MRC147594/genome_sequencing/genome_strip2/svtoolkit
classpath="${SV_DIR}/lib/SVToolkit.jar:${SV_DIR}/lib/gatk/GenomeAnalysisTK.jar:${SV_DIR}/lib/gatk/Queue.jar"
java -Xmx4g -cp ${classpath} \
org.broadinstitute.gatk.queue.QCommandLine \
-S ${SV_DIR}/qscript/profiles/GenerateDepthProfiles.q \
-S ${SV_DIR}/qscript/SVQScript.q \
-cp ${classpath} \
-gatk ${SV_DIR}/lib/gatk/GenomeAnalysisTK.jar \
-R ~/Research_Project-MRC147594/resources/grch37/human_g1k_v37.fasta \
-md output_metadata_directory \
-profileBinSize 10000 \
-maximumReferenceGapLength 1000 \
-runDirectory profiles_10000 \
-genomeMaskFile ~/Research_Project-MRC147594/resources/hg19/Homo_sapiens_assembly19/Homo_sapiens_assembly19.svmask.fasta \
-ploidyMapFile ~/Research_Project-MRC147594/resources/hg19/Homo_sapiens_assembly19/Homo_sapiens_assembly19.ploidymap.txt \
-jobLogDir profiles_10000/logDir \
-jobRunner ParallelShell \
-maxConcurrentRun 16 \
-run
export SV_DIR=~/Research_Project-MRC147594/genome_sequencing/genome_strip2/svtoolkit
classpath="${SV_DIR}/lib/SVToolkit.jar:${SV_DIR}/lib/gatk/GenomeAnalysisTK.jar:${SV_DIR}/lib/gatk/Queue.jar"
java -Xmx4g -cp ${classpath} \
org.broadinstitute.gatk.queue.QCommandLine \
-S ${SV_DIR}/qscript/discovery/lcnv/LCNVDiscoveryPipeline.q \
-S ${SV_DIR}/qscript/SVQScript.q \
-cp ${classpath} \
-gatk ${SV_DIR}/lib/gatk/GenomeAnalysisTK.jar \
-R ~/Research_Project-MRC147594/resources/grch37/human_g1k_v37.fasta \
-md output_metadata_directory \
-profilesDir profiles_10000 \
-runDirectory lcnv_output \
-maxDepth 60 \
-genomeMaskFile ~/Research_Project-MRC147594/resources/hg19/Homo_sapiens_assembly19/Homo_sapiens_assembly19.svmask.fasta \
-ploidyMapFile ~/Research_Project-MRC147594/resources/hg19/Homo_sapiens_assembly19/Homo_sapiens_assembly19.ploidymap.txt \
-genderMapFile output_metadata_directory/sample_gender.report.txt \
-jobLogDir lcnv_output/logDir \
-jobRunner ParallelShell \
-maxConcurrentRun 16 \
-run
Any assistance would be greatly appreciated.
Matthew