Re: [svtoolkit-help] SVtoolkit question (ComputeGenomeMask)
Status: Beta
Brought to you by:
bhandsaker
From: Bob H. <han...@br...> - 2011-03-03 21:09:15
|
> *[svtoolkit-help] SVtoolkit question > <http://sourceforge.net/mailarchive/message.php?msg_id=27148621>* > From: Michael Parfenov <parfenov@ge...> - 2011-03-03 20:07 > Hi, > > Thanks for the SV toolkit! > Could you tell me, please, how to run the ComputeGenomeMask utility? I see > required arguments in the wiki page, but how to invoke the utility? > > best regards, > Michael. > Hi, Michael, ComputeGenomeMask is a regular java command line program and is in SVToolkit.jar, so basically you just need to do "java -cp SVToolkit.jar:GenomeAnalysisTK.jar org.broadinstitute.sv.apps.ComputeGenomeMask". But there are a couple of wrinkles with ComputeGenomeMask in particular. 1. You need to index the reference for bwa alignment. 2. ComputeGenomeMask uses libbwa.so, so you need that on your library path. 3. To do any reasonably large genome, you need to run it in parallel and then merge. Here is a shell script I used to do a 75bp mask for hg19 that you can use as an example. It's designed to use the local lsf environment at the Broad and knows where to find the reference fasta locally, etc. It also assumes the reference has already been indexed with "samtools faidx". The script just runs the parallel jobs, it doesn't merge the results back together (you will need to cat together the resulting fasta files in the same order as the reference). At some point I will likely write a queue script to automate this, but it's not a high priority right now because I'm also working on some changes to the mask file format. Hope this helps, -Bob #!/bin/bash outdir=Homo_sapiens_assembly19/75 readLength=75 reference=/seq/references/Homo_sapiens_assembly19/v1/Homo_sapiens_assembly19.fasta export SV_DIR=/humgen/cnp04/bobh/sv/stable # These executables must be on your path. which java > /dev/null || exit 1 which bwa > /dev/null || exit 1 export LD_LIBRARY_PATH=${SV_DIR}/bwa/lib:${LD_LIBRARY_PATH} classpath="${SV_DIR}/lib/SVToolkit.jar:${SV_DIR}/lib/gatk/GenomeAnalysisTK.jar" mkdir -p ${outdir}/work localReference=${outdir}/work/`echo ${reference} | awk -F / '{ print $NF }'` if [ ! -e ${localReference} ]; then (cd ${outdir}/work && ln -s ${reference} .) || exit 1 (cd ${outdir}/work && ln -s ${reference}.fai .) || exit 1 fi bwa index -a bwtsw ${localReference} || exit 1 chroms=`cat ${localReference}.fai | cut -f 1` for chr in ${chroms}; do bsub -o ${outdir}/work/svmask_${chr}.log \ -R "rusage[mem=5000]" \ java -cp ${classpath} -Xmx4g \ org.broadinstitute.sv.apps.ComputeGenomeMask \ -R ${localReference} \ -O ${outdir}/work/svmask_${chr}.fasta \ -readLength ${readLength} \ -sequence ${chr} \ || exit 1 done |