Hello wgs-assembler team,
I am working with PBcR from wgs version 8.3.
My data set is 160Gb, ~55x coverage of a ~1.5Gb plant genome with moderate to high repeat content. It is uncorrected PacBio data.
I am trying to use the grid engine method to assemble this genome, since running it on a single machine is too slow. I am using MHAP for the correction step.
I have a 512Gb memory machine with 24 CPUs as my main node. The other nodes are comparable.
The cluster runs Sun Grid Engine (SGE) with MPI.
Attached is the spec file I am using - a slightly modified version of the spec file on the PBcR page for grid engine. (wgs-assembler.sourceforge.net)
Using this spec file and the following command (submitted with qsub from a .sh file):
PBcR -l Genome -s pacbio.spec -fastq Gemome.fastq genomeSize=1500000000
results in this ending of the main log file, with no error and no result:
/home/myname/wgs-8.3rc2/Linux-amd64/bin/gatekeeper -dumpfragments -tabular asm.gkpStore |awk '{print $1"\t"$2}' > asm.eidToIID
----------------------------------------END Wed Jan 27 21:23:58 2016 (11 seconds)
----------------------------------------START Wed Jan 27 21:23:58 2016
/home/myname/wgs-8.3rc2/Linux-amd64/bin/gatekeeper -dumpfragments -tabular asm.gkpStore |awk '{print $2"\t"$10}' > asm.iidToLen
----------------------------------------END Wed Jan 27 21:24:08 2016 (10 seconds)
----------------------------------------START Wed Jan 27 21:24:08 2016
qsub -pe smp 15 -l h_vmem=2G -cwd -N "pBcR_ovlprep_asm" -t 1-143 -j y -o /dev/null /opt/gc/old/projects/myname/Genome/55x_corrected_pacbio/tempGenome/1-overlapper/ovlprep.sh
Your job-array 201708.1-143:1 ("pBcR_ovlprep_asm") has been submitted
----------------------------------------END Wed Jan 27 21:24:08 2016 (0 seconds)
----------------------------------------START Wed Jan 27 21:24:08 2016
qsub -pe smp 1 -cwd -N "pBcR_asm" -hold_jid "pBcR_ovlprep_asm" -j y -o opt/gc/old/projects/myname/Genome/55x_corrected_pacbio/tempGenome/runPBcR.sge.out.00 /opt/gc/old/projects/myname/Genome/55x_corrected_pacbio/tempGenome/runPBcR.sge.out.00.sh
Your job 201709 ("pBcR_asm") has been submitted
----------------------------------------END Wed Jan 27 21:24:08 2016 (0 seconds)
Using a similar approach for the E. coli K12 sample data downloaded from the PBcR page gives the same log file ending and no assembly (no 9-terminator directory). The conclusion I draw is that it is not a data problem, as K12 assembles perfectly with its normal non-grid spec file.
Is this a specification file problem?
A grid engine setting problem?
Could you point me to a sample data set with a grid engine setup that is known to work on SGE?
Thanks in advance,
Alex B
This behavior is correct when running on the grid. Once the job is submitted to the grid, the process on the head node exits to avoid blocking the user's terminal, and the rest of the pipeline continues by submitting jobs until it completes. You can view running jobs with qstat.
I'll also suggest using Canu (https://github.com/marbl/canu) instead of PBcR
Hello Sergey,
Thanks for your reply.
However, when I try to view the running jobs using qstat, no jobs are displayed after the pipeline reaches the step where it exits on the main node.
I will give Canu a try. Can I ask why you would suggest using it rather than PBcR?
-Alex B
Last edit: Alex Bod 2016-02-02
If no jobs are running when you check qstat, the pipeline may have finished with an error. Check the output in tempGenome/runPBcR.sge.out.0*
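As a rough sketch of the check described above, the two helpers below list your grid jobs and print the tail of each pipeline log. The function names `list_my_jobs` and `show_pbcr_logs`, the default directory `tempGenome`, and the 20-line default are illustrative assumptions and not part of wgs-assembler; `qstat -u` is the standard SGE way to filter by user.

```shell
#!/bin/sh
# Hypothetical helpers for illustration only; not shipped with wgs-assembler.

# List the current user's pending and running grid jobs, if qstat is available.
list_my_jobs() {
    if command -v qstat >/dev/null 2>&1; then
        qstat -u "${USER:-$(id -un)}"
    else
        echo "qstat not found on this host" >&2
        return 1
    fi
}

# Print the last lines of every runPBcR.sge.out.0* log in a directory.
# $1: directory holding the logs (assumed default: tempGenome)
# $2: number of trailing lines to show (assumed default: 20)
show_pbcr_logs() {
    dir="${1:-tempGenome}"
    lines="${2:-20}"
    found=0
    for f in "$dir"/runPBcR.sge.out.0*; do
        # Skip the submitted .sh scripts, which match the same glob.
        case "$f" in *.sh) continue ;; esac
        [ -f "$f" ] || continue
        found=1
        echo "== $f"
        tail -n "$lines" "$f"
    done
    if [ "$found" -eq 0 ]; then
        echo "no runPBcR.sge.out.0* logs in $dir" >&2
        return 1
    fi
}
```

From the directory where PBcR was launched, `list_my_jobs` followed by `show_pbcr_logs tempGenome` would show whether anything is still queued and how each stage ended. If accounting is enabled on the cluster, `qacct -j 201709` (the job ID from the log above) should also report the exit status of a job that has already left the queue.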
Canu is the replacement for PBcR with a streamlined workflow, support for more grid engines, and auto-detection of grid resources. Going forward, we will not be making bug fixes to PBcR, only Canu.
Thanks, I have switched to Canu and obtained assemblies on both the SLURM and SGE grid engines.
Do you plan to stop supporting only PBcR, or the whole wgs package?
We plan to stop supporting the entire WGS package. The relevant parts have been integrated into Canu, but algorithm improvements and bug fixes will go only into Canu, not WGS.
Related
Bugs: #338