
#338 PBcR pipeline stops on grid engine without errors

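Group: overlapper
Status: open
Owner: nobody
Labels: grid engine large genome
Priority: 5
Updated: 2016-02-08
Created: 2016-02-01
Creator: Alex Bod
Private: No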

Hello wgs-assembler team,

I am working with PBcR from wgs-assembler version 8.3.

My data set is 160Gb of uncorrected PacBio data, ~55x coverage of a ~1.5Gb plant genome with moderate to high repeat content.

I am trying to assemble this genome with the grid engine method, since running it on a single machine is too slow. I am using MHAP for the correction step.

My main node is a machine with 512Gb of memory and 24 CPUs. The other nodes are comparable.

The cluster runs Sun Grid Engine with MPI.

Attached is the spec file I am using, a slightly modified version of the grid engine spec file from the PBcR page (http://wgs-assembler.sourceforge.net/wiki/index.php/PBcR#Running_on_a_grid_system).

Running the following command with this spec file (submitted with qsub from a .sh file):
PBcR -l Genome -s pacbio.spec -fastq Gemome.fastq genomeSize=1500000000
produces a main log file that ends as follows, with no errors and no results:

/home/myname/wgs-8.3rc2/Linux-amd64/bin/gatekeeper -dumpfragments -tabular asm.gkpStore |awk '{print $1"\t"$2}' > asm.eidToIID
----------------------------------------END Wed Jan 27 21:23:58 2016 (11 seconds)
----------------------------------------START Wed Jan 27 21:23:58 2016
/home/myname/wgs-8.3rc2/Linux-amd64/bin/gatekeeper -dumpfragments -tabular asm.gkpStore |awk '{print $2"\t"$10}' > asm.iidToLen
----------------------------------------END Wed Jan 27 21:24:08 2016 (10 seconds)
----------------------------------------START Wed Jan 27 21:24:08 2016
qsub  -pe smp 15 -l h_vmem=2G -cwd -N "pBcR_ovlprep_asm" -t 1-143 -j y -o /dev/null /opt/gc/old/projects/myname/Genome/55x_corrected_pacbio/tempGenome/1-overlapper/ovlprep.sh
Your job-array 201708.1-143:1 ("pBcR_ovlprep_asm") has been submitted
----------------------------------------END Wed Jan 27 21:24:08 2016 (0 seconds)
----------------------------------------START Wed Jan 27 21:24:08 2016
qsub  -pe smp 1 -cwd -N "pBcR_asm" -hold_jid "pBcR_ovlprep_asm" -j y -o opt/gc/old/projects/myname/Genome/55x_corrected_pacbio/tempGenome/runPBcR.sge.out.00 /opt/gc/old/projects/myname/Genome/55x_corrected_pacbio/tempGenome/runPBcR.sge.out.00.sh
Your job 201709 ("pBcR_asm") has been submitted
----------------------------------------END Wed Jan 27 21:24:08 2016 (0 seconds)

Using a similar approach with the E. coli K12 sample data downloaded from the PBcR page (http://wgs-assembler.sourceforge.net/wiki/index.php/PBcR#Assembling_an_E._coli) gives the same log file ending and no assembly (no 9-terminator directory). I conclude that this is not a data problem, since K12 assembles without problems using its normal, non-grid spec file.

Is this a specification file problem?
A grid engine setting problem?
Could you point me to a sample data set and spec file that are known to work on SGE?

Thanks in advance,
Alex B

1 Attachment: pacbio.spec (821 Bytes; text/x-rpm-spec)


Discussion

  • Sergey Koren

    Sergey Koren - 2016-02-01

    This behavior is correct when running on the grid. Once the job is submitted to the grid, the process on the head node exits to avoid blocking the user's terminal, and the rest of the pipeline keeps submitting jobs until it completes. You can view the running jobs with qstat.
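To see which jobs the pipeline handed to the grid, the job IDs can be pulled out of the main log and then looked up in the scheduler. The following is a minimal sketch, not part of PBcR: the function name `submitted_jobs` and the log file name `pbcr_main.log` are hypothetical, and it assumes the log contains "Your job ... has been submitted" lines in the form shown in the original post.

```shell
#!/bin/sh
# Hypothetical helper (not part of PBcR): print "<job id> <job name>" for every
# "Your job ... has been submitted" line found in the given log file(s).
# Array jobs such as "201708.1-143:1" are reduced to their base ID (201708).
submitted_jobs() {
  sed -n -E 's/^[[:space:]]*Your job(-array)? ([0-9]+)[^ ]* \("([^"]+)"\) has been submitted.*$/\2 \3/p' "$@"
}

# Example call (the log file name is an assumption):
#   submitted_jobs pbcr_main.log
```

Each ID can then be passed to `qstat -j <id>` while the job is queued or running, or to `qacct -j <id>` after it has finished, provided accounting is enabled on the cluster.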

    I'll also suggest using Canu (https://github.com/marbl/canu) instead of PBcR

     
    • Alex Bod

      Alex Bod - 2016-02-02

      Hello Sergey,
      Thanks for your reply.
      However, when I check with qstat, no jobs are displayed after the pipeline reaches the step where it quits on the main node.

      I will give Canu a try. Can I ask why you would suggest using it rather than PBcR?

      -Alex B

       
  • Sergey Koren

    Sergey Koren - 2016-02-08

    If no jobs are running when you check qstat, then the pipeline may have completed with an error. Check the output in tempGenome/runPBcR.sge.out.0*
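A quick way to follow this advice is to scan those output files for lines that look like failures. This is a rough sketch only: the function name `scan_pbcr_grid_logs` is hypothetical, and the search pattern is a guess at common failure wording rather than a list of messages PBcR is known to print. It skips the generated `runPBcR.sge.out.*.sh` job scripts, which share the same file name prefix in the log above.

```shell
#!/bin/sh
# Hypothetical helper (not part of PBcR): print "file:line:text" for lines that
# look like failures in the runPBcR.sge.out.* files of a PBcR temp directory.
# The pattern below is an assumption, not an official list of PBcR messages.
scan_pbcr_grid_logs() {
  dir="${1:-tempGenome}"
  found=0
  for f in "$dir"/runPBcR.sge.out.*; do
    [ -f "$f" ] || continue              # glob matched nothing, or not a file
    case "$f" in *.sh) continue ;; esac  # skip the generated job scripts
    found=1
    # /dev/null forces grep to prefix matches with the file name;
    # "|| true" because finding no match is not an error here.
    grep -n -i -E 'error|fail|killed|not found' "$f" /dev/null || true
  done
  [ "$found" -eq 1 ] || echo "no runPBcR.sge.out.* logs found in $dir" >&2
}

# Example call (directory name taken from the thread):
#   scan_pbcr_grid_logs tempGenome
```

If nothing is printed, the files are worth reading in full, since a job can also stop without writing any of these words.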

    Canu is the replacement for PBcR with a streamlined workflow, support for more grid engines, and auto-detection of grid resources. Going forward, we will not be making bug fixes to PBcR, only Canu.

     
    • Alex Bod

      Alex Bod - 2016-02-24

      Thanks, I have now switched to Canu and obtained assemblies on both the Slurm and SGE grid engines.
      Do you plan to drop support for PBcR only, or for the whole wgs package?

       
      • Sergey Koren

        Sergey Koren - 2016-02-24

        We plan on not supporting the entire WGS package. The relevant parts have been integrated into Canu but algorithm improvements and bug fixes will only go into Canu, not WGS.


