#294 Error: meryl failed, too many files

consensus
closed-out-of-date
None
5
2015-05-18
2015-02-03
astacus
No

Hello,

I am trying to correct PacBio reads with Illumina reads using
the command:

PBcR -length 500 -partitions 200 -l onc -s pacbio.SGE.spec -fastq \
    filtered_subreads.fastq genomeSize=1200000000 illumina180.frg illumina500.frg \
    illumina3000.frg illumina10000.frg

but I get a "meryl failed" message:


...
Segment 5132 finished.
Thread exits.
Threads all done, cleaning up.
Merge results.
bitPackedFile::bitPackedFile()-- failed to open '/opt/gc/project_data/nonLIMS_projects/BioInfo/hybrid_assembly_test/onc_test/temponc/0-mercounts/asm-C-ms14-cm0.batch510.mcdat': Too many open files
Can fit 22937600 mers into table with prefix of 20 bits, using 25.000MB ( 0.000MB for positions)

Failure message:

meryl failed


The pacbio.SGE.spec looks like this:


merSize=14

useGrid = 1
scriptOnGrid = 1
frgCorrOnGrid = 1
ovlCorrOnGrid = 1

sge = -A assembly
sgeScript = -pe smp 16
sgeConsensus = -pe smp 1
sgeOverlap = -pe smp 16
sgeFragmentCorrection = -pe smp 2
sgeOverlapCorrection = -pe smp 1

ovlThreads = 16


Any ideas what I could change?

Astacus

Discussion

  • Brian Walenz

    Brian Walenz - 2015-02-06

    Set merylMemory (in MB) to something much bigger. It looks like meryl auto-detected 32 cores and split the default 800 MB across them, so it ran in chunks of 800/32 = 25 MB.

    Based on the report of 5132 segments, and the failure at segment 510, the smallest value that will work is slightly more than 800*10 = 8000 MB (about 8 GB). I suggest at least 12000.
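
The arithmetic behind that estimate can be sketched in a few lines of shell. The numbers come from the log and the reply above. The assumption that the segment count shrinks in proportion to the memory increase is a simplification, and the variable names are illustrative only.

```shell
# Numbers taken from the thread; variable names are illustrative.
default_mb=800      # default merylMemory, in MB
cores=32            # cores meryl appears to have auto-detected
segments=5132       # segments reported in the log
failed_at=510       # batch at which 'Too many open files' occurred

# Memory available to each chunk with the defaults.
chunk_mb=$((default_mb / cores))

# Assumption: the segment count scales inversely with memory, so memory
# must grow by at least ceil(segments / failed_at) to stay under the limit.
factor=$(( (segments + failed_at - 1) / failed_at ))
min_mb=$((default_mb * factor))

echo "chunk size:    ${chunk_mb} MB"
echo "scale factor:  ${factor}x"
echo "rough minimum: ${min_mb} MB"
```

Rounding up gives a factor of 11 and a rough minimum of 8800 MB, a little above the 800*10 figure, which is consistent with leaving headroom and using 12000.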

    Since you're asking for '-pe smp 16', you probably also want to set merylThreads=16.

    http://wgs-assembler.sourceforge.net/wiki/index.php/RunCA#Meryl
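
Applied to the spec file from the original post, the suggestion might look like the fragment below. The value 12000 and the thread count of 16 are the ones proposed in the reply. The comment lines are annotations added here, on the assumption that '#' comment lines are accepted in the spec file, and whether 12000 MB is actually available on the grid nodes needs to be checked locally.

```
# additions to pacbio.SGE.spec (sketch based on the reply above)

# total memory for meryl, in MB
merylMemory = 12000

# match the 16 slots requested with '-pe smp 16'
merylThreads = 16
```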

     
  • Brian Walenz

    Brian Walenz - 2015-02-06
    • status: open --> pending
    • assigned_to: Brian Walenz
     
  • astacus

    astacus - 2015-02-10

    Thanks for the help. Now the job seems to start but stops at the
    next step:

    qsub -A assembly -pe smp 16 -cwd -N "pBcR_asm" -hold_jid "pBcR_ovlprep_asm" -j y -o /opt/gc/project_data/nonLIMS_projects/BioInfo/hybrid_assembly_test/onc_test//temponc2/runPBcR.sge.out.00 /opt/gc/project_data/nonLIMS_projects/BioInfo/hybrid_assembly_test/onc_test//temponc2/runPBcR.sge.out.00.sh
    Your job 104714 ("pBcR_asm") has been submitted

    In the output file "runPBcR.sge.out.00" I see:

    Warning: no access to tty (Bad file descriptor).
    Thus no job control in this shell.
    if: Expression Syntax.
    then: Command not found.

    sigh
    astacus

     
    • Sergey Koren

      Sergey Koren - 2015-02-10

      This looks like the job is not running under the bash shell and so the script is not being interpreted properly. Your runPBcR.sge.out.00.sh should start with

      #!/bin/bash

      Does it? Does /bin/bash exist on your system? Can you run the script by hand without error (i.e. if you just run sh runPBcR.sge.out.00.sh)?
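
The first two of those checks can be scripted. The sketch below defines a small helper, has_bash_shebang (a name invented here, not part of PBcR or SGE), that tests whether a script's first line is exactly #!/bin/bash, and then checks that /bin/bash is executable.

```shell
# has_bash_shebang FILE
# Hypothetical helper: succeeds only if FILE's first line is '#!/bin/bash'.
has_bash_shebang() {
    first_line=$(head -n 1 "$1")
    [ "$first_line" = '#!/bin/bash' ]
}

# Example use against the generated script, if it exists in this directory.
script=runPBcR.sge.out.00.sh
if [ -f "$script" ]; then
    if has_bash_shebang "$script"; then
        echo "$script starts with #!/bin/bash"
    else
        echo "$script does not start with #!/bin/bash"
    fi
fi

# Check that the interpreter itself is present and executable.
if [ -x /bin/bash ]; then
    echo "/bin/bash is available"
fi
```

Running the script by hand remains a separate step, since a correct first line does not guarantee that the rest of the script succeeds.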

       

      Last edit: Brian Walenz 2015-02-11
  • Brian Walenz

    Brian Walenz - 2015-02-11

    Yup, exactly that. SGE is, by default, using the csh shell to run scripts.

    Add the qsub option "-b y" to the runCA 'sge' line. This tells SGE to treat the command as a binary instead of a script, so it will then respect the '#!' interpreter line.
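
With the spec file from the original post, that change might look like the fragment below. Only the 'sge' line differs from the posted file. The '-S /bin/bash' alternative from the same reply is shown commented out, since it is described there as untried. The comment lines are annotations added here, assuming '#' comment lines are accepted in the spec file.

```
# pacbio.SGE.spec (sketch; only the 'sge' line is changed)
sge = -A assembly -b y

# untried alternative from the same reply:
# sge = -A assembly -S /bin/bash
```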

    "-S /bin/bash" should also work, but I haven't tried it.

     
  • Sergey Koren

    Sergey Koren - 2015-05-18
    • status: pending --> closed-out-of-date
     
