

#256 Scaffolder segmentation fault

scaffolder
Status: open
Owner: Brian Walenz
Priority: 5
Updated: 2013-10-29
Created: 2013-09-30
Creator: JeroenF@lumc
Private: No

Hi,

I've run into a problem again. Before getting into it, I want to thank you for your help over the last couple of weeks. I am now using Celera Assembler version CVS TIP 2013-08-01 successfully: I have finished a pacBioToCA job and nearly finished an assembly.

Here's the issue: Celera Assembler crashes during the scaffolding stage and dumps core.

The cgw.out file contains the following error message: "Failed with 'Segmentation fault'".

runCA.sge.out contains the following error: "ERROR: Failed with signal ABRT (6)".

stdout reports: "ERROR: Failed with signal SEGV (11)".

Please refer to the attachments for more extensive error messages.

I've restarted the pipeline several times, but the error persists. Is there any advice you can give me to find out what is going wrong? Thank you once again.

Best,

Jeroen

1 Attachments

Discussion

  • Brian Walenz
    2013-09-30

    I'm going to guess that setting 'missing mate' back to the default of 0 will resolve the crash. I've seen one other crash with a similar signature that was fixed by doing this.

     
  • Brian Walenz
    2013-09-30

    • assigned_to: Brian Walenz
     
  • JeroenF@lumc
    2013-10-01

    Hi Brian,

    Thanks for your fast response.
    I've changed my spec file so that cgwMergeMissingThreshold = 0 and then restarted the pipeline. Unfortunately, this did not resolve the error. The scaffolder still fails at exactly the same point (strange?), generating identical error messages. Do you have any other ideas about what may cause this error?

    On a side note:
    Earlier in the run, I noticed that a few consensus jobs did not finish successfully:


    MultiAlignStore::flushDisk()-- flushed 0 unitigs and 0 contigs.
    MultiAlignStore::flushCache()-- flushed 1 unitigs and 0 contigs.

    NumColumnsInUnitigs = 11027452
    NumGapsInUnitigs = 16302
    NumRunsOfGapsInUnitigReads = 1293381
    NumColumnsInContigs = 0
    NumGapsInContigs = 0
    NumRunsOfGapsInContigReads = 0
    NumAAMismatches = 0
    NumVARRecords = 0
    NumVARStringsWithFlankingGaps = 0
    NumUnitigRetrySuccess = 0

    WARNING: Total number of unitig failures = 1

    Consensus did NOT finish successfully.


    Removing these files and restarting the pipeline did not change anything; nonetheless, the pipeline progressed without further errors, so I assumed this was fine ("normal behavior"). Here's my "noob" question, just to be sure: could this contribute to the scaffolding error I'm experiencing now?

    Thanks!
    Jeroen

     
    Last edit: JeroenF@lumc 2013-10-01
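The failed consensus jobs described above can be located by searching the consensus logs for the two summary lines in the excerpt. A minimal Python sketch, assuming the logs are plain-text files in the 5-consensus directory matching `*.out` (the directory name comes from this thread; the file pattern is a guess, not a documented Celera Assembler layout):

```python
import re
from pathlib import Path

# Summary lines copied from the consensus log excerpt quoted above.
NOT_FINISHED = "Consensus did NOT finish successfully."
FAILURES_RE = re.compile(r"WARNING: Total number of unitig failures = (\d+)")


def unitig_failures(log_text):
    """Return (finished_ok, failure_count) for the text of one consensus log."""
    match = FAILURES_RE.search(log_text)
    failures = int(match.group(1)) if match else 0
    return NOT_FINISHED not in log_text, failures


def scan_consensus_dir(directory, pattern="*.out"):
    """List (file name, finished_ok, failures) for logs that report a problem.

    The default glob pattern is an assumption about how the logs are named.
    """
    problems = []
    for path in sorted(Path(directory).glob(pattern)):
        ok, failures = unitig_failures(path.read_text(errors="replace"))
        if not ok or failures:
            problems.append((path.name, ok, failures))
    return problems
```

Calling `scan_consensus_dir("5-consensus")` from the assembly directory would list only the jobs worth inspecting, instead of reading every log by hand.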
  • Brian Walenz
    2013-10-01

    CGW is extremely careful about checking input unitigs, so yours are OK. In 5-consensus there should be a consensus-fix.out (depending on how recent your code is, this might be split into many files). That step catches the failures and fixes them.

    I won't be able to study the code for a bit. In the meantime, can you:

    1) attach much more of the cgw.out log? The last couple MB at least.
    2) recompile for debugging ('rm -rf Linux-amd64', then 'gmake BUILDDEBUG=1') and rerun. If you know how, run it in the debugger to get a stack trace and maybe some variable dumps. If not, the reported backtraces might be enough.
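Both requests above can be scripted. A minimal Python sketch under stated assumptions: "the last couple MB" is taken as 2 MiB, gdb is installed, and the example paths (cgw.out, Linux-amd64/bin/cgw, the core file name) are hypothetical. It reads only the end of the log rather than the whole file, and runs gdb non-interactively with `-batch` and `-ex` instead of opening an interactive session:

```python
import os
import subprocess


def tail_bytes(path, max_bytes=2 * 1024 * 1024):
    """Return the last max_bytes of a file (the whole file if it is smaller)."""
    with open(path, "rb") as handle:
        handle.seek(0, os.SEEK_END)
        size = handle.tell()
        handle.seek(max(0, size - max_bytes))
        return handle.read()


def write_tail(src, dest, max_bytes=2 * 1024 * 1024):
    """Item 1: copy the end of a large log (e.g. cgw.out) to a smaller file to attach."""
    data = tail_bytes(src, max_bytes)
    with open(dest, "wb") as out:
        out.write(data)
    return len(data)


def gdb_backtrace_command(binary, core):
    """Item 2: build a non-interactive gdb command that prints a backtrace from a core file."""
    return [
        "gdb", "-batch",
        "-ex", "bt",       # stack trace of the crashing thread
        "-ex", "bt full",  # the same trace with each frame's local variables
        binary, core,
    ]


def save_backtrace(binary, core, log_path):
    """Run gdb (must be installed) and save its combined output; returns gdb's exit code."""
    cmd = gdb_backtrace_command(binary, core)
    result = subprocess.run(cmd, capture_output=True, text=True, check=False)
    with open(log_path, "w") as out:
        out.write(result.stdout + result.stderr)
    return result.returncode
```

For example, `write_tail("cgw.out", "cgw.out.tail")` and `save_backtrace("Linux-amd64/bin/cgw", "core", "cgw-backtrace.txt")`; a text backtrace is usually far easier to share than a full core dump.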

     
  • JeroenF@lumc
    2013-10-02

    I've attached the complete cgw.out file.
    I will try to post the stack trace, variable dumps, and backtraces soon.
    As always, thanks for your help.

     
    Attachments
  • JeroenF@lumc
    2013-10-16

    An update:

    I compiled Celera Assembler CVS TIP 2013-08-01 with debugging enabled.
    The scaffolder started working and crashed again at the same location.
    The core was dumped; you can download it here:

    https://barmsijs.lumc.nl/Celera_DEBUG_core_dump
    CAUTION: file size 125G

    I've also attached the cgw.out file again.
    I hope this will help trace the problem.

     
    Attachments
  • Jason Miller
    2013-10-21

    Your cgw.out file ends like this:

    CreateAContigInScaffold()-- new contig 29007413 in scaffold 10099
    KickOutNonOverlappingContig: Removing contig 28934106 to scaffold 66486 because we found no overlaps to it
    RecomputeOffsetsInScaffold() returned RECOMPUTE_DELETE_CONTIG on scaffold 10101; scaffold modified, keep trying
    Failed with 'Segmentation fault'

    The fact that "KickOutNonOverlappingContig" ran indicates you probably had this in your spec file:

    kickOutNonOvlContigs 0

    That runCA option turns on this cgw (scaffold) option:

    -K Allow kicking out a contig placed in a scaffold by mate pairs that has no overlaps to both its left and right neighbor contigs.

    While that option seems attractive, and has been recommended by some CA developers, it may not have been tested extensively. Note that it is documented as an expert-only option:

    Allow kicking out a contig placed in a scaffold by mate pairs that has no overlaps to both its left and right neighbor contigs. EXPERT!

    Could you try running without that option? We are anxious to hear the result. Thanks.

     
  • JeroenF@lumc
    2013-10-29

    Thanks Brian, Sergey and Jason for your help over the last couple of weeks.
    I tried setting cgwMergeMissingThreshold to 0, which did not solve the error.
    In addition, I disabled the kickOutNonOvlContigs option, as Jason suggested. This did the trick! The scaffolder finished without errors, and the assembly is now complete.

    For the record: I used corrected PacBio reads (used pacBioToCA for correction) and Illumina HiSeq paired and unpaired datasets for assembly,