#302 PBcR self-corrected assembly

Group: scaffolder
Status: open
Owner: nobody
Priority: 5
Updated: 2015-04-16
Created: 2015-04-12
Creator: Mark
Private: No

Hi all,

I have just used PBcR in wgs-8.3 to self-correct and assemble some PacBio reads.
The reads I assembled give 36x coverage. The genome I am assembling is haploid, is known to be approx 39 Mb, and has around 7% repetitive DNA.

The .spec file I used contained the following:

ovlHashBits = 25
ovlHashBlockLength = 180000000
ovlMemory = 20

The program ran successfully without producing any errors. The problem is that the assembly is approximately 9 Mb long and made up of approx 350 scaffolds - much smaller than the expected 39 Mb. I was just wondering if there are any parameters that can be altered to improve the assembly length?

Kind regards,
Mark Derbyshire

Discussion

  • Sergey Koren

    Sergey Koren - 2015-04-13

    The most likely reason is that you ended up with too little coverage in the corrected sequences to get a good assembly. Have you checked how much coverage the corrected sequences give, and how their average length compares to that of your input sequences?
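The coverage and read-length check described above can be sketched in a few lines of Python. This is a minimal sketch, not part of PBcR: the function names and the file names in the usage note are hypothetical, and it assumes the reads are plain (uncompressed) FASTA or four-line FASTQ.

```python
from statistics import mean


def read_lengths(path):
    """Yield the length of each sequence in a FASTA or FASTQ file."""
    with open(path) as handle:
        first = handle.read(1)
        handle.seek(0)
        if first == "@":
            # FASTQ: assumes the common four-line record layout,
            # where the sequence is the second line of each record.
            for i, line in enumerate(handle):
                if i % 4 == 1:
                    yield len(line.strip())
        else:
            # FASTA: a sequence may wrap over several lines.
            length = None
            for line in handle:
                if line.startswith(">"):
                    if length is not None:
                        yield length
                    length = 0
                elif length is not None:
                    length += len(line.strip())
            if length is not None:
                yield length


def summarize(path, genome_size):
    """Return read count, total bases, mean length, and fold coverage."""
    lengths = list(read_lengths(path))
    total = sum(lengths)
    return {
        "reads": len(lengths),
        "total_bases": total,
        "mean_length": mean(lengths) if lengths else 0.0,
        "coverage": total / genome_size,
    }
```

Calling `summarize("raw_reads.fastq", 39_000_000)` and `summarize("corrected_reads.fasta", 39_000_000)` (both file names are placeholders) gives the two sets of numbers to compare; a large drop in coverage or mean length after correction would point to the problem described above.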

    There are a couple of things to check. First, look in runPartition.sh in your temporary folder for text that says either falcon_sense or pbdagcon. For the lower coverage you have, you want to use pbdagcon. If runPartition.sh doesn’t say pbdagcon, make sure you have SMRTportal installed and in your path, then re-run the pipeline specifying -sensitive on the command line for PBcR (see http://wgs-assembler.sourceforge.net/wiki/index.php/PBcR#Low_Coverage_Assembly). Make sure you also specify the genome size to PBcR on the command line. If that doesn’t improve your assembly, or it was already using pbdagcon, the linked page has some guidance on adjusting parameters for low-coverage datasets that you can try.
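One way to check which consensus tool a run used is to search the script text for the two names. The sketch below is only an illustration: the function name is hypothetical, and the path in the usage note is a placeholder for wherever runPartition.sh sits in your temporary folder.

```python
def consensus_tools(script_path):
    """Return the consensus tool names mentioned in the given script."""
    with open(script_path) as handle:
        # Compare in lower case so "PBDAGCON" and "pbdagcon" both match.
        text = handle.read().lower()
    return [name for name in ("falcon_sense", "pbdagcon") if name in text]
```

For example, `consensus_tools("tempDir/runPartition.sh")` returns a list containing whichever of the two names appear in the file. If pbdagcon is not in the list, the run did not use it and the -sensitive re-run suggested above applies.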

    On Apr 12, 2015, at 6:08 AM, Mark markcharder@users.sf.net wrote:

    [bugs:#302] http://sourceforge.net/p/wgs-assembler/bugs/302 PBcR self-corrected assembly

    Status: open
    Group: scaffolder
    Labels: PBcR MHAP self-correct PacBio
    Created: Sun Apr 12, 2015 10:08 AM UTC by Mark
    Last Updated: Sun Apr 12, 2015 10:08 AM UTC
    Owner: nobody


  • Mark

    Mark - 2015-04-16

    Thank you very much for the advice. I am now running with altered parameters, i.e. specifying -sensitive and providing the genome size.
    However, this time the process has been running for around 60 hours, as compared to the 20 hours it took before.
    wgs has created a lot more files in the temporary folder, probably close to 200.

    Some examples include:
    1.err
    1.lay.err
    asm.100.log

    Is this correct? I am running on an 8 core node with 20 GB available RAM. Is this enough?
    Is it possible to run this without SMRT portal in my environment? It is not installed on the server I am running on. I have a lot of SMRT modules and have been able to locally install individual modules that are necessary for other things.

    Sorry for all the questions, I don't have a lot of expertise myself and am not in contact with anyone who uses this software extensively.

    Regards,
    Mark

     
    • Sergey Koren

      Sergey Koren - 2015-04-16

      It should create those files either with or without -sensitive (the number is controlled by the partitions value on your command line). However, with -sensitive it uses PBDAGCON instead of falcon_sense. PBDAGCON is significantly slower, so it’s likely that falcon_sense ran so fast you didn’t notice the temporary files being created and removed. That would also account for the longer runtime compared to your previous run.

      Serge


