
Extremely long runtime? Configuration wrong?

Astrid
2017-05-17
  • Astrid

    Astrid - 2017-05-17

    Hi Michael,
    I have a metassembly run that has already been going for 4 weeks, merging a SPAdes and a Celera fish genome assembly (genome size at most 1 Gb). Judging from your paper, this should have finished a long time ago.
    Here is a copy of my spec file:
    [global]

    Mate-pair mapping parameters:

    bowtie2_threads=8
    bowtie2_read1=all_1P.fastq
    bowtie2_read2=all_2P.fastq
    bowtie2_maxins=1000
    bowtie2_minins=10
    genomeLength=950000000
    meta2fasta_keepUnaligned=3
    meta2fasta_sizeUnaligned=350 350
    nucmer_l=50
    nucmer_c=300

    CE-stat computation parameters:

    mateAn_s=500
    mateAn_m=350

    [1]

    fasta=/scicore/home/salzburg/boehne/Ambizione/Pseudocrenilabrus/CeleraFemales/assembly/9-terminator/all_females.scf.fasta
    ID=CeleraFemales

    mateAn_file=

    [2]

    fasta=/scicore/home/salzburg/boehne/Ambizione/Pseudocrenilabrus/SpadesFemale/spadesnewmemory/scaffolds.fasta
    ID=SpadesFemales

    mateAn_file=

    I am running on 8 cores with 40 GB RAM. Any help would be great.
    All the best
    Astrid

     
    • Michael Schatz

      Michael Schatz - 2017-05-17

      Can you tell what phase of the program is currently running? We
      successfully merged the fish genome from the Assemblathon 2 data set in ~1
      day. Here are the notes on it from the supplemental material:

      For all Fish assemblies and metassemblies we used the available 2Kb
      mate-pair libraries:
      801KYABXX.2 and 801KYABXX.3

      Mapping: bowtie2 --maxins 3000 --minins 1000 --threads 16
      CE-statistic: mateAn -A 1500 -B 2600
      WGA: nucmer --maxmatch -l 50 -c 300
      Merges: asseMerge with default options

      Runtime Requirements:

      Bowtie alignment: ~6.2 h
      CEstat computation: ~2.6 h
      Nucmer WGA: ~57 h
      asseMerge: ~45 min
      meta2fasta: ~70 s

      Peak RAM requirement: 36GB
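
As a rough check on where the time goes, the stage runtimes listed above can be summed. This is only a minimal sketch: the numbers are copied from the notes in this post, while the dictionary and function names are hypothetical and not part of Metassembler.

```python
# Stage runtimes in hours, copied from the supplemental notes quoted above.
# The names STAGE_HOURS and runtime_summary are illustrative only.
STAGE_HOURS = {
    "bowtie2 alignment": 6.2,
    "CE-stat computation": 2.6,
    "nucmer WGA": 57.0,
    "asseMerge": 45 / 60,      # ~45 min
    "meta2fasta": 70 / 3600,   # ~70 s
}


def runtime_summary(stage_hours):
    """Return (total hours, slowest stage name, slowest stage's share of the total)."""
    total = sum(stage_hours.values())
    slowest = max(stage_hours, key=stage_hours.get)
    return total, slowest, stage_hours[slowest] / total


total, slowest, share = runtime_summary(STAGE_HOURS)
print(f"total: {total:.1f} h; slowest stage: {slowest} ({share:.0%} of total)")
```

The nucmer whole-genome alignment accounts for most of the listed time, which is why it is the first stage to look at when a run takes far longer than expected.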

      Depending on which step is running, I can make some suggestions on what
      could be tuned.

      Hope this helps

      Mike


       
      • Astrid

        Astrid - 2017-05-17

        Hi Michael
        Yes, that is what I saw in your paper, and my species is closely related
        to the one from the Assemblathon. I am guessing that the issue is
        nucmer. I realized that I am using a rather old version of MUMmer; maybe
        that is why it is running so slowly?

        Judging from the logs, it is stuck at this step (though it is still doing
        something, since the file QSpadesFemales.CeleraFemales.mgaps keeps changing):

        ---- Merging SpadesFemales and CeleraFemales ==> QSpadesFemales.CeleraFemales

        ---------- Run bash command ----------

        Create /scicore/home/salzburg/boehne/Ambizione/Pseudocrenilabrus/Metassemble_Female/Metassembly/QSpadesFemales.CeleraFemales:
        mkdir /scicore/home/salzburg/boehne/Ambizione/Pseudocrenilabrus/Metassemble_Female/Metassembly/QSpadesFemales.CeleraFemales
        ...


        ---------- Run bash command ----------

        nucmer /scicore/home/salzburg/boehne/Ambizione/Pseudocrenilabrus/Metassemble_Female/CeleraFemales/CeleraFemales.fa /scicore/home/salzburg/boehne/Ambizione/Pseudocrenilabrus/Metassemble_Female/SpadesFemales/SpadesFemales.fa:
        nucmer --maxmatch -l 50 -c 300 -p /scicore/home/salzburg/boehne/Ambizione/Pseudocrenilabrus/Metassemble_Female/Metassembly/QSpadesFemales.CeleraFemales/QSpadesFemales.CeleraFemales /scicore/home/salzburg/boehne/Ambizione/Pseudocrenilabrus/Metassemble_Female/CeleraFemales/CeleraFemales.fa /scicore/home/salzburg/boehne/Ambizione/Pseudocrenilabrus/Metassemble_Female/SpadesFemales/SpadesFemales.fa
        ...

        This is what goes to stderr
        Processed 56659 scaffolds and 117924 contigs, printed 113096 at least 200 bp long
        Processed 908390 scaffolds and 913280 contigs, printed 404117 at least 200 bp long
        1: PREPARING DATA
        2,3: RUNNING mummer AND CREATING CLUSTERS
        reading input file "/scicore/home/salzburg/boehne/Ambizione/Pseudocrenilabrus/Metassemble_Female/Metassembly/QSpadesFemales.CeleraFemales/QSpadesFemales.CeleraFemales.ntref" of length 729766782
        construct suffix tree for sequence of length 729766782
        (maximum reference length is 2305843009213693948)
        (maximum query length is 18446744073709551615)
        process 7297667 characters per dot
        ....................................................................................................
        CONSTRUCTIONTIME /scicore/home/salzburg/boehne/applications/MUMmer3.23/mummer /scicore/home/salzburg/boehne/Ambizione/Pseudocrenilabrus/Metassemble_Female/Metassembly/QSpadesFemales.CeleraFemales/QSpadesFemales.CeleraFemales.ntref 335.34
        reading input file "/scicore/home/salzburg/boehne/Ambizione/Pseudocrenilabrus/Metassemble_Female/SpadesFemales/SpadesFemales.fa" of length 903891688
        matching query-file "/scicore/home/salzburg/boehne/Ambizione/Pseudocrenilabrus/Metassemble_Female/SpadesFemales/SpadesFemales.fa"
        against subject-file "/scicore/home/salzburg/boehne/Ambizione/Pseudocrenilabrus/Metassemble_Female/Metassembly/QSpadesFemales.CeleraFemales/QSpadesFemales.CeleraFemales.ntref"

        Thank you for your quick reply
        Astrid
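
The observation that the .mgaps file "keeps changing" can be checked more systematically by comparing file size and modification time at intervals. The following is a minimal sketch using only the Python standard library; the function names and the default interval are assumptions made for illustration, and a changing file only shows that the process is still writing output, not how close it is to finishing.

```python
import os
import time


def snapshot(path):
    """Return (size in bytes, modification time in nanoseconds) for path."""
    st = os.stat(path)
    return st.st_size, st.st_mtime_ns


def watch(path, interval_seconds=60.0, checks=3):
    """Take `checks` snapshots, `interval_seconds` apart.

    Returns True as soon as two consecutive snapshots differ,
    and False if the file never changed during the observation window.
    """
    previous = snapshot(path)
    for _ in range(checks - 1):
        time.sleep(interval_seconds)
        current = snapshot(path)
        if current != previous:
            return True
        previous = current
    return False
```

For the run described here, one would call `watch()` with the path to QSpadesFemales.CeleraFemales.mgaps inside the Metassembly output directory.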


        --
        Astrid Böhne
        Universität Basel
        Zoologisches Institut
        Evolutionsbiologie
        Vesalgasse 1
        CH-4051 Basel
        Switzerland
        Phone +41 (0)61 207 03 05
        Fax +41 (0) 61 207 03 01

         
        • Michael Schatz

          Michael Schatz - 2017-05-17

          Yeah, there must be tons of repeats if it is still stuck in nucmer. As
          painful as it is, I'd kill the job and start again with different nucmer
          settings. I would recommend: -l 100 -c 500

          This will (modestly) reduce sensitivity, but could finish in less than a
          day. If it takes more than a day, raise -l from 100 to 250 and try again.
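
In the spec file posted at the top of the thread, the nucmer options appear as the keys `nucmer_l` and `nucmer_c`, and the log shows them passed through as `-l 50 -c 300`. Assuming that mapping holds, the suggested settings correspond to `nucmer_l=100` and `nucmer_c=500`. A minimal sketch of applying that change to the spec text follows; the helper name `retune_nucmer` is hypothetical and not part of Metassembler.

```python
import re


def retune_nucmer(spec_text, min_match=100, min_cluster=500):
    """Return spec_text with the nucmer_l and nucmer_c values replaced.

    Only lines that start with those two keys are touched; every other
    line of the spec file is returned unchanged.
    """
    spec_text = re.sub(
        r"(?m)^([ \t]*nucmer_l[ \t]*=).*$", rf"\g<1>{min_match}", spec_text
    )
    spec_text = re.sub(
        r"(?m)^([ \t]*nucmer_c[ \t]*=).*$", rf"\g<1>{min_cluster}", spec_text
    )
    return spec_text


if __name__ == "__main__":
    original = "nucmer_l=50\nnucmer_c=300\nmateAn_s=500\n"
    print(retune_nucmer(original), end="")
```

If the run still takes more than a day, the same helper can be called with `min_match=250`, following the fallback suggested above.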

          Good luck!

          Mike


           
