metassembler / Discussion / General Discussion: extremely long runtime? configuration wrong?

Astrid - 2017-05-17

Hi Michael,
I have a metassembly run that has already been going for 4 weeks, merging a SPAdes and a Celera fish genome assembly (1 Gb genome size at most). Judging from your paper, this should have finished a long time ago.
Here is a copy of my spec file:
[global]

Mate-pair mapping parameters:

bowtie2_threads=8
bowtie2_read1=all_1P.fastq
bowtie2_read2=all_2P.fastq
bowtie2_maxins=1000
bowtie2_minins=10
genomeLength=950000000
meta2fasta_keepUnaligned=3
meta2fasta_sizeUnaligned=350 350
nucmer_l=50
nucmer_c=300

CE-stat computation parameters:

mateAn_s=500
mateAn_m=350

[1]

fasta=/scicore/home/salzburg/boehne/Ambizione/Pseudocrenilabrus/CeleraFemales/assembly/9-terminator/all_females.scf.fasta
ID=CeleraFemales

mateAn_file=

[2]

fasta=/scicore/home/salzburg/boehne/Ambizione/Pseudocrenilabrus/SpadesFemale/spadesnewmemory/scaffolds.fasta
ID=SpadesFemales

mateAn_file=

I am running on 8 cores with 40 GB RAM. Any help would be great.
All the best
Astrid

- Michael Schatz - 2017-05-17
  
  Can you tell what phase of the program is currently running? We
  successfully merged the fish genome from the Assemblathon 2 data set in ~1
  day. Here are the notes on it from the supplemental material:
  
  For all Fish assemblies and metassemblies we used the available 2Kb
  mate-pair libraries:
  801KYABXX.2 and 801KYABXX.3
  
  Mapping: bowtie2 --maxins 3000 --minins 1000 --threads 16
  CE-statistic: mateAn -A 1500 -B 2600
  WGA: nucmer --maxmatch -l 50 -c 300
  Merges: asseMerge with default options
  
  Runtime Requirements:
  
  Bowtie alignment: ~6.2 h
  CEstat computation: ~2.6 h
  Nucmer WGA: ~57 h
  asseMerge: ~45 min
  meta2fasta: ~70 s
  
  Peak RAM requirement: 36GB
  
  Depending on which step is running, I can make some suggestions on what could
  be tuned.
  
  Hope this helps
  
  Mike
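To put the stage timings quoted above in proportion, here is a minimal Python sketch that totals them and reports each stage's share of the run. The figures are copied from the list above and taken at face value; the dictionary and function names are illustrative only and are not part of metassembler.

```python
# Stage runtimes in hours, copied from the supplemental notes quoted above.
STAGE_HOURS = {
    "bowtie2 alignment": 6.2,
    "CE-stat computation": 2.6,
    "nucmer WGA": 57.0,
    "asseMerge": 45 / 60,      # ~45 min
    "meta2fasta": 70 / 3600,   # ~70 s
}


def runtime_shares(stage_hours):
    """Return (total_hours, {stage: fraction_of_total})."""
    total = sum(stage_hours.values())
    return total, {name: hours / total for name, hours in stage_hours.items()}


if __name__ == "__main__":
    total, shares = runtime_shares(STAGE_HOURS)
    print(f"total: {total:.1f} h")
    # List the stages from largest to smallest share.
    for name, share in sorted(shares.items(), key=lambda kv: -kv[1]):
        print(f"{name:22s} {share:6.1%}")
```

On these figures the whole-genome alignment accounts for more than 80% of the reported time, so it is the natural first suspect when a run takes far longer than expected.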
  
  - Astrid - 2017-05-17
    
    Hi Michael,
    Yes, that is what I saw in your paper, and it is a species closely related
    to the one from the Assemblathon. I was guessing that the issue is
    nucmer? I realized that I am using a rather old version of MUMmer; maybe
    that is why it is running so slowly?
    
    Judging from the logs, it is stuck at this step (though it is doing something,
    since the file QSpadesFemales.CeleraFemales.mgaps keeps changing):
    
    ---- Merging SpadesFemales and CeleraFemales ==>
    QSpadesFemales.CeleraFemales
    
    ---------- Run bash command ----------
    
    Create
    /scicore/home/salzburg/boehne/Ambizione/Pseudocrenilabrus/Metassemble_Female/Metassembly/QSpadesFemales.CeleraFemales:
    mkdir
    /scicore/home/salzburg/boehne/Ambizione/Pseudocrenilabrus/Metassemble_Female/Metassembly/QSpadesFemales.CeleraFemales
    ...
    
    ---------- Run bash command ----------
    
    nucmer
    /scicore/home/salzburg/boehne/Ambizione/Pseudocrenilabrus/Metassemble_Female/CeleraFemales/CeleraFemales.fa
    /scicore/home/salzburg/boehne/Ambizione/Pseudocrenilabrus/Metassemble_Female/SpadesFemales/SpadesFemales.fa:
    nucmer --maxmatch -l 50 -c 300 -p
    /scicore/home/salzburg/boehne/Ambizione/Pseudocrenilabrus/Metassemble_Female/Metassembly/QSpadesFemales.CeleraFemales/QSpadesFemales.CeleraFemales
    /scicore/home/salzburg/boehne/Ambizione/Pseudocrenilabrus/Metassemble_Female/CeleraFemales/CeleraFemales.fa
    /scicore/home/salzburg/boehne/Ambizione/Pseudocrenilabrus/Metassemble_Female/SpadesFemales/SpadesFemales.fa
    ...
    
    This is what goes to stderr:
    Processed 56659 scaffolds and 117924 contigs, printed 113096 at least 200 bp long
    Processed 908390 scaffolds and 913280 contigs, printed 404117 at least 200 bp long
    1: PREPARING DATA
    2,3: RUNNING mummer AND CREATING CLUSTERS
    
    reading input file
    
    "/scicore/home/salzburg/boehne/Ambizione/Pseudocrenilabrus/Metassemble_Female/Metassembly/QSpadesFemales.CeleraFemales/QSpadesFemales.CeleraFemales.ntref"
    of length 729766782
    
    construct suffix tree for sequence of length 729766782
    
    (maximum reference length is 2305843009213693948)
    
    (maximum query length is 18446744073709551615)
    
    process 7297667 characters per dot
    
    ....................................................................................................
    
    CONSTRUCTIONTIME
    
    /scicore/home/salzburg/boehne/applications/MUMmer3.23/mummer
    /scicore/home/salzburg/boehne/Ambizione/Pseudocrenilabrus/Metassemble_Female/Metassembly/QSpadesFemales.CeleraFemales/QSpadesFemales.CeleraFemales.ntref
    335.34
    
    reading input file
    
    "/scicore/home/salzburg/boehne/Ambizione/Pseudocrenilabrus/Metassemble_Female/SpadesFemales/SpadesFemales.fa"
    of length 903891688
    
    matching query-file
    
    "/scicore/home/salzburg/boehne/Ambizione/Pseudocrenilabrus/Metassemble_Female/SpadesFemales/SpadesFemales.fa"
    
    against subject-file
    
    "/scicore/home/salzburg/boehne/Ambizione/Pseudocrenilabrus/Metassemble_Female/Metassembly/QSpadesFemales.CeleraFemales/QSpadesFemales.CeleraFemales.ntref"
    
    Thank you for your quick reply
    Astrid
    
    --
    Astrid Böhne
    Universität Basel
    Zoologisches Institut
    Evolutionsbiologie
    Vesalgasse 1
    CH-4051 Basel
    Switzerland
    Phone +41 (0)61 207 03 05
    Fax +41 (0) 61 207 03 01
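The observation in this post that the .mgaps file keeps changing can be checked more systematically. The following is a minimal Python sketch, assuming only that the intermediate file is visible on the file system; `snapshot` and `has_progressed` are hypothetical helper names, not part of MUMmer or metassembler. A changing file shows that the process is still writing output; it says nothing about how close the run is to finishing.

```python
import os


def snapshot(path):
    """Return (size_in_bytes, mtime) for path, or None if it does not exist."""
    try:
        st = os.stat(path)
    except FileNotFoundError:
        return None
    return st.st_size, st.st_mtime


def has_progressed(before, after):
    """True if both snapshots exist and the size or mtime differs."""
    return before is not None and after is not None and after != before
```

A typical use would be to take one snapshot of QSpadesFemales.CeleraFemales.mgaps, wait several minutes, take a second snapshot, and pass both to `has_progressed`.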
    
    - Michael Schatz - 2017-05-17
      
      Yeah, there must be tons of repeats if it is still stuck in nucmer. As
      painful as it is, I'd kill the job and start again with different nucmer
      settings. I would recommend: -l 100 -c 500
      
      This will (modestly) reduce sensitivity, but could finish in less than a
      day. If it takes more than a day, raise -l 100 to -l 250 and try again.
      
      Good luck!
      
      Mike
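The settings recommended above can be written out as full command lines. This is a minimal Python sketch that only assembles the argument list, using the flags that appear in the run log earlier in the thread (--maxmatch, -l, -c, -p); `nucmer_command` and the file names are hypothetical placeholders. In the spec file, the nucmer_l and nucmer_c entries appear to carry the same two values, since they match the -l 50 -c 300 seen in the log.

```python
import shlex


def nucmer_command(prefix, reference, query, min_match=100, min_cluster=500):
    """Build a nucmer argument list with the flags seen in the run log.

    min_match maps to -l and min_cluster to -c; the defaults are the
    values suggested in the reply (-l 100 -c 500).
    """
    return [
        "nucmer", "--maxmatch",
        "-l", str(min_match),
        "-c", str(min_cluster),
        "-p", prefix,
        reference, query,
    ]


if __name__ == "__main__":
    # Placeholder file names; substitute the real paths from the run log.
    first_try = nucmer_command("out", "CeleraFemales.fa", "SpadesFemales.fa")
    fallback = nucmer_command("out", "CeleraFemales.fa", "SpadesFemales.fa",
                              min_match=250)
    print(shlex.join(first_try))
    print(shlex.join(fallback))
```

The fallback line corresponds to the advice to raise -l to 250 if the first attempt still takes more than a day.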
      
