#6 seatac partitioning is broken

open
ATAC (6)
5
2015-04-17
2008-11-06
No

Currently, only one partition can be used, due to the removal of the old chainedSequence class. Seagen gets around this by partitioning on bases (with an overlap in the middle). Seatac cannot do this.

As a result, seatac needs LOTS of memory (> 16GB I'd guess) to run on mammals.
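To illustrate what "partitioning on bases with an overlap" means, here is a minimal sketch. It is not seagen's actual code; the function name, `chunk_size`, and `overlap` are hypothetical, and it only shows how a long sequence can be cut into fixed-size windows that share some bases at each boundary, so a match spanning a boundary is still seen whole in one window.

```python
from typing import List, Tuple


def overlapping_chunks(length: int, chunk_size: int, overlap: int) -> List[Tuple[int, int]]:
    """Return half-open (start, end) base ranges covering [0, length).

    Hypothetical helper: consecutive ranges share `overlap` bases.
    """
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    if length <= 0:
        return []

    step = chunk_size - overlap  # how far each new window advances
    chunks = []
    start = 0
    while True:
        end = min(start + chunk_size, length)
        chunks.append((start, end))
        if end >= length:  # the last window reached the end of the sequence
            break
        start += step
    return chunks
```

Each window can then be indexed and searched separately, which keeps memory proportional to `chunk_size` rather than to the whole genome.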

Discussion

  • Brian Walenz

    Brian Walenz - 2008-11-06
    • labels: 1162588 --> ATAC
    • summary: partitioning is broken --> seatac partitioning is broken
     
  • Dan Bolser

    Dan Bolser - 2015-04-16

    Is this still an issue? Can you explain the output of leaff --partitionmap and how it would be used?

     
  • Brian Walenz

    Brian Walenz - 2015-04-17

    I'm guessing this was an issue back in 2008, when 16 GB was "LOTS of memory". 'seagen' was a similar code, used for mapping ESTs, that was able to break the reference genome into smaller chunks and could run comfortably on 1 GB machines (remember, 2008).

     
  • Brian Walenz

    Brian Walenz - 2015-04-17

    The partitionmap output....wow, that's quite obnoxious.

    With values written as text, and the formatting shown exactly as it appears in the file:

    number-of-partitions
    partition-number](partition-size) sequence-index(sequence-length) ...

    Nothing tells you the number of sequences in a partition (argh). The partition-size is probably just the sum of the sequence-lengths. Sequence indices start at 0, which is the first sequence in the file.

    It's just a different output from the same partitioning algorithm; instead of rewriting the fasta file, it gave a list of indices.
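    A minimal sketch of reading that output, assuming the layout is exactly as described above (a first line with the partition count, then one line per partition, with the literal "]" after the partition number). The function name and the sample text in the usage note are made up for illustration; this is not part of leaff.

```python
import re
from typing import Dict, List, Tuple

# One "sequence-index(sequence-length)" token, e.g. "3(1500)".
_ENTRY = re.compile(r"(\d+)\((\d+)\)")
# The "(partition-size)" that follows the "]" on each partition line.
_SIZE = re.compile(r"\((\d+)\)")


def parse_partitionmap(text: str) -> Dict[int, Tuple[int, List[Tuple[int, int]]]]:
    """Map partition number -> (partition size, [(seq index, seq length), ...])."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    if not lines:
        raise ValueError("empty partition map")

    expected = int(lines[0])  # first line: number-of-partitions
    partitions = {}
    for line in lines[1:]:
        # Split "partition-number](partition-size) ..." at the "]".
        number, sep, rest = line.partition("]")
        size = _SIZE.match(rest)
        if not sep or size is None:
            raise ValueError("unrecognized partition line: " + line)
        entries = [(int(i), int(n)) for i, n in _ENTRY.findall(rest[size.end():])]
        partitions[int(number)] = (int(size.group(1)), entries)

    if len(partitions) != expected:
        raise ValueError("partition count does not match the first line")
    return partitions
```

    For example, on the invented input "2\n1](5000) 0(3000) 1(2000)\n2](4000) 2(4000)\n", partition 1 comes back as (5000, [(0, 3000), (1, 2000)]). The number of sequences in a partition, which the file does not state, is just the length of that list.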

     
