#128 Tide memory management

Milestone: post v2.0
Status: open
Labels: crux-users (14)
Updated: 2015-07-31
Created: 2014-04-11
Private: No
  1. Tide-index allocates too much memory for patterns of modifications
  2. Tide-search loads all the index file at the beginning. check why!
  3. Tide-index uses too much memory to generate non-modified peptides. (see 148)

Discussion

  • Attila Kertesz-Farkas

    • summary: Tide allocates too much memory for patterns of modifications --> Tide memory management
    • Description has changed:

    Diff:

    --- old
    +++ new
    @@ -0,0 +1,2 @@
    +1. Tide-index allocates too much memory for patterns of modifications
    +2. Tide-search loads all the index file at the beginning. check why!
    
     
  • William S Noble

    William S Noble - 2014-04-20
    • Milestone: post v2.0 --> Tide
     
  • Attila Kertesz-Farkas

    • Description has changed:

    Diff:

    --- old
    +++ new
    @@ -1,2 +1,3 @@
    
     1. Tide-index allocates too much memory for patterns of modifications
     2. Tide-search loads all the index file at the beginning. check why!
    +3. Tide-index uses too much memory to generate non-modified peptides. (see 148)
    
     
  • William S Noble

    William S Noble - 2014-10-31

    The problem is that, on Windows, you are limited to 4 GB of memory. We are not going to solve this without a complete redesign, so I am closing this issue.

     
  • William S Noble

    William S Noble - 2014-10-31
    • status: open --> wont-fix
     
  • William S Noble

    William S Noble - 2015-07-31

    I am re-opening this ticket and temporarily assigning it to myself.

    In response to a user's query, I did some memory profiling of tide-index today. What I found was disturbing: doing a tryptic digestion, every amino acid in the database requires 24 bytes of memory. This implies that if this user wants to search a 15 GB file, they need 300 GB of memory. Does anyone have any idea why we need so much space for tide-search? It seems like we should be able to do better than 24 bytes to store one amino acid.

    Details are here (see today's entry):
    http://noble.gs.washington.edu/~wnoble/proj/crux-projects/2010tinkering/results/results.html

    See 24 July 2015 entry.

    In response to this, Charles said:

    Could it just be the overhead in the Peptide data structure? The Peptide class contains the following private members:

    int len_;
    double mass_;
    int id_;
    int first_loc_protein_id_;
    int first_loc_pos_;
    bool has_aux_locations_index_;
    int aux_locations_index_;
    const char* residues_;
    int num_mods_;
    ModCoder::Mod* mods_;
    bool decoy_;

    void* prog1_;
    void* prog2_;

    On a 64-bit platform the pointers and the double are 8 bytes each, and alignment padding rounds up several of the smaller members, so each peptide record is going to be over 80 bytes, not including the residue string. If most peptides are relatively short, it seems like the size of the peptide records is going to be dominated by the supporting fields.

     
  • William S Noble

    William S Noble - 2015-07-31
    • labels: --> crux-users
    • status: wont-fix --> open
    • assigned_to: Attila Kertesz-Farkas --> William S Noble
    • Milestone: Tide --> post v2.0
     
