summary: Tide allocates too much memory for patterns of modifications --> Tide memory management
Description has changed:
Diff:
--- old
+++ new
@@ -0,0 +1,2 @@
+1. Tide-index allocates too much memory for patterns of modifications
+2. Tide-search loads all the index file at the beginning. check why!
--- old
+++ new
@@ -1,2 +1,3 @@
 1. Tide-index allocates too much memory for patterns of modifications
 2. Tide-search loads all the index file at the beginning. check why!
+3. Tide-index uses too much memory to generate non-modified peptides. (see 148)
The problem is that, on Windows, you are limited to 4 GB of memory. We are not going to solve this without a complete redesign, so I am closing this issue.
I am re-opening this ticket and temporarily assigning it to myself.
In response to a user's query, I did some memory profiling of tide-index today. What I found was disturbing: doing a tryptic digestion, every amino acid in the database requires 24 bytes of memory. This implies that if this user wants to search a 15 GB file, they need on the order of 360 GB of memory (15 GB × 24 bytes per amino acid). Does anyone have any idea why we need so much space for tide-search? It seems like we should be able to do better than 24 bytes to store one amino acid.
Details are here (see the 24 July 2015 entry):
http://noble.gs.washington.edu/~wnoble/proj/crux-projects/2010tinkering/results/results.html
In response to this, Charles said:
Could it just be the overhead in the Peptide data structure? The Peptide class contains the following private members:
int len_;
double mass_;
int id_;
int first_loc_protein_id_;
int first_loc_pos_;
bool has_aux_locations_index_;
int aux_locations_index_;
const char* residues_;
int num_mods_;
ModCoder::Mod* mods_;
bool decoy_;
void* prog1_;
void* prog2_;
On a 64-bit platform most of those members are going to be 8 bytes long, so each peptide record is going to be over 80 bytes, not including the residue string. If most peptides are relatively short, it seems like the size of the peptide records is going to be dominated by the supporting fields.
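One way to check the per-record estimate is to measure it. The following is a minimal sketch, not the actual Tide code: it mirrors the member list above in a plain struct, assumes that residues_, mods_, prog1_ and prog2_ are pointers, and uses a hypothetical 64-bit integer as a stand-in for ModCoder::Mod. The second struct (PeptideReordered, also hypothetical) holds the same fields sorted widest-first to show how much of the record is alignment padding. It ignores anything else the real class may carry, such as a vtable pointer or allocator overhead.

```cpp
#include <cstddef>
#include <cstdint>
#include <sstream>
#include <string>

// Hypothetical stand-in for ModCoder::Mod (assumption: a 64-bit code).
using Mod = std::uint64_t;

// Members in the order listed in the ticket; pointer types are assumed.
struct PeptideAsListed {
  int len_;
  double mass_;
  int id_;
  int first_loc_protein_id_;
  int first_loc_pos_;
  bool has_aux_locations_index_;
  int aux_locations_index_;
  const char* residues_;
  int num_mods_;
  Mod* mods_;
  bool decoy_;
  void* prog1_;
  void* prog2_;
};

// Same members, widest first, so the compiler inserts less padding.
struct PeptideReordered {
  double mass_;
  const char* residues_;
  Mod* mods_;
  void* prog1_;
  void* prog2_;
  int len_;
  int id_;
  int first_loc_protein_id_;
  int first_loc_pos_;
  int aux_locations_index_;
  int num_mods_;
  bool has_aux_locations_index_;
  bool decoy_;
};

// Bytes actually occupied by the members, with no padding at all.
constexpr std::size_t kPayloadBytes =
    sizeof(double) + 4 * sizeof(void*) + 6 * sizeof(int) + 2 * sizeof(bool);

// Builds a short report of the three sizes for the current platform.
std::string layout_report() {
  std::ostringstream out;
  out << "as listed:  " << sizeof(PeptideAsListed) << " bytes\n"
      << "reordered:  " << sizeof(PeptideReordered) << " bytes\n"
      << "payload:    " << kPayloadBytes << " bytes\n";
  return out.str();
}
```

Printing layout_report() from a small main() on the target platform gives the numbers to compare. On a typical 64-bit Linux build the as-listed layout should come out to 88 bytes and the reordered one to 72, which is consistent with the "over 80 bytes" estimate even though the int and bool members are only 4 bytes and 1 byte each. Reordering alone would not get anywhere near a few bytes per amino acid, so it is at best a small part of any fix.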