using too much memory when loading vector

2007-06-21
2013-04-25
  • Hi Roman,

    I am having a problem when reading from a text file and inserting the elements into a vector. When loading the data, too much main memory is used, and eventually this operation takes most of it, leaving me with very little to work with (e.g. for a later sort of the elements, and other operations).

    Here is the part of my code where I load the data into the vector, which consumes too much memory.

    #include <cstring>  // std::memcpy
    #include <istream>

    // Element to be loaded into the vector
    class CSortNode {
    public:
        double m_pDims[MAX_DIMENSIONALITY]; // coordinates/dimensions
        double m_mindist;

        CSortNode() { m_mindist = 0.0; }
        CSortNode(double mindist_) : m_mindist(mindist_) { }

        void setMinDist(short d) {
            m_mindist = 0.0;
            for (short i = 0; i < d; i++)
                m_mindist += m_pDims[i];
        }

        CSortNode& operator=(const CSortNode& p) {
            std::memcpy(m_pDims, p.m_pDims, MAX_DIMENSIONALITY * sizeof(double));
            m_mindist = p.m_mindist;
            return *this;
        }

        friend std::istream& operator>>(std::istream& is, CSortNode& obj) {
            for (short i = 0; i < MAX_DIMENSIONALITY; i++) {
                is >> obj.m_pDims[i];
            }
            is.ignore(512, '\n'); // ignore the rest of the line
            return is;
        }
    }; // end class

    ...
    // vector of "CSortNode" elements
    typedef stxxl::VECTOR_GENERATOR<CSortNode, PgSz_, Pages_, BlkSize_>::result arrayInputType;
    arrayInputType inputData; // array of d-dimensional points
       
    // open text file
    std::fstream in(fname.c_str(), std::ios::in);

    // load vector
    std::copy(std::istream_iterator<CSortNode>(in),   
        std::istream_iterator<CSortNode>(),   
        std::back_inserter(inputData));

    This last part seems to take a lot of main memory. I was able to run my code with up to 10,000,000 CSortNode elements (6-dimensional points); beyond that it ran out of memory after loading the data from the file and could not sort the loaded points (a segmentation fault).

    I also tried this, but still the same problem:

    std::fstream in(fname.c_str(), std::ios::in);

    inputData.resize(totalPoints);
    std::size_t i = 0;
    while (i < totalPoints && in >> inputData[i]) { // test the stream state, not eof()
        i++;
    }
    in.close();

    I would appreciate your help in figuring out where the memory problem might be.

    thanks

    --Adan

     
    • Hi Adan,

      Could you please find out whether this segmentation fault is really due to the memory limit? What does the error message look like? On a Linux system you can also check the kernel message log with the command 'dmesg'.

      An Stxxl vector should occupy about PgSz_*Pages_*BlkSize_ bytes of internal memory. However, std::fstream and the operating system might cache the whole input file when reading. It could be that this cached data is not discarded (a bug?) from main memory when Stxxl tries to allocate memory for sorting. That is just a guess; first you should make sure that your program really aborts due to memory problems.
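
      For instance, with the hypothetical parameters PgSz_ = 4, Pages_ = 8 and BlkSize_ = 2 MiB, the vector's cache would take about 4 * 8 * 2 MiB = 64 MiB of internal memory, independent of how many elements the vector holds.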

      With best regards,
      Roman

       
    • Hi,

      It seems that what you say is true and std::fstream and the operating system are caching the input file; even though I explicitly call .close(), that memory is not freed, and I don't have enough left for later operations. Just for the record, I turned the swap off to make sure I don't use any virtual memory.

      This is the message I get, when working with 15,000,000 elements of type CSortNode:

      [STXXL-MSG] Disk '/var/tmp/stxxl' is allocated, space: 20000 Mb, I/O implementation: syscall

      Time to read the input data: 140.7 secs
      =============== FOR QUERY_DIMENSIONALITY: 6 =====================
      Time to calculate the mindist of the point set: 42.78 secs
      terminate called after throwing an instance of 'std::bad_alloc'
        what():  St9bad_alloc
      Aborted

      Any ideas on how to get the memory back?

      thanks a lot

      --Adan

       
      • Hi,

        > It seems that what you say is true and std::fstream and the operating
        > system are caching the input file; even though I explicitly call
        > .close(), that memory is not freed, and I don't have enough left for
        > later operations. Just for the record, I turned the swap off to make
        > sure I don't use any virtual memory.

        All this looks like a bug in your Linux (update the kernel?). Clean caches should be dropped automatically and their memory reused when an application allocates memory.

        Switching off the swap does not help. One should rather prevent the caching of the file in the first place. You can do this by accessing the input file with the direct open/read/write system calls (man 2 open): the file must be opened with the O_DIRECT flag. The disadvantage of this method is that your buffers must be 4 KiB-aligned, their sizes must be multiples of 4 KiB, and file offsets must be multiples of 4 KiB as well. Also, you then need to parse the text in the file yourself.
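
        A minimal sketch of such a read loop, assuming Linux (O_DIRECT is a GNU extension, and the 4 KiB alignment is filesystem-dependent); the file name, buffer size and error handling are illustrative:

        #ifndef _GNU_SOURCE
        #define _GNU_SOURCE // for O_DIRECT on Linux (define before any includes)
        #endif
        #include <fcntl.h>
        #include <unistd.h>
        #include <cstdlib>
        #include <cstdio>

        int main() {
            const size_t BUF_SIZE = 1 << 20; // 1 MiB, a multiple of 4 KiB

            int fd = open("input.txt", O_RDONLY | O_DIRECT); // bypass the page cache
            if (fd < 0) { perror("open"); return 1; }

            void* buf = NULL;
            if (posix_memalign(&buf, 4096, BUF_SIZE) != 0) { // 4 KiB-aligned buffer
                close(fd);
                return 1;
            }

            ssize_t n;
            while ((n = read(fd, buf, BUF_SIZE)) > 0) {
                // the n bytes in buf are raw text: parse them yourself here
            }

            free(buf);
            close(fd);
            return 0;
        }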

        The alternative to this is to drop all clean caches of OS (requires root privileges).

        #include <stdlib.h>

        ...

        in.close();
        system("echo 1 > /proc/sys/vm/drop_caches"); // drop clean caches

        Best,
        Roman