more disks and less space

2008-10-22
2013-04-25
  • Hi!

    I use stxxl v1.0 on a Linux system.

    Recently a computation failed with the following error message:

    [STXXL-ERRMSG] External memory block allocation error: 119537664 bytes requested, 14680064 bytes free. Trying to extend the external memory space...
    terminate called after throwing an instance of 'stxxl::bad_ext_alloc'
      what():  Error in DiskAllocator::check_corruption Error: double deallocation of external memory System info: P 27980201984 27980201984 27980201984

    the stxxl-file looked like this:
    > disk=/local/u01/data,190000,syscall
    > disk=/local/u02/data,190000,syscall

    I added another disk,
    > disk=/local/u01/data,190000,syscall
    > disk=/local/u02/data,190000,syscall
    > disk=/local/home/lkunert/data,50000,syscall

    and redid the calculation.
    This time the computation stopped much earlier, but basically with the same error:

    [STXXL-ERRMSG] External memory block allocation error: 81788928 bytes requested, 0 bytes free. Trying to extend the
    external memory space...
    terminate called after throwing an instance of 'stxxl::bad_ext_alloc'
      what():  Error in DiskAllocator::check_corruption Error: double deallocation of external memory System info: P 038356910080 15669919744

    Please note that I use a pretty old version of stxxl. Is the problem fixed in the recent version?

    Thanks, Lars

     
    • Hi Lars,

      I strongly recommend that you try the most recent release, since many changes have been made to the block manager (which is related to your error) since version 1.0.

      Best regards,
      Roman

       
    • Hi!

      I upgraded to version 1.2.1 and redid my calculation.

      I started using 3 discs:

      > disk=/local/u01/data,190000,syscall
      > disk=/local/u02/data,190000,syscall
      > disk=/local/home/lkunert/data,500,syscall

      (The first two can't grow, the third one can... up to 90G)

      The third disk started to grow soon after the requested storage reached 979.355.040B.

      (I calculate the requested storage as the result of the capacity method of the stxxl vector times the sizeof value of each element.)

      > [STXXL-ERRMSG] External memory block allocation error: 2097152 bytes requested, 0 bytes free. Trying to extend the external memory space...

      When I stopped the program, disk number 3 had grown to 1.5 G.

      I thought that maybe stxxl needs disks of equal size, so I removed the third disk and redid the calculation.

      This time I got a lot further: about 10h after the requested storage reached 111.919.100.160B, the program ran out of memory.

      I assume that at that point, about 200G of external memory were in use.

      My current implementation uses one big vector.
      Most of the time this vector is modified in place;
      sometimes additional elements are added at the end.
      From time to time I use stxxl to sort the complete vector.

      Is it possible that stxxl does not distribute the vector over multiple disks?
      This would explain why the program failed early the first time (the vector was assigned to the third disk)
      and why the program worked a lot longer when no small disk was around.

      Thanks, Lars

       
      • Hi Lars,

        Stxxl vector allocates data _evenly_ over all available disks, without looking at their actual sizes. Therefore, it failed when you used one small disk.
        However, you can modify this behavior by providing/implementing your own allocation policy:
        http://algo2.iti.uni-karlsruhe.de/dementiev/stxxl/tags/1.2.1/structVECTOR__GENERATOR.html (the AllocStr_ template parameter)

        I also recommend that the page size (PgSz_) matches the number of your disks.

        Best regards,
        Roman

         
    • Hi Roman,

      this was my first guess, and probably the reason why the second attempt (without the small disk) worked better.
      But this is no explanation for the failure of the second attempt: there I used two disks with 190G each and the program failed when about 200G were allocated.

      Is it possible that the way I calculate the memory usage (capacity times sizeof) is wrong? Do you have an "internal" function which returns the overall allocated external memory?

       
      • Hi Lars,

        The behavior you describe is strange; it could be a bug. Is the error message with stxxl 1.2.1 still "double deallocation of external memory"? Can you provide the core file (you need to run 'ulimit -c unlimited' to enable core files)? If the file is large, you can put it on one of the mpi-sb.mpg.de computers; I still have an account there ;-)

        The overall allocated external memory can also be tracked using the Linux "ls -l your_stxxl_files" command. Your estimate of the requested size might be somewhat off if sizeof(T) does not divide the block size in bytes. In that case Stxxl adds some dummy info fields. The exact number of bytes allocated for a particular vector can be calculated using the raw_capacity function of the vector (just added to the trunk version).

        Roman
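To illustrate why "capacity times sizeof" can underestimate the on-disk size, here is a minimal sketch of the rounding involved. It assumes that elements are never split across blocks and that space is handed out in whole blocks; `estimated_raw_bytes` and `naive_bytes` are hypothetical helpers, not STXXL functions, and the real layout (including the dummy fields mentioned above) may differ.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not part of STXXL): bytes occupied on disk if every
// block holds only whole elements and space is allocated block by block.
std::uint64_t estimated_raw_bytes(std::uint64_t num_elements,
                                  std::uint64_t element_bytes,
                                  std::uint64_t block_bytes)
{
    assert(element_bytes > 0 && element_bytes <= block_bytes);
    // Whole elements that fit into one block; the remainder is padding.
    const std::uint64_t per_block = block_bytes / element_bytes;
    // Number of blocks needed, rounded up.
    const std::uint64_t blocks = (num_elements + per_block - 1) / per_block;
    return blocks * block_bytes;
}

// The estimate used in the thread: element count times element size.
std::uint64_t naive_bytes(std::uint64_t num_elements, std::uint64_t element_bytes)
{
    return num_elements * element_bytes;
}
```

For example, 1000 elements of 24 bytes in 4096-byte blocks give a naive estimate of 24000 bytes, while the block-based estimate is 6 blocks, i.e. 24576 bytes. When sizeof(T) divides the block size, the only difference comes from rounding up to a whole block.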

         
    • Hi Roman,

      I know the 'ls' command, thanks a lot. But how reliable is it when preallocated files are used?!
      I will boil the stuff down to a small example and send you the code & core.

      Bye, Lars

       
      • Hi Lars,

        thank you, a small example will be very helpful. You are right regarding the ls command. However, if stxxl must extend the preallocated files (which is indeed your case), the ls output should match the real external memory consumption of stxxl.

        Otherwise you can use the raw_capacity function of stxxl::vector.

        Roman

         
        • Hi Lars,

          please check again for the exact error message you get using the recent stxxl.
          I'm not interested in the "External memory block allocation error",
          but the "terminate called after throwing an instance of" part directly before it terminates.

          I had some problems (I/O errors) with autogrowing files recently, too, but no time to debug them.
          As a workaround I set a max size (larger than the amount the program needed) in .stxxl
          and had the file created as a sparse file when the program starts (stxxl does this automatically
          if the file does not exist or is smaller than requested).
          I was using only a single disk file, and it was deleted before the program was started.

          And as Roman already said, if you use disks of different sizes, you will probably need a custom allocator that takes this into account.
          With the default allocator you might run into this problem:

          max_size = num_disks * min{all disk sizes}
          2 * min{190,190} = 380 GB
          3 * min{190,190,90} = 270 GB

          Andreas
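The formula above can be sketched as a small function. This is only an illustration of the arithmetic, assuming data is striped evenly so that the smallest disk fills up first; `max_striped_capacity` is a hypothetical name and not an STXXL function.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper (not part of STXXL): usable capacity when data is
// spread evenly over all disks regardless of their individual sizes.
// Allocation stops as soon as the smallest disk is full.
std::uint64_t max_striped_capacity(const std::vector<std::uint64_t>& disk_sizes)
{
    assert(!disk_sizes.empty());
    const std::uint64_t smallest =
        *std::min_element(disk_sizes.begin(), disk_sizes.end());
    return static_cast<std::uint64_t>(disk_sizes.size()) * smallest;
}
```

With sizes given in GB, `max_striped_capacity({190, 190})` yields 380 and `max_striped_capacity({190, 190, 90})` yields 270, matching the two cases listed above: adding the smaller disk reduces the usable total.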

           
    • Hi!
      We found the reason for the "mysterious" reduction of memory when adding small disks.
      However, the overall memory consumption still puzzles me. It seems like the sorting is not done "in place".

      The enclosed sample code creates a vector of 1k elements with a total size of 1M (every element has a size of 1k). Then in every loop iteration the size of the vector is increased by about a third and the vector is sorted. The program is manually stopped when the disk file is extended for the first time. The initial size of the disk file is 100MB. Without the sorting, the external disk file is extended two iterations later.

      ls -lh /local/u01/data
      -rw-r----- 1 lkunert lengauer 100M Oct 29 12:15 /local/u01/data
      lkunert@infao3803:~ [542] error_test
      [STXXL-MSG] STXXL v1.2.1 (release)
      [STXXL-MSG] 1 disks are allocated, total space: 100 MB
      sizeof( Uint(0)): 1024

      number of elements       : 1364
      external memory size [MB]: 4
      external memory size [MB]: 4

      ...

      number of elements       : 42925
      external memory size [MB]: 44
      external memory size [MB]: 44

      number of elements       : 57232
      external memory size [MB]: 56
      [STXXL-ERRMSG] External memory block allocation error: 4194304 bytes requested, 0 bytes free. Trying to extend the external memory space...
      [STXXL-ERRMSG] External memory block allocation error: 4194304 bytes requested, 0 bytes free. Trying to extend the external memory space...
      [STXXL-ERRMSG] External memory block allocation error: 4194304 bytes requested, 0 bytes free. Trying to extend the external memory space...
      [STXXL-ERRMSG] External memory block allocation error: 4194304 bytes requested, 0 bytes free. Trying to extend the external memory space...
      external memory size [MB]: 56

      lkunert@infao3803:~ [543] ls -lh /local/u01/data
      -rw-r----- 1 lkunert lengauer 116M Oct 29 12:16 /local/u01/data

      #include <algorithm>   // std::max
      #include <functional>  // std::less
      #include <iostream>
      #include <limits>
      #include "stxxl.h"

      typedef unsigned long long ST;

      // Comparator for stxxl::sort; min_value()/max_value() supply the sentinels it requires.
      template <typename T>
      struct Compare : public std::less<T>
      {
          static T min_value() { return T( 0 ); }
          static T max_value() { return T( std::numeric_limits<ST>::max() ); }
      };

      // 1 KiB element: an 8-byte key plus 127 * 8 bytes of padding.
      struct Uint
      {
          ST i_;
          ST buffer_[127];

          // Takes ST rather than the non-standard uint, so max_value() is not truncated.
          Uint( ST i = 0 )
            : i_( i )
          {}

          bool operator<( const Uint& o ) const
          {
              return i_ < o.i_;
          }
      };

      int main()
      {
          static const ST NO_PAGES_IN_CACHE = 8;
          static const ST NO_BLOCKS_IN_PAGE = 4;
          static const ST BLOCK_SIZE        = 4*1024*1024;
          static const ST M                 = 2*BLOCK_SIZE;  // memory budget for the sort

          stxxl::VECTOR_GENERATOR< Uint, NO_BLOCKS_IN_PAGE, NO_PAGES_IN_CACHE, BLOCK_SIZE, stxxl::RC, stxxl::lru>::result v;

          std::cerr << "sizeof( Uint(0)): " << sizeof( Uint );

          // Initial fill: 1024 elements of 1 KiB each.
          for( ST i = 1; i <= 1024; i++ )
          {
              v.push_back( Uint( i ));
          }

          for( ST j = 0; j < 100; ++j )
          {
              // Grow the vector by roughly a third, then sort it.
              ST n = v.size() / 3;
              for( ST i = 1; i < n; i++ )
              {
                  v.push_back( Uint( i ));
              }

              std::cerr << "\n\n number of elements       : " << v.size() << "\n";
              std::cerr << " external memory size [MB]: " << sizeof( Uint ) * v.capacity() /1024 /1024 << "\n";

              stxxl::sort( v.begin(), v.end(), Compare<Uint>(), std::max( M, 2*BLOCK_SIZE ));

              std::cerr << " external memory size [MB]: " << sizeof( Uint ) * v.capacity() /1024 /1024 << "\n";
          }

          return 0;
      }

       
      • No, sorting was never in-place.

        For sorting a vector of N elements you need storage for approximately 2N elements.
        First, sorted runs of size M are created from the vector data and stored intermediately.
        Next, these sorted runs are merged into a sorted vector, which is stored in place of the original vector.
        Then the intermediate storage of the runs is freed again.

        Andreas
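The three steps described above can be modelled in memory to see where the factor of two comes from. This is a simplified sketch, not STXXL's implementation: plain std::vector objects stand in for external storage, and `run_merge_sort` and `SortStats` are hypothetical names introduced only for this illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <queue>
#include <utility>
#include <vector>

// Peak number of elements held in the simulated "external" storage.
struct SortStats { std::size_t peak_elements; };

// In-memory model of a two-phase (run formation + merge) sort.
SortStats run_merge_sort(std::vector<int>& data, std::size_t run_size)
{
    assert(run_size > 0);

    // Phase 1: copy the input into runs of at most run_size elements
    // and sort each run individually.
    std::vector<std::vector<int>> runs;
    std::size_t run_elements = 0;
    for (std::size_t pos = 0; pos < data.size(); pos += run_size) {
        const std::size_t end = std::min(pos + run_size, data.size());
        runs.emplace_back(data.begin() + pos, data.begin() + end);
        std::sort(runs.back().begin(), runs.back().end());
        run_elements += runs.back().size();
    }
    // The original data and all runs exist at the same time here.
    const std::size_t peak = data.size() + run_elements;

    // Phase 2: k-way merge of the runs back into the original vector,
    // using a min-heap of (value, run index) pairs.
    using Item = std::pair<int, std::size_t>;
    std::priority_queue<Item, std::vector<Item>, std::greater<Item>> heap;
    std::vector<std::size_t> next(runs.size(), 0);
    for (std::size_t r = 0; r < runs.size(); ++r) {
        heap.push(Item(runs[r][next[r]++], r));
    }
    std::size_t out = 0;
    while (!heap.empty()) {
        const Item top = heap.top();
        heap.pop();
        data[out++] = top.first;
        const std::size_t r = top.second;
        if (next[r] < runs[r].size()) {
            heap.push(Item(runs[r][next[r]++], r));
        }
    }

    // Phase 3: the intermediate run storage is released.
    runs.clear();
    return SortStats{peak};
}
```

Sorting a vector of 7 elements with a run size of 3 leaves the vector sorted and reports a peak of 14 stored elements, i.e. 2N, because the runs are a full second copy of the data until the merge has finished.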