How to store a very large sparse matrix which keep undergoing multiplications and elementary transformations in several steps???

2014-04-22
2014-04-24
  • This is a design issue i am facing with my project currently.

    I have a very large matrix (say 10^7 * 10^6 approx 8TB) But most of the values of matrix are zeros(approx 99% values are 0's). hence sparse.

    Complications:
    1. How to store such a matrix?
    can i use the existing matrix class of stxxl directly? will it exploit sparsity?

    1. How will the storage effect computation, I/O Performance?
      I need to perform matrix multiplication and some other elementary operations on matrix. This needs a column-wise or row-wise storage of matrix.

    I was thinking of storing matrix as a set of columns with each column being stxxl::vector. Also the columns are packed as <index, value="">.
    i.e vector [1 0 0 0 0 0 0 0 0 0 0 0 4] is represented as [<1,1>, <13,4>].

    Although the design is simple and serving the purpose for now while initialization, the problem is that each column/vector keeps changing in size after every transformation on matrix. This gradually results in most of the column vectors with maximum capacity fill.

    Although the column vectors are globally sparse, they tend to fill at some transformations and free at rest, the vectors keep up the capacity and increase space requirement.

    How to solve this problem. Is there any better elegant way to solve this problem with external memory or could you suggest any work around?

    Thank you.

     
    • Timo Bingmann
      Timo Bingmann
      2014-04-24

      The matrix library in STXXL was developed as a student thesis project, which you can find here:

      http://algo2.iti.kit.edu/english/1919.php

      I believe that the matrix is stored block-wise, and that multiplication algorithms do recursive block multiplications. The algorithms are pretty sophisticated, but i guess, mostly targeted at dense matrices.

      There is a lot of information about external memory matrix operations in the thesis, much more than I know about myself.