
Large matrices

Help
2007-02-15
2013-01-15
  • marshallbanana

    marshallbanana - 2007-02-15

    I am trying to create a matrix of dimensions 25 by 10^7 using the command matrix(data=0,25,10^7) but cannot do so (I then wish to fill it with data generated by for loops). In fact, the largest I am able to create is one with 10^8 entries (e.g. matrix(data=0,10^8,1) or matrix(data=0,10^6,10^2)). Anything bigger than this is met with, in the 10^9 case:

    Error in matrix(data = 0, 1, 10^9) : cannot allocate vector of length 1000000000

    or, in the 10^17 case:

    Error in matrix(data = 0, 1, 10^17) : matrix: invalid 'ncol' value (< 0)
    In addition: Warning message:
    NAs introduced by coercion

    Is this a limitation of R or of RKWard, and is there any way around this?

     
    • P Kapat

      P Kapat - 2007-02-15

      That surely is an issue with R. Try doing the same by starting R from any terminal (or even on Windows). For some more information try ?Memory. Apart from this, there is obviously the restriction of physical memory. On my machine with 1 GB of physical RAM and 2 GB of swap, I could go to:

      > x = matrix(data=0,1,3*(10^8))
      > object.size(x)
      2.4e+09

      which is a usage of around 2.4 GB. Thus a 27 x 10^7 matrix of 0s is achievable! Of course, going to 4*(10^8) (i.e., about 3.2 GB at 8 bytes per value) is physically impossible here, since it exceeds the 3 GB of RAM plus swap. So, forget about 10^17, unless you have access to clusters / supercomputers. IMHO, for such large-scale usage fall back to C, or try matlab/octave. Again, total physical memory will always be a restriction.
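
The arithmetic above can be checked with a rough sketch, assuming a numeric (double) matrix at 8 bytes per cell and ignoring the small object header. The helpers matrix_bytes() and fits_as_dim() are hypothetical names made up for illustration, not part of base R. The second half shows a separate limit: each matrix dimension has to fit in a 32-bit integer, which would explain why 10^17 fails with an 'ncol' error rather than an allocation error.

```r
# Rough memory estimate for a numeric (double) matrix: 8 bytes per cell.
# matrix_bytes() is a hypothetical helper, not part of base R.
matrix_bytes <- function(nrow, ncol) {
  nrow * ncol * 8   # computed in double precision, so no integer overflow
}

matrix_bytes(25, 1e7) / 1e9   # size in GB of the 25 x 10^7 matrix
matrix_bytes(1, 3e8) / 1e9    # the 3*(10^8) case measured with object.size()
matrix_bytes(1, 4e8) / 1e9    # the 4*(10^8) case

# Dimensions are a separate limit: each one must fit in a 32-bit integer.
# fits_as_dim() is also a hypothetical helper.
fits_as_dim <- function(n) n <= .Machine$integer.max

fits_as_dim(1e9)    # TRUE: a valid dimension, so only memory is the problem
fits_as_dim(1e17)   # FALSE: too large to be a dimension at all
```

By this estimate the 25 x 10^7 matrix needs about 2 GB before any copies are made, which is why it is borderline on a machine with 3 GB of RAM plus swap.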

       
    • Thomas Friedrichsmeier

      PK already explained why this doesn't work. It is a memory issue.

      Just some additional thoughts:
      1) Keep in mind that R may need to duplicate the entire object. This typically happens when writing to it (as far as I understand, it may also happen in a few other circumstances, but R tries to avoid unnecessary copies). Hence, once the object is larger than half the available RAM, you are quite likely to get into trouble.

      2) RKWard does impose a memory overhead, but that should not be significant in this case (the overhead is mostly constant, until you start editing the object (which is not yet supported for matrices, anyway)).

      3) Since R will copy objects on write, initializing a very large object from a for loop is probably very inefficient. For each iteration (or at least for each assignment, even to a single cell in the matrix), the entire object would be copied (as far as I understand; maybe there are some low-level optimizations in R for this case). At the least, you should make sure to assign in large chunks to minimize the total number of assignments to the matrix. One alternative might be to first generate the data in a file and then read that file, or of course to program it in C.

      4) I don't know the nature of your data, or what you intend to use it for. But maybe you could trade off speed for memory by generating all data on the fly. I.e., you'd create a function get.data.value (row, col) that calculates a single cell on the fly (or perhaps a single row), and returns it. It may even be possible to create a matrix-like object with matrix-like subsetting operators that generates the data on demand. However, please don't ask me about the details here.
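
Points 3) and 4) can be sketched roughly as follows. This is only an illustration under assumptions: the formula inside get.data.value() is a made-up placeholder for whatever the for loops would compute, the matrix is kept small so the example runs anywhere, and the "lazy_matrix" class name is hypothetical. Whether R really copies the matrix on every assignment depends on the R version and on how many references to the object exist, so the chunked loop is best seen as a way to cut the number of assignments, not as a guarantee about copying.

```r
# Hypothetical cell generator; stands in for whatever the for loops compute.
# It is vectorized, so it can return a single cell or a whole column.
get.data.value <- function(row, col) row + col / 10

# Point 3: fill a preallocated matrix one whole column per assignment
# instead of one cell per assignment.
n_row <- 25
n_col <- 1000   # small for illustration; the original case was 10^7
x <- matrix(0, n_row, n_col)
rows <- seq_len(n_row)
for (j in seq_len(n_col)) {
  x[, j] <- get.data.value(rows, j)
}

# Point 4: store nothing and compute values only when they are requested.
# A minimal S3 class with its own `[` method gives matrix-like subsetting.
lazy <- structure(list(nrow = 25, ncol = 1e7), class = "lazy_matrix")
`[.lazy_matrix` <- function(x, i, j) outer(i, j, get.data.value)

lazy[1:2, c(5, 10)]   # 2 x 2 block, generated on demand
```

The lazy object occupies only a few bytes regardless of its nominal dimensions, at the cost of recomputing each value on every access.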

      Maybe some of this helps...

       