Menu

#4056 Cannot create very large data array

None
closed
nobody
genmatrix (1)
5
2023-03-30
2022-12-15
Rene Hansen
No

If I try this one in a worksheet (wxMaxima) containing other data:

tMM : genmatrix(lambda([x, y], 0),  933252, 16)$

Maxima crashes with:

Message from the stdout of Maxima: Welcome to LDB, a low-level debugger for the Lisp runtime environment.
(GC in progress, oldspace=3, newspace=4)
ldb>
Message from maxima's stderr stream: Heap exhausted during garbage collection: 16 bytes available, 32 requested.
        Immobile Object Counts
 Gen layout fdefn symbol   code  Boxed   Cons    Raw   Code  SmMix  Mixed  LgRaw LgCode  LgMix Waste%       Alloc        Trig   Dirty GCs Mem-age
  3      0     51      1   1444     43  18053  20856      1     28     22      0      0      0    0.4  1272773232   712014656   39003   1  1.2766
  4      9   1268    227     82     14  14131  10014      0     21      1      0      0     22    0.5   789121888     2000000   24203   0  0.0000
  5      0      0      0      0      0      0      0      0      0      0      0      0      0    0.0           0     2000000       0   0  0.0000
fatal error encountered in SBCL pid 268019880:

If I try it in an empty worksheet (wxMaxima) it goes ok. However it takes more than an hour !!!

Discussion

  • Stavros Macrakis

    It's unfortunate that Maxima fails on a matrix of size 933252 x 16, but you should keep in mind that Maxima matrices are implemented using lists, and therefore very inefficient as they get large. That said, Maxima arrays (which are much more efficiently implemented) do not support matrix operations. Since you called this a "data array" rather than a "matrix", perhaps you don't need matrix operations at all?

     
  • Stavros Macrakis

    The maximum size array (not matrix) that I could create in SBCL was 1734000 x 16, so not that much bigger than the matrix. That said, it only took 150mS to create that array, which is about 220MB in size.

     
    • Gunter Königsmann

      SBCL by default allocates 500 Megabytes of memory at start-up and cannot extend that at run-time. But you can tell it to allocate more memory at start-up time. wxMaxima's config dialogue (in the maxima tab) can tell you how to do that.

      If every list element contains a pointer to the previous one (should be 8 bytes) and a pointer to the next one (other 8 bytes) plus a pointer to the data (other 8 bytes) and contains 8 bytes of data and perhaps 8 bytes of data (that would be only one 64-bit word) and other 8 bytes of flags with information on how to interpret the data (like the information if the data is a floating-point number, an integer, a string, an arbitrary-sized integer or a lisp object of some kind) that's 40 bytes per matrix element. 16 million matrix elements would extend to 640 Megabytes, this way.

      If things get slow that often is a sign of some temporary data being allocated during the creation of every single element: Every equivalent of a malloc() or a free() needs some time. If 16 Million elements require more than a few seconds it might be worth investigating what hinders the thing from being efficient.

       
      • Stavros Macrakis

        genmatrix creates matrix rows and columns using nconc on increasingly
        long lists. This is an antipattern in Lisp programming, because it means
        that its performance is O(r^2+c^2).
        A straightforward rewrite would build the rows and columns with cons and
        then do an nreverse at the end.

         

        Last edit: Robert Dodier 2023-02-09
  • Rene Hansen

    Rene Hansen - 2022-12-16

    My task is that I have a data file containing 933252 lines of data like this:

    2022.03.29  20:57:13    nan nan nan nan 210 nan 1   nan 233 2   6.6703  1.00677 19.187  100 0   3
    

    Where each cell is a measurement result of a certain quantity e.g. pressure, temperature etc. I would like to plot the 933252 measurements of each quantity e,g, the temperature.

    I have made my own function to read in the raw data from the csv-file which works quite well. The individual data is easily separated:

    ["2022.03.26","19:35:57","nan","nan","nan","nan","120","nan","0","nan","153","0","6.78323","1.02828","27.312","100","0","3",""]
    

    And converted to floats (Deleting the first two cells containing the date and time):

    map(StrToFloat, %);
    [0,0,0,0,120,0,0,0,153,0,6.78323,1.02828,27.312,100,0,3]
    

    My first thought was to fill a data-array with all the data and then pick a single column to get the corresponding quantity.

    Maybe it would be faster to fill in a number of lists - one for each quantity.

     

    Last edit: Robert Dodier 2023-02-09
    • Gunter Königsmann

      Lists are about the only option if the amount of data isn't known in advance. Arrays are more efficient => reading the data into lists and then converting the lists to arrays might be an optimal option.

       
  • Robert Dodier

    Robert Dodier - 2023-02-09

    Rene, thanks for your interest in Maxima. Unfortunately Maxima is not the best tool for handling large arrays of data. In the interest of helping you get to a working solution for your problem, I would like to recommend the statistical software R (https://r-project.org), which has much more extensive functions for handling data. See read.table.

    In Maxima, the most efficient way to read data is to read into an existing array. See read_array. However, with an array in hand, you still have to extract a column to be plotted, and that will be somewhat clumsy. Also, read_array doesn't handle data and time columns very well. I believe that what you want to do could be handled by Maxima, but it would be some work. If you would like to pursue that further, my recommendation is to raise the topic on the maxima-discuss mailing list.

     
    • Rene Hansen

      Rene Hansen - 2023-02-28

      Hi Robert
      Sorry for my delay. I am doing some mathematical work with the data
      afterwards :-)
      I am trying to increase the memory available to Maxima, it does not seem to
      work. I am probably doing something wrong:
      [image: image.png]
      /Rene'

       

      Last edit: Robert Dodier 2023-03-11
  • Rene Hansen

    Rene Hansen - 2023-02-28

    Hi Robert
    Sorry for my delay. I am doing some mathematical work with the data afterwards :-)
    I am trying to increase the memory available to Maxima, it does not seem to work. I am probably doing something wrong:
    -X " --dynamic-space-size 2000"

     

    Last edit: Rene Hansen 2023-02-28
    • Stavros Macrakis

      As Robert said earlier, the Maxima matrix data type is not suitable for large data arrays in many ways. And as I said, genmatrix is extremely inefficient for large arrays of data.
      Expanding memory will not solve these problems.

      I agree with Robert that you should use a more suitable system, such as R or Python/Pandas. If you must use Maxima, use the array type, which however does not support any mathematical operations.

                 -s
      
       
  • Tomio Arisaka

    Tomio Arisaka - 2023-03-02

    genmatrix worked as follows:
    (but it took over 17 minutes)

    $ rmaxima -X "--dynamic-space-size 2000"
    Lisp options: (--dynamic-space-size 2000)
    Maxima 5.46.0 https://maxima.sourceforge.io
    using Lisp SBCL 2.3.1
    Distributed under the GNU Public License. See the file COPYING.
    Dedicated to the memory of William Schelter.
    The function bug_report() provides bug reporting information.
    (%i1) :lisp (/ (sb-ext:dynamic-space-size) (expt 1024 2))
    
    2000
    (%i1) tMM : genmatrix(lambda([x, y], 0),  933252, 16) $
    
    (%i2) time(%);
    (%o2)                            [1033.780789]
    (%i3) %/60;
    (%o3)                         [17.22967981666666]
    (%i4) :lisp (length $|tMM|)
    
    933253
    (%i4) 
    

    If you program with lisp, you will save time.
    For example, the next lisp function gen_matrix is faster than genmatrix:

    $ rmaxima -X "--dynamic-space-size 2000"
    Lisp options: (--dynamic-space-size 2000)
    Maxima 5.46.0 https://maxima.sourceforge.io
    using Lisp SBCL 2.3.1
    Distributed under the GNU Public License. See the file COPYING.
    Dedicated to the memory of William Schelter.
    The function bug_report() provides bug reporting information.
    (%i1) :lisp (/ (sb-ext:dynamic-space-size) (expt 1024 2))
    
    2000
    (%i1) :lisp (defun gen_matrix (fun y x) (cons '($matrix simp) (loop for i from 1 to y collect (cons '(mlist . (simp)) (loop for j from 1 to x collect (mfuncall fun i j))))))
    
    GEN_MATRIX
    (%i1) :lisp (describe 'gen_matrix)
    
    MAXIMA::GEN_MATRIX
      [symbol]
    
    GEN_MATRIX names a compiled function:
      Lambda-list: (FUN Y X)
      Derived type: (FUNCTION (T T T) (VALUES CONS &OPTIONAL))
      Source form:
        (LAMBDA (FUN Y X)
          (BLOCK GEN_MATRIX
            (CONS '($MATRIX SIMP)
                  (LOOP FOR I FROM 1 TO Y
                        COLLECT (CONS '(MLIST SIMP)
                                      (LOOP FOR J FROM 1 TO X
                                            COLLECT (MFUNCALL FUN I J)))))))
    (%i1) tMM : ?gen_matrix(lambda([y, x], 0), 933252, 16) $
    
    (%i2) time(%);
    (%o2)                             [10.46828]
    (%i3) %/60;
    (%o3)                        [0.1744713333333333]
    (%i4) :lisp (length $|tMM|)
    
    933253
    (%i4) :lisp (length (cadr $|tMM|))
    
    17
    (%i4) 
    

    But the gen_matrix is a subset of genmatrix. So the first argument is a function only, and gen_matrix does not check the arguments.

    (%i4) array (a, fixnum, 2, 2);
    (%o4)                                  a
    (%i5) a [1, 1] : %e;
    (%o5)                                 %e
    (%i6) a [2, 2] : %pi;
    (%o6)                                 %pi
    (%i7) ?gen_matrix(lambda([y, x], a[y, x]), 2, 2);
                                      [ %e   0  ]
    (%o7)                             [         ]
                                      [ 0   %pi ]
    (%i8) ?gen_matrix(lambda([i, j], j - i), 3, 3);
                                    [  0    1   2 ]
                                    [             ]
    (%o8)                           [ - 1   0   1 ]
                                    [             ]
                                    [ - 2  - 1  0 ]
    (%i9) ?gen_matrix(lambda([i, j], B[i, j]), 2, 2);
                                   [ B      B     ]
                                   [  1, 1   1, 2 ]
    (%o9)                          [              ]
                                   [ B      B     ]
                                   [  2, 1   2, 2 ]
    (%i10) 
    
     
    • Stavros Macrakis

      Thanks for improving on genmatrix, which is ridiculously slow. We should use your approach in the standard genmatrix.

      That said, Maxima matrices remain a poor way of manipulating large data sets. Accessing A[i,j] takes O(i+j) time, where in standard array implementations, it takes O(1) time.

       
  • Tomio Arisaka

    Tomio Arisaka - 2023-03-03

    Stavros, thanks for your advice.

    I made a Maxima function gen_matrix which is a modified version of genmatrix.
    So gen_matrix is compatible with genmatrix.

    $ rmaxima -X "--dynamic-space-size 2000"
    Lisp options: (--dynamic-space-size 2000)
    Maxima 5.46.0 https://maxima.sourceforge.io
    using Lisp SBCL 2.3.1
    Distributed under the GNU Public License. See the file COPYING.
    Dedicated to the memory of William Schelter.
    The function bug_report() provides bug reporting information.
    (%i1) :lisp (/ (sb-ext:dynamic-space-size) (expt 1024 2))
    
    2000
    (%i1) to_lisp();
    
    Type (to-maxima) to restart, ($quit) to quit Maxima.
    
    MAXIMA> ;; new version of GENMATRIX
    (defmfun $gen_matrix (a i2 &optional (j2 i2) (i1 1) (j1 i1))
      (let ((f))
        (setq f (if (or (symbolp a) (hash-table-p a) (arrayp a))
            #'(lambda (i j) (meval (list (list a 'array) i j)))
              #'(lambda (i j) (mfuncall a i j))))
    
        (if (notevery #'fixnump (list i2 j2 i1 j1))
            (merror (intl:gettext "genmatrix: bounds must be integers; found ~M, ~M, ~M, ~M") i2 j2 i1 j1))
    
        (if (or (> i1 i2) (> j1 j2))
            (merror (intl:gettext "genmatrix: upper bounds must be greater than or equal to lower bounds; found ~M, ~M, ~M, ~M") i2 j2 i1 j1))
    
        (cons '($matrix simp)
              (loop for i from i1 to i2 collect (cons '(mlist . (simp))
                                                      (loop for j from j1 to j2 collect (funcall f i j)))))))
    
    $GEN_MATRIX
    MAXIMA> (to-maxima)
    Returning to Maxima
    (%o1)                                true
    (%i2) tMM : gen_matrix(lambda([y, x], 0), 933252, 16) $
    
    (%i3) time(%);
    (%o3)                             [10.37354]
    (%i4) %/60;
    (%o4)                        [0.1728923333333333]
    (%i5) h [i, j] := 1 / (i + j - 1);
                                               1
    (%o5)                         h     := ---------
                                   i, j    i + j - 1
    (%i6) gen_matrix (h, 3, 3);
                                      [    1  1 ]
                                      [ 1  -  - ]
                                      [    2  3 ]
                                      [         ]
                                      [ 1  1  1 ]
    (%o6)                             [ -  -  - ]
                                      [ 2  3  4 ]
                                      [         ]
                                      [ 1  1  1 ]
                                      [ -  -  - ]
                                      [ 3  4  5 ]
    (%i7) array (a, fixnum, 2, 2);
    (%o7)                                  a
    (%i8) a [1, 1] : %e;
    (%o8)                                 %e
    (%i9) a [2, 2] : %pi;
    (%o9)                                 %pi
    (%i10) gen_matrix (a, 2, 2);
                                      [ %e   0  ]
    (%o10)                            [         ]
                                      [ 0   %pi ]
    (%i11) gen_matrix (lambda ([i, j], j - i), 3, 3);
                                    [  0    1   2 ]
                                    [             ]
    (%o11)                          [ - 1   0   1 ]
                                    [             ]
                                    [ - 2  - 1  0 ]
    (%i12) gen_matrix (B, 2, 2);
                                   [ B      B     ]
                                   [  1, 1   1, 2 ]
    (%o12)                         [              ]
                                   [ B      B     ]
                                   [  2, 1   2, 2 ]
    (%i13) gen_matrix (C, 2);
                                   [ C      C     ]
                                   [  1, 1   1, 2 ]
    (%o13)                         [              ]
                                   [ C      C     ]
                                   [  2, 1   2, 2 ]
    (%i14) quit();
    $ 
    
     
  • Stavros Macrakis

    • summary: Cannot create wery large data array --> Cannot create very large data array
     
  • Tomio Arisaka

    Tomio Arisaka - 2023-03-05

    This is an updated version of gen_matrix:

    ;; new version of GENMATRIX
    (defmfun $gen_matrix (a i2 &optional (j2 i2) (i1 1) (j1 i1))
      (let* ((f)
             (s (if $simp '(simp) nil))
             (mat (cons '$matrix s))
             (ml (cons 'mlist s)))
        (setq f (if (or (symbolp a) (hash-table-p a) (arrayp a))
            #'(lambda (i j) (meval (list (list a 'array) i j)))
              #'(lambda (i j) (mfuncall a i j))))
    
        (if (notevery #'fixnump (list i2 j2 i1 j1))
            (merror (intl:gettext "genmatrix: bounds must be integers; found ~M, ~M, ~M, ~M") i2 j2 i1 j1))
    
        (if (or (> i1 i2) (> j1 j2))
            (merror (intl:gettext "genmatrix: upper bounds must be greater than or equal to lower bounds; found ~M, ~M, ~M, ~M") i2 j2 i1 j1))
    
        (cons mat
              (loop for i from i1 to i2 collect (cons ml
                                                      (loop for j from j1 to j2 collect (funcall f i j)))))))
    

    Thanks.

     
    • Robert Dodier

      Robert Dodier - 2023-03-11

      Hi @tomio-arisaka, thanks a lot for working on genmatrix. Can you please commit your new version? And also some test cases to verify that it works as expected. Looks like it is a good improvement.

       
      • Tomio Arisaka

        Tomio Arisaka - 2023-03-14

        Thank you for your advice.

        I made two patch files attached to this comment.
        So if I have something wrong, please let me know.
        If the patches are OK, I will commit them.

         
  • Robert Dodier

    Robert Dodier - 2023-03-30
    • labels: --> genmatrix
    • status: open --> closed
     
  • Robert Dodier

    Robert Dodier - 2023-03-30

    Thanks, Tomio, I see you have committed your work as commit 36f7624. That's a great improvement. Closing this bug report. There are still additional issues about arrays and matrices, let's open new tickets for any such issues.

     

Log in to post a comment.