#1355 Out of memory crash when plotting wide datafiles

closed-fixed
nobody
None
5
2015-03-23
2014-03-14
Anonymous
No

I have a datafile with a couple dozen rows and 256x256 columns which I can use to plot and fit data as expected when I only access individual data columns. (e.g. plot 'blubb.txt' u 1:3).

If I try to manipulate the data columns (e.g. plot 'blubb.txt' u 1:($3-$1) ), gnuplot pegs one CPU core and quickly crashes when it hits the 32-bit memory limit.

Discussion

  • One part of the problem is that once you go beyond simple column specifications, gnuplot has no choice but to read the entire data file. And these days, gnuplot has additional features that may require it to also store the entire file internally. That's a lot of memory. 2^16 columns and on the order of 100 rows isn't really that much, though, so it doesn't really explain the memory exhaustion. So: how big is your file, in total?

    To investigate this further, we would need further details. The version of gnuplot, the platform you observed this on, and most importantly, a complete test case.

     
  • Henrik A
    2014-03-14

    The matrix I dumped was 44 rows x 65k columns, 22MB as double data, the ascii file was about 40MB.

    One row of data in this file is sufficient to reproduce the issue. I am running 32-bit gnuplot 4.6.4 under Windows 7 x64.

    To reproduce, use the attached data file and run:
    plot 'GnuplotCrash.dat' u 1:($3-$1) w p
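
    The attached file is not reproduced in this thread. As a rough stand-in, a file of similar shape (a single row of 256x256 whitespace-separated values) could be generated with a sketch like the one below. The function name, the file name wide_row.dat, and the random values are all hypothetical, and there is no guarantee that such a file triggers the crash exactly as the original attachment did.

    ```python
    import random


    def write_wide_row(path, n_columns=256 * 256, seed=0):
        """Write one whitespace-separated row of n_columns floats to path.

        Hypothetical helper: it only mimics the shape described in the
        report (one very long line), not the contents of the attachment.
        """
        rng = random.Random(seed)
        with open(path, "w") as fh:
            fh.write(" ".join(f"{rng.random():.6f}" for _ in range(n_columns)))
            fh.write("\n")
        return n_columns


    # Example call (writes the file into the current directory):
    # write_wide_row("wide_row.dat")
    ```

    The resulting file could then be plotted with the same command as above, substituting the generated file name.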

     
  • Ethan Merritt
    2014-03-14

    Thank you for pointing out this interesting case.

    The program has no problem handling long lines of data per se, but it makes the poor decision to also treat the first line as a collection of strings to match against if the plot command later makes reference to a column by name ("column header") rather than by numerical index. The problem is compounded by another poor decision to allocate for each saved column header a string long enough to hold the entire first line.
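
    To see why this exhausts a 32-bit address space, here is a back-of-the-envelope estimate using the figures given earlier in the thread (about 40MB of ASCII spread over 44 rows of 256x256 columns). It assumes every row has roughly the same length and that each header buffer is exactly as long as the first line; the real allocation sizes in gnuplot may differ.

    ```python
    def header_memory_bytes(n_columns, first_line_bytes):
        """Total bytes if each saved column header gets a buffer as long
        as the whole first line (the allocation pattern described above)."""
        return n_columns * first_line_bytes


    # Figures quoted in the report; the per-line length is an approximation.
    n_columns = 256 * 256                  # 65536 columns
    first_line_bytes = 40 * 10**6 // 44    # roughly 0.9 MB per line
    total = header_memory_bytes(n_columns, first_line_bytes)

    print(f"estimated header storage: {total / 2**30:.1f} GiB")
    print("exceeds 32-bit address space:", total > 2**32)
    ```

    Under these assumptions the header storage alone comes to tens of gigabytes, far beyond what a 32-bit process can address, even though the data itself is small.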

    A temporary 1-line modification that skips saving the headers makes your test case execute easily.

    I am not sure what the best fix will be. It would be easy to add a command or keyword that informs the program not to save column header information, but that requires knowing in advance that special handling is necessary. A fix that "just works" would be better. Perhaps it would be sufficient to make the string allocation more clever. Maybe we can get away with saving only a single copy of the entire line and adding additional bookkeeping to define each column header as a substring.

     
    Last edit: Ethan Merritt 2014-03-14
  • Ethan Merritt
    2014-03-14

    Yes - not hard to use a single copy of the full line and pull out individual substrings.

    Improved behaviour in CVS for both 4.6 and 4.7
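
    The idea of keeping a single copy of the line and treating each column header as a substring can be illustrated with the sketch below. This is not gnuplot's actual C code; the function names are hypothetical and fields are assumed to be separated by whitespace. Only one (start, end) pair is stored per column, so the bookkeeping grows with the number of columns rather than with columns times line length.

    ```python
    def index_headers(first_line):
        """Return a list of (start, end) offsets, one per whitespace-separated
        field in first_line. The line itself is stored only once by the caller."""
        spans = []
        pos = 0
        n = len(first_line)
        while pos < n:
            # Skip the whitespace between fields.
            while pos < n and first_line[pos].isspace():
                pos += 1
            if pos >= n:
                break
            start = pos
            # Advance to the end of the current field.
            while pos < n and not first_line[pos].isspace():
                pos += 1
            spans.append((start, pos))
        return spans


    def header(first_line, spans, column):
        """Return the header text for a 1-based column index by slicing
        the single saved copy of the line on demand."""
        start, end = spans[column - 1]
        return first_line[start:end]


    line = "time  voltage current\n"
    spans = index_headers(line)
    print(header(line, spans, 2))
    ```

    A header string is materialized only when a plot command actually refers to a column by name, so a file with tens of thousands of columns costs one saved line plus a small offset table.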

     
  • Ethan Merritt
    2014-03-14

    • status: open --> pending-fixed
     
  • Henrik A
    2014-03-14

    Thanks!

     
  • Ethan Merritt
    2014-03-19

    • status: pending-fixed --> closed-fixed