ncap2 memory failure

NewUSer
2013-10-29
  • NewUSer
    2013-10-29

    I am executing the following command and getting a memory failure. My system has 12 GB of free memory.

    ncap2 -O --no_tmp_fl -S Query1.nco TEMP1.nc Temp.nc

    The input file TEMP1.nc is 2.8 GB, so I expected the memory requirements of ncap2 to be less than 6 GB.
    Am I missing something? The Query1.nco script simply places a _FillValue at undesired locations.

    Query1.nco:
    TEMP=TEMP;
    TEMP.set_miss(-999.0);
    where(TEMP>-2.0 && TEMP<1.5)
      TEMP=TEMP;
    elsewhere
      TEMP=TEMP@_FillValue;

    ncap2: ERROR nco_malloc() unable to allocate 5806080000 B = 5670000 kB = 5537 MB = 5 GB
    ncap2: INFO NCO has reported a malloc() failure. malloc() failures usually indicate that your machine does not have enough free memory (RAM+swap) to perform the requested operation. As such, malloc() failures result from the physical limitations imposed by your hardware. Read http://nco.sf.net/nco.html#mmr for a description of NCO memory usage. There are two workarounds in this scenario. One is to process your data in smaller chunks. The other is to use a machine with more free memory.

    Large tasks may uncover memory leaks in NCO. This is likeliest to occur with ncap. ncap scripts are completely dynamic and may be of arbitrary length and complexity. A script that contains many thousands of operations may uncover a slow memory leak even though each single operation consumes little additional memory. Memory leaks are usually identifiable by their memory usage signature. Leaks cause peak memory usage to increase monotonically with time regardless of script complexity. Slow leaks are very difficult to find. Sometimes a malloc() failure is the only noticeable clue to their existance. If you have good reasons to believe that your malloc() failure is ultimately due to an NCO memory leak (rather than inadequate RAM on your system), then we would be very interested in receiving a detailed bug report.
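The size in the nco_malloc() error line can be decoded with a little arithmetic. The Python sketch below assumes (an inference, not something stated in the thread) that NCO's kB, MB and GB figures are 1024-based units truncated to whole numbers, which reproduces the message exactly. It also assumes TEMP is stored as either 4-byte floats or 8-byte doubles, to show how many elements such a buffer would hold under each width.

```python
# Decode the size reported in the nco_malloc() error message.
# Assumption: NCO's kB/MB/GB figures are 1024-based units truncated to
# integers; integer division below reproduces the printed message.
REQUEST_B = 5_806_080_000  # bytes, copied from the error message

kib = REQUEST_B // 1024
mib = kib // 1024
gib = mib // 1024
print(f"{REQUEST_B} B = {kib} kB = {mib} MB = {gib} GB")

# Exact size in GiB (the message truncates this to 5).
print(f"exact: {REQUEST_B / 1024**3:.2f} GiB")

# Element counts such a buffer would hold for two common widths.
# Assumption: TEMP is either a float (4 B) or a double (8 B) array.
for name, width in (("float (4 B)", 4), ("double (8 B)", 8)):
    print(f"{name}: {REQUEST_B // width:,} elements")
```

The exact request is about 5.41 GiB, roughly twice the 2.8 GB on-disk size of TEMP1.nc, so an estimate scaled from file size alone is unreliable. The thread does not say why the in-memory buffer is larger than the file.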

     
  • Charlie Zender
    2013-10-29

    First, I believe your script would be more efficient as:

    TEMP.set_miss(-999.0);
    where(TEMP<=-2.0 || TEMP>=1.5) TEMP=TEMP@_FillValue;
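The one-branch form is equivalent to Query1.nco only if its condition is the exact complement of the keep range, which matters at the endpoints -2.0 and 1.5. The Python sketch below models the ncap2 where/elsewhere logic elementwise on a few sample values. It is an illustration under stated assumptions, not ncap2 itself: FILL stands in for TEMP@_FillValue after set_miss(-999.0), and missing-value handling inside where() is not modeled.

```python
# Elementwise model of the two masking scripts (not ncap2 itself).
# Assumption: FILL stands in for TEMP@_FillValue after set_miss(-999.0).
FILL = -999.0


def mask_keep_inside(temp):
    """Query1.nco logic: keep -2.0 < t < 1.5, fill everything else."""
    return [t if (t > -2.0 and t < 1.5) else FILL for t in temp]


def is_outside(t, inclusive):
    """Test whether t lies outside the keep range.

    inclusive=False uses strict inequalities (t < -2.0 or t > 1.5);
    inclusive=True uses <= and >=, the exact complement of -2.0 < t < 1.5.
    """
    if inclusive:
        return t <= -2.0 or t >= 1.5
    return t < -2.0 or t > 1.5


def mask_fill_outside(temp, inclusive):
    """One-branch logic: fill values outside the range, leave the rest."""
    return [FILL if is_outside(t, inclusive) else t for t in temp]


sample = [-3.0, -2.0, -1.0, 0.0, 1.5, 2.0]
print(mask_keep_inside(sample))
print(mask_fill_outside(sample, inclusive=False))
print(mask_fill_outside(sample, inclusive=True))
```

With strict inequalities, values exactly equal to -2.0 or 1.5 are kept rather than filled; with <= and >= the result matches Query1.nco for every sample.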

    Second, the malloc() failure means that a single request for 5 GB failed.
    The total operation may require two of those gargantuan malloc()s, and
    perhaps the second one failed. If you run ncap2 with these switches, it will print all malloc() requests larger than 1 MB:

    export NCO_MMR_DBG=1; ncap2 -D 3 ...

    This works on all NCO operators, BTW.
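The point that the operation may need two such allocations can be checked with back-of-envelope arithmetic. The Python sketch below assumes (hypothetically; the thread only says "may") that two buffers the size of the failed request are live at once, and reads "12GB free" as 12 GiB.

```python
# Back-of-envelope peak-memory check.
# Assumptions (hypothetical): two buffers the size of the failed request
# are live at the same time, and "12GB free" means 12 GiB of RAM+swap.
REQUEST_B = 5_806_080_000  # single failed request, from the error message
N_BUFFERS = 2              # assumption: two such malloc()s
FREE_B = 12 * 1024**3      # assumption: 12 GiB free

needed = N_BUFFERS * REQUEST_B
headroom = FREE_B - needed
print(f"needed:   {needed / 1024**3:.2f} GiB")
print(f"free:     {FREE_B / 1024**3:.2f} GiB")
print(f"headroom: {headroom / 1024**3:.2f} GiB")
```

Under these assumptions the two buffers alone use about 10.81 of the 12 GiB, leaving roughly 1.19 GiB for everything else the process and system need. If "12GB" means 12 x 10^9 bytes, the headroom is smaller still.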

    Assuming the malloc() failure is not due to a bug, the INFO message
    says what your options are.
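One option in the INFO message is to process the data in smaller chunks. The Python sketch below illustrates that general idea, not an NCO interface: read_chunk and write_chunk are hypothetical callbacks standing in for reading and writing a slice of the variable (for example, a range of indices along one dimension), and the masking rule is the one from Query1.nco. The function itself holds at most one input chunk and one output chunk at a time, so its working memory scales with chunk_len rather than with the size of the whole variable.

```python
from array import array

FILL = -999.0  # stands in for TEMP@_FillValue


def mask_chunk(chunk):
    """Apply the Query1.nco rule to one chunk: keep -2.0 < t < 1.5, else FILL."""
    return array("d", (t if -2.0 < t < 1.5 else FILL for t in chunk))


def mask_in_chunks(read_chunk, n_total, chunk_len, write_chunk):
    """Process n_total values chunk_len at a time.

    read_chunk(start, stop) and write_chunk(start, values) are hypothetical
    callbacks for reading and writing a slice of the variable.
    """
    for start in range(0, n_total, chunk_len):
        stop = min(start + chunk_len, n_total)
        write_chunk(start, mask_chunk(read_chunk(start, stop)))


# Usage with in-memory stand-ins for the input and output variables.
src = array("d", [-3.0, -1.0, 0.5, 1.4, 1.5, 2.0, -2.0])
dst = array("d", [0.0] * len(src))


def read_chunk(start, stop):
    return src[start:stop]


def write_chunk(start, values):
    dst[start:start + len(values)] = values


mask_in_chunks(read_chunk, len(src), 3, write_chunk)
print(list(dst))
```

The trade-off is more read/write passes in exchange for a bounded buffer size; the chunk length would be chosen so that one chunk comfortably fits in free memory.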

    cz