

ncrcat: append to file along record dimension

Forum: Help
Creator: Heiko
Created: 2013-01-30
Updated: 2013-10-17
  • Heiko
    2013-01-30

    Hi,
    Our models run for quite a while and output data for individual time steps, e.g. time1.nc, time2.nc, time3.nc. time is the record dimension, and each file has a record-dimension size of 1.
    At the end we want to produce a large file:

    ncrcat time1.nc time2.nc time3.nc all.nc
    

    That works fine, and all.nc contains 3 record dimensions.

    Since the model runs for a long time, and the files are huge and need to be downloaded to another machine, we don't have all of the time*.nc files available immediately. So what I would like to do, as soon as a new time*.nc file is available, is:

    ncrcat -A time1.nc all.nc
    ncrcat -A time2.nc all.nc
    ncrcat -A time3.nc all.nc
    

    But, unfortunately, now my all.nc contains only one record dimension, and that is the one from time3.nc instead of 3 record dimensions. The 'append' does not add to the end of the record dimension. Is there an option to enforce that?

    Best regards,

    Heiko

     
  • Charlie Zender
    2013-02-01

    > That works fine, and all.nc contains 3 record dimensions.

    First, I think what you mean is that the record dimension has three timesteps.

    Second, the manual describes what is meant by "Append".
    http://nco.sf.net/nco.html#append
    The nomenclature is confusing.

    Third, do you understand that NCO makes it safe to do this:
    ncrcat -O all.nc time1.nc all.nc
    ncrcat -O all.nc time2.nc all.nc
    et cetera…

    One problem with this is that it requires the machine to do a lot of file copying.
    This can be minimized by using the -no_tmp_fl switch.
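
    For concreteness, here is a minimal sketch of that workflow (the initial cp and the loop are illustrative additions, not from the thread; filenames follow the original post):

    # Seed the archive with the first timestep, then re-concatenate as each
    # new single-record file arrives. NCO stages its result in a temporary
    # file by default, so reusing all.nc as both input and output is safe.
    cp time1.nc all.nc
    for fl in time2.nc time3.nc; do
      ncrcat -O all.nc "$fl" all.nc
    done
    # See http://nco.sf.net/nco.html#no_tmp_fl before adding -no_tmp_fl to skip the temporary file.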

    Finally, if you do all that, and ncrcat is still too slow for your large files, then please post back.
    The feature you want is possible to implement. I'm just not sure you clearly need it.
    cz

     
  • Charlie Zender
    2013-02-01

    Heiko,
    I decided to implement this feature since it was straightforward.
    It is now in the latest snapshot. Please test it if possible and post your feedback.
    It will be in 4.2.6.
    Brief documentation from forthcoming ANNOUNCE:

    A. ncra and ncrcat accept a new switch -record_append that
       significantly speeds concatenating files onto existing files.
       Previously to preserve the existing contents of fl1.nc, and add to
       them the new records contained in fl2.nc, and place the output in a
       third file, fl3.nc (which could safely be named fl2.nc), one did
       ncrcat -O fl1.nc fl2.nc fl3.nc
       Under the hood this operation copies all of the information
       in fl1.nc and fl2.nc. Twice. This is expensive for large files.
       The new -record_append switch causes all records in fl1.nc to be
       appended to the end of the corresponding records in fl2.nc:
       ncrcat -rec_apn fl1.nc fl2.nc
       The contents of fl2.nc are completely preserved, and only
       values in fl1.nc are copied. -rec_apn automatically puts NCO into
       append mode, so specifying -A is redundant, and simultaneously
       specifying overwrite mode with -O causes an error.
       By default, NCO works in an intermediate temporary file.
       Power users may combine -rec_apn with -no_tmp_fl:
       ncrcat -rec_apn -no_tmp_fl fl1.nc fl2.nc
       This avoids creating an intermediate file, and copies only the
       minimal amount of data (i.e., all of fl1.nc). Hence, it is fast.
       We recommend users first read the safety trade-offs involved:
       http://nco.sf.net/nco.html#no_tmp_fl
       http://nco.sf.net/nco.html#rec_apn
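
    As a concrete sketch, the incremental use case from earlier in this thread becomes (filenames hypothetical):

    # Append each newly arrived single-record file onto the growing archive.
    # Only the new records are copied; the existing contents of all.nc are preserved.
    cp time1.nc all.nc
    ncrcat -rec_apn -no_tmp_fl time2.nc all.nc
    ncrcat -rec_apn -no_tmp_fl time3.nc all.nc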

     
  • Heiko
    2013-02-28

    Charlie,

    sorry about the late answer. I forgot to monitor the issue and didn't check the forum regularly.

    Thanks for your work on rec_apn; it works exactly as expected. Creating a 3.3 GB netCDF3 file from 11 single timesteps now takes ~35 s (using -no_tmp_fl), which is exactly the same time as ncrcat'ing the 11 timesteps at once. Writing compressed netCDF4 files took considerably longer, 2 min 21 s, and threading doesn't seem to help here.

    Best regards,

    Heiko

     
  • Charlie Zender
    2013-05-23

    Posted for Richard Mladek:

    I've tried to add my reply to this topic
    http://sourceforge.net/projects/nco/forums/forum/9830/topic/6699093/index/page/1
    but have problems with it - I keep being logged off by the system.

    That's why I'm trying to contact you directly; sorry for that, and hopefully it's not a problem for you.

    For me the command below is not working as expected.
    ncrcat -rec_apn -no_tmp_fl  <1rec>.nc <multirec>.nc

    On average, the time needed to append one new record is proportional to the number of records already in the file:
    append 1st record ~ 20 sec
    append 10th record ~ 206 sec
    append 28th record ~ 560 sec
    I've tried appending 1-record files of size 2.5 GB one by one. The resulting file with 28 records is thus almost 80 GB.

    Please advise where the problem could be.
    The NCO version is 4.3.1 and it's installed on an IBM Cluster 1600 supercomputer.

    Thank you in advance,
    Best regards,
    Richard

    Richard Mladek
    consultant on GEOWOW project
    ECMWF, Shinfield Park, Reading RG2 9AX, UK

     
  • Charlie Zender
    2013-05-24

    > For me the command below is not working as expected.
    > ncrcat -rec_apn -no_tmp_fl  <1rec>.nc <multirec>.nc

    It seems like it is working but slower than you expected. Yes?

    > On average, the time needed to append one new record is proportional to the number of records already in the file:
    > append 1st record ~ 20 sec
    > append 10th record ~ 206 sec
    > append 28th record ~ 560 sec
    > I've tried appending 1-record files of size 2.5 GB one by one. The
    > resulting file with 28 records is thus almost 80 GB.

    This is disappointing. It seems to be doing an internal copy of the
    existing data rather than appending to the end of the file.
    -rec_apn was designed to eliminate such wasteful behavior.
    First try running the commands with -h to suppress some metadata
    writing that may cause the internal copy.
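
    For example, a sketch based on the command in your post (-h stops NCO from appending to the global "history" attribute):

    ncrcat -h -rec_apn -no_tmp_fl <1rec>.nc <multirec>.nc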

    Then, how is the speed when you eliminate the -rec_apn switch?
    i.e., please report the same timings with

    ncrcat -O -no_tmp_fl <multirec>.nc <1rec>.nc out.nc

    and

    ncrcat -O <multirec>.nc <1rec>.nc <multirec>.nc
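
    For instance, one simple way to collect comparable timings (a sketch, not part of the original request) is to prefix each run with time:

    time ncrcat -h -rec_apn -no_tmp_fl <1rec>.nc <multirec>.nc
    time ncrcat -O -no_tmp_fl <multirec>.nc <1rec>.nc out.nc
    time ncrcat -O <multirec>.nc <1rec>.nc <multirec>.nc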

    > Please advise where the problem could be.
    > The NCO version is 4.3.1 and it's installed on an IBM Cluster 1600
    > supercomputer.

    Is this Linux or AIX?
    What version of netCDF?
    What is the filesystem block size?
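
    For reference, a sketch of commands that can answer these (paths are placeholders; the AIX block-size command in particular is my assumption):

    uname -s                        # reports Linux or AIX
    nc-config --version             # netCDF library version, if nc-config is installed
    stat -f -c '%s' /path/to/data   # filesystem block size on Linux (GNU stat)
    lsfs -q /path/to/data           # filesystem block size on AIX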

    Charlie

     
  • rmla
    2013-06-19

    Thank you very much for your help. The -h option works great! Now appending each new record takes only 5 seconds, no matter how big the growing file is. Excellent.

    Nevertheless, I've got another problem. I'm testing how complicated it is to handle such big files (~80 GB, with 2.62 GB per record). Slicing with ncea works very well and quickly for all variables except along the time record. I've only succeeded in extracting 2 records, as any bigger time-record slice takes more than the 50 GB of memory we have available. An example:

    ncea -O -dtime,0,1 multiRec.nc 0-1rec.nc
    => OK but max. used memory 42993MB

    ncea -O -dtime,0,2 multiRec.nc 0-2rec.nc
    => failing with error message:
    ncea: ERROR nco_malloc() unable to allocate 16896000000 bytes
    ncea: INFO NCO has reported a malloc() failure. malloc() failures usually indicate that your machine does not have enough free memory (RAM+swap) to perform the requested operation. As such, malloc() failures result from the physical limitations imposed by your hardware. Read http://nco.sf.net/nco.html#mmr for a description of NCO memory usage. There are two workarounds in this scenario. One is to process your data in smaller chunks. The other is to use a machine with more free memory.

    In my experience the error message always says that the job requires 2-4 times more memory than the actual size of the extracted part of the file, but in reality it needs far more memory than that to do the job. That is, even though the error message suggests that around 16 GB of memory should be enough, it's not: 50 GB are available and that is still not sufficient. Please advise where the problem could be (the -h option doesn't help here :-). In fact I would have expected problems when slicing along the non-record dimensions, but it's completely the opposite.

    The operating system on the super-computer is AIX; the netCDF version is 4.2.

     
  • Charlie Zender
    2013-06-19

    Please send the NCO version, the metadata head from multiRec.nc, and the output of running your ncea command with -D 5.
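
    For example (a sketch; the hyperslab matches your earlier command):

    ncea --version                                   # NCO version
    ncdump -h multiRec.nc                            # metadata head
    ncea -D 5 -O -d time,0,2 multiRec.nc 0-2rec.nc   # the ncea run with debug level 5
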
    cz

     
  • rmla
    2013-06-20

    Thank you very much for your quick reply. The nco version is 4.0.8/LP64.

    metadata:
    dimensions:
            time = UNLIMITED ; // (32 currently)
            id_models = 1 ;
            id_points = 10000 ;
            id_params = 20 ;
            id_runs = 1 ;
            id_steps = 44 ;
            id_grdps = 4 ;
            id_numbers = 20 ;
    variables:
            double time(time) ;
            int fcst_data(time, id_models, id_points, id_params, id_runs, id_steps, id_grdps, id_numbers) ;

    output from command ncea -D 5 -O -dtime,0,2 p11.all_00run.multiRec.nc p11.all_00run.0-2rec.nc:
    Thu Jun 20 07:43:04 UTC 2013
    start Thu Jun 20 07:43:05 UTC 2013
    ncea: INFO nco_aed_prc() examining variable Global
    ncea: DEBUG nco_var_dfn() about to define variable time with 1 dimension (ordinal,output ID): time (0,unknown)
    ncea: DEBUG nco_var_dfn() defined variable time with 1 dimension (ordinal,output ID): time (0,0)
    ncea: DEBUG nco_var_dfn() about to define variable fcst_data with 8 dimensions (ordinal,output ID): time (0,unknown), id_models (1,unknown), id_points (2,unknown), id_params (3,unknown), id_runs (4,unknown), id_steps (5,unknown), id_grdps (6,unknown), id_numbers (7,unknown)
    ncea: DEBUG nco_var_dfn() defined variable fcst_data with 8 dimensions (ordinal,output ID): time (0,0), id_models (1,1), id_points (2,2), id_params (3,3), id_runs (4,4), id_steps (5,5), id_grdps (6,6), id_numbers (7,7)
    ncea: ERROR nco_malloc() unable to allocate 16896000000 bytes
    ncea: INFO NCO has reported a malloc() failure. malloc() failures usually indicate that your machine does not have enough free memory (RAM+swap) to perform the requested operation. As such, malloc() failures result from the physical limitations imposed by your hardware. Read http://nco.sf.net/nco.html#mmr for a description of NCO memory usage. There are two workarounds in this scenario. One is to process your data in smaller chunks. The other is to use a machine with more free memory.

    Large tasks may uncover memory leaks in NCO. This is likeliest to occur with ncap. ncap scripts are completely dynamic and may be of arbitrary length and complexity. A script that contains many thousands of operations may uncover a slow memory leak even though each single operation consumes little additional memory. Memory leaks are usually identifiable by their memory usage signature. Leaks cause peak memory usage to increase monotonically with time regardless of script complexity. Slow leaks are very difficult to find. Sometimes a malloc() failure is the only noticeable clue to their existance. If you have good reasons to believe that your malloc() failure is ultimately due to an NCO memory leak (rather than inadequate RAM on your system), then we would be very interested in receiving a detailed bug report.
    ncea: ERROR exiting through nco_exit() which will now call exit(EXIT_FAILURE)
    stop Thu Jun 20 07:43:34 UTC 2013

     
  • Charlie Zender
    2013-06-21

    It does appear that ncea is using too much memory here.
    A lot has changed in the netCDF library and in NCO since April 2011, when 4.0.8 was released.
    I hope you will try running this on the latest NCO version, 4.3.1. If the problem still exists there, then we will look into what is causing it.
    best,
    cz

     
  • rmla
    2013-06-21

    I've tried version 4.3.1 too, both on our super-computer and on a Linux cluster, with the same, unfortunately negative, result. I've also tried the -no_tmp_fl option and playing with -thr_nbr, but without any improvement. Please find below some additional output from ncea.

    # job2.sh> module load nco/4.3.1
    load nco 4.3.1 (EC_FFLAGS, EC_CFLAGS, EC_CXXFLAGS, EC_FLDFLAGS, EC_CLDFLAGS, EC_CXXLDFLAGS)
    # job2.sh> date
    # job2.sh> echo start Fri Jun 21 08:14:04 UTC 2013
    # job2.sh> ncea -D 5 -O -dtime,0,2 p11.all_00run.multiRec.nc p11.all_00run.0-2rec.nc
    ncea: INFO User did not specify thread request > 0 on command line. NCO will automatically assign threads based on OMP_NUM_THREADS environment and machine capabilities.
    HINT: Not specifiying any -thr_nbr (or specifying -thr_nbr=0) causes NCO to try to pick the optimal thread number. Specifying -thr_nbr=1 tells NCO to execute in Uni-Processor (UP) (i.e., single-threaded) mode.
    ncea: INFO Environment variable OMP_NUM_THREADS = 64
    ncea: INFO Number of processors available is 64
    ncea: INFO Maximum number of threads system allows is 64
    ncea: INFO Allowing OS to dynamically set threads
    ncea: INFO System will utilize dynamic threading
    ncea: INFO Reducing default thread number from 64 to 4, an operator-dependent "play-nice" number set in nco_openmp_ini()
    ncea: INFO omp_set_num_threads() used to set execution environment to spawn teams of 1 threads
    ncea: INFO After using omp_set_num_threads() to adjust for any user requests/NCO optimizations, omp_get_max_threads() reports that a parallel construct here/now would spawn 1 threads
    ncea: INFO Small parallel test region spawned team of 1 threads
    ncea: INFO nc__open() will request file buffer of default size
    ncea: INFO nc__open() opened file with buffer size = 4194304 bytes
    ncea: INFO Input file 0 is p11.all_00run.multiRec.nc
    ncea: INFO nc__open() will request file buffer of default size
    ncea: INFO nc__open() opened file with buffer size = 4194304 bytes
    ncea: INFO main loop thread #0 processing var_prc = "fcst_data"
    ncea: DEBUG Promoting variable fcst_data from type NC_INT to type NC_DOUBLE
    ncea: DEBUG Promoting variable fcst_data from type NC_INT to type NC_DOUBLE

     
  • Charlie Zender
    2013-06-21

    Please put one record (timestep) of the input file on a publicly accessible website so I can download it.
    It should be ~700 MB.
    thx,
    cz

     
  • Charlie Zender
    2013-06-26

    This is an update on the problem you reported.

    First, read this
    http://nco.sf.net/nco.html#mmr
    and you will see that the memory requirement for ncea is
    2*filesize + 1*max_var_sz = 3*filesize
    for your use case. filesize refers to the filesize _after_ hyperslabbing,
    which for this test file is ~1.5 GB.
    So the expected maximum sustained memory use is ~4.5 GB at any one time.
    This is abbreviated RSS (resident set size).

    On my system the ncea command peaks at 11.1 GB Virtual, 5.3 GB RSS,
    and never completes, much as on your system. I do not yet know why
    the virtual memory requirement is nearly three times the expected amount.
    Just verifying that it takes more memory than it should.

    Second, ncrcat on the same file works fine and as expected.
    I named the file you uploaded big_bug.nc, then concatenated it five times:
    ncrcat -O -D 4 big_bug.nc big_bug.nc big_bug.nc big_bug.nc big_bug.nc big_bug5.nc
    ncrcat command RSS peaks at ~541 MB as expected, Virtual ~650 MB as expected.

    Third, the ncea command you are executing is legal but makes no sense.
    It should copy the first three records of the file to the output file.
    If you want the average of the first three records, use ncra.
    If you simply want to copy them, use ncks.
    ncea is an ensemble averager and uses much more memory than ncra or ncks.
    Please use the right tool for the job.
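
    For example, a sketch of the lighter-weight alternatives (the output name for the average is illustrative; the rest follows your earlier commands):

    ncks -O -d time,0,2 multiRec.nc 0-2rec.nc       # copy the first three records, no averaging
    ncra -O -d time,0,2 multiRec.nc 0-2rec_avg.nc   # average the first three records along time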

    Fourth, my guesses as to why ncea uses so much memory are
    A. ncea gets greedy and has a memory leak
    B. ncea requests 3*filesize as expected, but netCDF library asks for more.
    C. for some reason NCO is using more threads (4?) than it should (1).
    D. ???

    Will post more when I know more.
    cz

     
  • Charlie Zender
    2013-06-26

    More progress on the problem you reported.

    Your data are NC_INT (not NC_FLOAT), so NCO automatically promotes
    them to NC_DOUBLE prior to arithmetic. This essentially doubles the RAM
    needed for your use case, i.e., 9.0 GB instead of 4.5 GB required.
    See updated discussion at http://nco.sf.net/nco.html#mmr
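
    For reference, a back-of-envelope check against the dimension sizes posted above (the arithmetic is mine, not from the thread): one record of fcst_data holds 1*10000*20*1*44*4*20 = 704,000,000 values, about 2.6 GB as NC_INT, which matches the reported per-record size. Promoted to NC_DOUBLE, the three-record hyperslab is then

    echo $((3 * 704000000 * 8))   # 16896000000 bytes, exactly the failed malloc() request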

    I think that's the basic problem. On my 8 GB machine, that throws me
    into swap space and thrashes the disk and it "never" finishes.
    Not sure this is worth pursuing further.
    Did my previous suggestion to use ncks or ncra get your task working?

    About the only simple way to reduce the memory requirement is to add
    a user switch to disable/enable promotion for arithmetic.
    It could change requirements by at most a factor of two in most cases.

    cz

     
  • rmla
    2013-06-26

    I really appreciate your support. I'm an NCO greenhorn. The ncea command was chosen just as the first candidate, based on my searching the web and finally the NCO web pages. I will read http://nco.sourceforge.net/nco.html more carefully, as it's definitely a source of valuable information. Thank you for your excellent work and the very useful NCO tools.
    RM