NCO netCDF Operators / Discussion / Help: Using ncrcat to subset large files

Nobody/Anonymous - 2005-04-29

Hello -

We are trying to perform subsetting procedures on very large files using ncrcat. These files are aggregated together using the Unidata aggregation server and so are many gigabytes in size. What we are trying to accomplish is to allow users to pull subsets out of the aggregated datasets. For small requests, it works fine. But, as stated in the NCO manual, large requests exhaust the machine's memory and the process crashes.

Ideally, we'd be able to write, say, 100 timesteps at a time to a file and just append to this same file until the full hyperslab is written out. This would prevent the memory problems from occurring.

Is there a recommended way to do this?

For example, I know this will work:
    ncrcat -d time,1,100 -v theta my_data.nc out1.nc
    ncrcat -d time,101,200 -v theta my_data.nc out2.nc
    ncrcat -d time,201,300 -v theta my_data.nc out3.nc
    ncrcat out[123].nc out_all.nc
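
The three-command pattern above generalizes to a loop. The following is only a sketch under stated assumptions: it uses the `-d time,min,max` hyperslab form, adds `-F` so that indices are 1-based and `-O` so that existing chunk files are overwritten, and the helper names `chunk_ranges` and `subset_in_chunks`, the 300-step total, and the `outN.nc` naming are illustrative rather than anything NCO provides. By default it only prints the commands; setting `RUN=1` would execute them.

```shell
#!/bin/sh
# Sketch: split a long time range into fixed-size chunks, one ncrcat call each.
# Assumptions: 1-based indices (hence -F), simple file names without spaces.

# Print "start,end" for each chunk covering 1..total.
chunk_ranges() {
    total=$1
    size=$2
    start=1
    while [ "$start" -le "$total" ]; do
        end=$((start + size - 1))
        if [ "$end" -gt "$total" ]; then end=$total; fi
        echo "$start,$end"
        start=$((end + 1))
    done
}

# Print (or, with RUN=1, execute) one ncrcat command per chunk.
subset_in_chunks() {
    infile=$1
    var=$2
    total=$3
    size=$4
    n=0
    for range in $(chunk_ranges "$total" "$size"); do
        n=$((n + 1))
        cmd="ncrcat -O -F -d time,$range -v $var $infile out$n.nc"
        if [ "${RUN:-0}" = 1 ]; then
            $cmd            # relies on word splitting; fine for simple names
        else
            echo "$cmd"     # dry run: show what would be executed
        fi
    done
}

# Dry run for the example in this post: 300 time steps, 100 per chunk.
subset_in_chunks my_data.nc theta 300 100
```

The resulting chunk files can then be joined with the final `ncrcat out[123].nc out_all.nc` step shown above.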

But it'd be nicer to do this:
    ncrcat -d time,1,100 -v theta my_data.nc out_all.nc
    ncrcat -A -d time,101,200 -v theta my_data.nc out_all.nc
    ncrcat -A -d time,201,300 -v theta my_data.nc out_all.nc

Or is there a better way to achieve the goal of subsetting?

Thanks

Kevin

- Charlie Zender - 2005-05-03
  
  Hi,
  
  I am glad you are trying to use NCO to hyperslab very large files
  because this is exactly what we are trying to optimize it for and it's
  always helpful to hear real-life torture tests of your software.
  Your report perplexes me because NCO should definitely NOT
  run out of memory on an ncrcat job. It should use the same
  sustained memory to ncrcat two timesteps as two million.
  The following should work fine:
  
  ncrcat -d time,1,300 -v theta my_data.nc out_all.nc
  
  i.e., ask for the full hyperslab at one time.
  
  If this does lead to a problem on NCO 3.0.0 then let us know
  and we will try to fix it. Follow the error reporting procedures
  at http://nco.sf.net/nco.html#bugs
  
  > But, as stated in the NCO manual, large requests exhaust the
  > machine's memory and the process crashes.
  
  Send the requisite error data for us to understand this.
  Better yet, post the my_data.nc file where we can get it to try
  to reproduce the problem.
  
  Thanks,
  Charlie
  
- Nobody/Anonymous - 2005-05-03
  
  Hi Charlie -
  
  What I am seeing is that as the ncrcat process subsets, its memory size continues to grow as if memory isn't being freed after writing out a time step. In fact, the memory of the process grows to roughly the size of the .tmp file.
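
  One way to watch this kind of growth is to sample the resident set size of the running process at a fixed interval. The following is a minimal sketch, assuming a POSIX-style `ps` that accepts `-o rss=` (reported in kilobytes on common systems); the function name `watch_rss`, the one-second default interval, and the optional sample limit are illustrative and not part of NCO.

  ```shell
  #!/bin/sh
  # Sketch: print the RSS of a process once per interval until it exits.
  # Arguments: pid, interval in seconds (default 1), max samples (default 0 = no limit).
  watch_rss() {
      pid=$1
      interval=${2:-1}
      max=${3:-0}
      count=0
      # kill -0 only checks that the process still exists; it sends no signal.
      while kill -0 "$pid" 2>/dev/null; do
          rss=$(ps -o rss= -p "$pid" 2>/dev/null | tr -d ' ')
          if [ -n "$rss" ]; then echo "$rss"; fi
          count=$((count + 1))
          if [ "$max" -gt 0 ] && [ "$count" -ge "$max" ]; then break; fi
          sleep "$interval"
      done
  }

  # Usage (hypothetical): start the long-running job in the background, then
  # watch it, e.g.
  #   ncrcat ... &
  #   watch_rss $!
  ```

  A steadily increasing series of values while the output .tmp file grows would match the behaviour described above.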
  
  Here's an example that you should be able to try with a dods-enabled ncrcat:
  
  > ncrcat -r
  NCO netCDF Operators version "3.0.0" last modified 2005/02/26 built May 3 2005 on stout.pmel.noaa.gov by kobrien
  Copyright (C) 1995--2005 Charlie Zender
  ncrcat version 3.0.0
  
  > ncrcat -D 5 -c -O -F -d time,1,200 -d lon,240,360 -d lat,120,180 -v thetao http://data1.gfdl.noaa.gov:8080/thredds/dodsC/ipcc_h2_ocean_tripolar_tracers ./out.nc
  
  I'll be happy to submit a bug report, if you'd still like. I signed up to be able to log in on the site, and haven't received the password yet...
  
  Let me know if there is anything else I can include which would be helpful....
  
  Kevin
  
  - Charlie Zender - 2005-05-04
    
    Hi Kevin,
    
    I can reproduce the problem you describe with the development
    NCO running the DODS-enabled command you supplied.
    However, the source of the memory growth is not readily
    apparent. Please put this file somewhere where I can grab it.
    
    http://data1.gfdl.noaa.gov:8080/thredds/dodsC/ipcc_h2_ocean_tripolar_tracers
    
    For some reason, grabbing it with NCO fails right now
    and I need a copy I can test locally to see if this is DODS-related or not.
    
    Thanks,
    Charlie
    
- Nobody/Anonymous - 2005-05-05
  
  Hi Charlie -
  
  You can get to similar, individual, netcdf files at:
  
  ftp://data1.gfdl.noaa.gov/gfdl_cm2_1/CM2.1U_Control-1860_D4/pp/ocean_tripolar/ts/monthly/
  
  The HTTP reference in my earlier message is a DODS aggregation of such files. The files are on the order of 1.7 GB each.
  
  Let me know if there's anything else I can help with.
  
  Thanks -
  
  Kevin
  
  - Charlie Zender - 2005-05-05
    
    Hi Kevin,
    
    I did a test with two different large files, yours and our benchmark
    file ipcc_dly_T85.nc. Your file was on your DODS server and ours
    is on ours. Using ncrcat as you did to access large numbers of
    records clearly shows a memory leak on the local NCO client
    which ultimately causes ncrcat to fail.
    
    Then I tried the ncrcat command on the same file accessed locally.
    There was no leak, and the task completed as expected.
    This accords with valgrind, which shows no leaks in ncrcat.
    Hence all signs point to a leak inside the OPeNDAP netcdf
    client library, or in some aspect of using OPeNDAP which
    I do not understand.
    
    We have barely been able to run valgrind on the DODS-enabled
    NCO programs. I think it requires some patches from valgrind
    developers. Perhaps Harry will comment and post the patches.
    In any case, I am curious to see where the leak arises as it
    prevents NCO from completing large dataset manipulations across
    OPeNDAP, which is exactly what we're trying to improve.
    
    The upshot is there may be no short-term solution other than
    running ncrcat locally.
    
    Charlie
    
- Nobody/Anonymous - 2005-05-05
  
  Hi Charlie -
  
  Sounds good. I saw Harry's message to the DODS tech mailing list. I guess I'll wait and see what comes out of that.
  
  Also, valgrind looks like a useful tool, so thanks for your pointer to that!
  
  Thanks for your testing and help.
  
  Kevin
  
- Harry Mangalam - 2005-10-07
  
  Hi Kevin,
  Charlie reminded me of this today. I thought I had responded to you, but I searched my local mail (I'm away from my main mail machine) and couldn't find anything, so let me apologize for the delay if I haven't communicated the update to you.
  
  The upshot is that the DODS memory leak was fixed in a recent release (it is certainly fixed in the current release), so there should be no more memory leaks of the type that spawned this thread, and you should be safe in trying a DODS-enabled retrieval.
  
  Harry
  