
Using ncrcat to subset large files

Help
2005-04-29
2013-10-17
  • Nobody/Anonymous

    Hello -

    We are trying to perform subsetting procedures on very large files using ncrcat.  These files are aggregated using the Unidata aggregation server and so are many gigabytes in size.  What we are trying to accomplish is to allow users to pull subsets out of the aggregated datasets.  For small requests, it works fine.  But, as stated in the NCO manual, large requests exhaust the machine's memory and the process crashes.

    Ideally, we'd be able to write, say, 100 timesteps at a time to a file and just append to this same file until the full hyperslab is written out.  This would prevent the memory problems from occurring.

    Is there a recommended way to do this? 

    For example, I know this will work:
        ncrcat -d time,1,100 -v theta my_data.nc out1.nc
        ncrcat -d time,101,200 -v theta my_data.nc out2.nc
        ncrcat -d time,201,300 -v theta my_data.nc out3.nc
        ncrcat out[123].nc out_all.nc
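The chunk-then-concatenate workaround above can be scripted for an arbitrary record range. This is a minimal POSIX sh sketch, assuming the record dimension is named `time`; `print_chunk_cmds` and the `chunk_<start>.nc` file names are hypothetical. It only prints the commands, so they can be reviewed before anything runs. Because each `ncrcat` invocation is a separate process, whatever memory one chunk accumulates is returned to the operating system when that process exits.

```shell
#!/bin/sh
# Sketch (hypothetical helper and file names): print one ncrcat command per
# chunk of STEP records, then a final ncrcat that joins the chunks.
# Nothing is executed here; review the output, then pipe it to sh.

# print_chunk_cmds FIRST LAST STEP VAR INFILE OUTFILE
print_chunk_cmds() {
  first=$1; last=$2; step=$3; var=$4; infile=$5; outfile=$6
  start=$first
  parts=""
  while [ "$start" -le "$last" ]; do
    end=$((start + step - 1))
    # Clip the final chunk so it does not run past LAST.
    if [ "$end" -gt "$last" ]; then end=$last; fi
    part="chunk_${start}.nc"   # hypothetical naming scheme for chunk files
    echo "ncrcat -O -d time,${start},${end} -v ${var} ${infile} ${part}"
    parts="${parts} ${part}"
    start=$((end + 1))
  done
  # ncrcat treats the last argument as the output file and concatenates
  # the input files along the record dimension.
  echo "ncrcat -O${parts} ${outfile}"
}

print_chunk_cmds 1 300 100 theta my_data.nc out_all.nc
```

To execute the commands, pipe them to a shell: `print_chunk_cmds 1 300 100 theta my_data.nc out_all.nc | sh`. The bounds are passed to `-d` unchanged; NCO treats integer `-d` bounds as 0-based unless `-F` (Fortran-style, 1-based indexing) is given.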

    But it'd be nicer to do this:
        ncrcat -d time,1,100 -v theta my_data.nc out_all.nc
        ncrcat -A -d time,101,200 -v theta my_data.nc out_all.nc
        ncrcat -A -d time,201,300 -v theta my_data.nc out_all.nc

    Or is there a better way to achieve the goal of subsetting?

    Thanks

    Kevin

     
    • Charlie Zender

      Charlie Zender - 2005-05-03

      Hi,

      I am glad you are trying to use NCO to hyperslab very large files,
      because this is exactly what we are trying to optimize it for, and it's
      always helpful to hear about real-life torture tests of your software.
      Your report perplexes me because NCO should definitely NOT
      run out of memory on an ncrcat job. It should use the same
      sustained memory to ncrcat two timesteps as two million.
      The following should work fine:

      ncrcat -d time,1,300 -v theta my_data.nc out_all.nc

      i.e., ask for the full hyperslab at one time.

      If this does lead to a problem on NCO 3.0.0, then let us know
      and we will try to fix it. Follow the error reporting procedures
      at http://nco.sf.net/nco.html#bugs

      > But, as stated in the NCO manual, large requests exhaust the machine's memory and the process crashes.

      Send the requisite error data for us to understand this.
      Better yet, post the my_data.nc file where we can get it to try
      to reproduce the problem.

      Thanks,
      Charlie

       
    • Nobody/Anonymous

      Hi Charlie -

      What I am seeing is that as the ncrcat process subsets, its memory size continues to grow, as if memory isn't being freed after each time step is written out.  In fact, the process's memory grows roughly in line with the size of the .tmp file.

      Here's an example that you should be able to try with a DODS-enabled ncrcat:

      > ncrcat -r
      NCO netCDF Operators version "3.0.0" last modified 2005/02/26 built May  3 2005 on stout.pmel.noaa.gov by kobrien
      Copyright (C) 1995--2005 Charlie Zender
      ncrcat version 3.0.0

      > ncrcat -D 5 -c -O -F -d time,1,200 -d lon,240,360 -d lat,120,180 -v thetao http://data1.gfdl.noaa.gov:8080/thredds/dodsC/ipcc_h2_ocean_tripolar_tracers ./out.nc
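To observe the growth described above while a command like this runs, one option is to sample the process's resident set size (RSS) at intervals. This is a minimal POSIX sh sketch; `watch_rss` is a hypothetical helper, and it assumes a `ps` that accepts `-p`, `-o stat=` and `-o rss=` (procps on Linux and the BSD/macOS `ps` both do). Both report RSS in KiB.

```shell
#!/bin/sh
# Sketch: sample the resident set size (RSS, KiB) of a command while it runs.
# watch_rss is a hypothetical helper; it assumes a ps that accepts
# -p, -o stat= and -o rss= (procps on Linux, BSD/macOS ps).
# Usage: watch_rss INTERVAL_SECONDS COMMAND [ARGS...]
watch_rss() {
  interval=$1
  shift
  "$@" &
  pid=$!
  while :; do
    # ps fails once the process has been reaped; "Z" marks a zombie
    # that has exited but not yet been waited for.
    stat=$(ps -o stat= -p "$pid" 2>/dev/null) || break
    case $stat in
      *Z*|"") break ;;
    esac
    rss=$(ps -o rss= -p "$pid" 2>/dev/null) || break
    rss=$(echo $rss)   # unquoted expansion trims the padding ps adds
    [ -n "$rss" ] && echo "rss_kib=$rss"
    sleep "$interval"
  done
  # Return the command's own exit status.
  wait "$pid"
}
```

For example, `watch_rss 30 ncrcat -O -F -d time,1,200 -v thetao in.nc out.nc` (with `in.nc` standing in for a real input) prints one sample every 30 seconds. RSS that keeps climbing as the output grows, rather than levelling off, matches the behaviour reported in this thread.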

      I'll be happy to submit a bug report, if you'd still like.  I signed up to be able to log in on the site, and haven't received the password yet...

      Let me know if there is anything else I can include that would be helpful.

      Kevin

       
      • Charlie Zender

        Charlie Zender - 2005-05-04

        Hi Kevin,

        I can reproduce the problem you describe with the development
        NCO running the DODS-enabled command you supplied.
        However, the source of the memory growth is not readily
        apparent. Please put this file somewhere where I can grab it.

        http://data1.gfdl.noaa.gov:8080/thredds/dodsC/ipcc_h2_ocean_tripolar_tracers

        For some reason, grabbing it with NCO fails right now,
        and I need a copy I can test locally to see whether this is DODS-related or not.

        Thanks,
        Charlie

         
    • Nobody/Anonymous

      Hi Charlie -

      You can get to similar, individual, netcdf files at:

      ftp://data1.gfdl.noaa.gov/gfdl_cm2_1/CM2.1U_Control-1860_D4/pp/ocean_tripolar/ts/monthly/

      The http reference in my earlier message is a DODS aggregation of such files.  The files are on the order of 1.7 GB each.

      Let me know if there's anything else I can help with.

      Thanks -

      Kevin

       
      • Charlie Zender

        Charlie Zender - 2005-05-05

        Hi Kevin,

        I did a test with two different large files, yours and our benchmark
        file ipcc_dly_T85.nc. Your file was on your DODS server and ours
        is on ours. Using ncrcat as you did to access large numbers of
        records clearly shows a memory leak on the local NCO client
        which ultimately causes ncrcat to fail.

        Then I tried the ncrcat command on the same file accessed locally.
        There was no leak, and the task completed as expected.
        This accords with valgrind, which shows no leaks in ncrcat.
        Hence all signs point to a leak inside the OPeNDAP netcdf
        client library, or in some aspect of using OPeNDAP which
        I do not understand.

        We have barely been able to run valgrind on the DODS-enabled
        NCO programs. I think it requires some patches from valgrind
        developers. Perhaps Harry will comment and post the patches.
        In any case, I am curious to see where the leak arises as it
        prevents NCO from completing large dataset manipulations across
        OPeNDAP, which is exactly what we're trying to improve.

        The upshot is there may be no short-term solution other than
        running ncrcat locally.

        Charlie

         
    • Nobody/Anonymous

      Hi Charlie -

      Sounds good.  I saw Harry's message to the DODS tech mailing list.  I guess I'll wait and see what comes out of that.

      Also, valgrind looks like a useful tool, so thanks for the pointer to it!

      Thanks for your testing and help.

      Kevin

       
    • Harry Mangalam

      Harry Mangalam - 2005-10-07

      Hi Kevin,
         Charlie reminded me of this today. I thought I had responded to you, but I searched my local mail (I'm away from my main mail machine) and couldn't find anything, so I apologize for the delay if I haven't already sent you this update.

      The upshot is that the DODS memory leak was fixed in a recent release (it is certainly fixed in the current release), so there should be no more memory leaks of the kind that started this thread, and you should be safe trying a DODS-enabled retrieval.

      Harry

       
