Hello -
We are trying to subset very large files using ncrcat. These files are aggregated with the Unidata aggregation server, so they are many gigabytes in size. Our goal is to let users pull subsets out of the aggregated datasets. Small requests work fine, but, as stated in the NCO manual, large requests exhaust the machine's memory and the process crashes.
Ideally, we'd be able to write, say, 100 timesteps at a time to a file and just append to this same file until the full hyperslab is written out. This would prevent the memory problems from occurring.
Is there a recommended way to do this?
For example, I know this will work:
ncrcat -d time,1,100 -v theta my_data.nc out1.nc
ncrcat -d time,101,200 -v theta my_data.nc out2.nc
ncrcat -d time,201,300 -v theta my_data.nc out3.nc
ncrcat out[123].nc out_all.nc
But it'd be nicer to do this:
ncrcat -d time,1,100 -v theta my_data.nc out_all.nc
ncrcat -A -d time,101,200 -v theta my_data.nc out_all.nc
ncrcat -A -d time,201,300 -v theta my_data.nc out_all.nc
Or is there a better way to achieve the goal of subsetting?
Thanks
Kevin
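A minimal POSIX sh sketch of the chunk-then-concatenate workaround from the question, assuming NCO is on the PATH and the record dimension is named `time`. The helpers `chunk_ranges` and `subset_in_chunks`, the `.partNNNN.nc` naming, and the `NCRCAT` override are hypothetical, not NCO features. It keeps separate chunk files because, as far as I know, NCO's `-A` appends *variables* to an existing file (a same-named variable is overwritten) rather than appending *records*, so the `-A` loop above would likely not grow the time dimension.

```shell
#!/bin/sh
# Hypothetical helpers: extract a record hyperslab in fixed-size chunks,
# then join the chunks with one final ncrcat call.

# chunk_ranges FIRST LAST STEP
# Print inclusive "start,end" index pairs covering FIRST..LAST.
chunk_ranges() {
  local first="$1" last="$2" step="$3" s e
  [ "$step" -gt 0 ] || return 1
  s=$first
  while [ "$s" -le "$last" ]; do
    e=$(( s + step - 1 ))
    if [ "$e" -gt "$last" ]; then e=$last; fi
    echo "$s,$e"
    s=$(( s + step ))
  done
}

# subset_in_chunks VAR IN OUT FIRST LAST STEP
# Writes OUT.part0000.nc, OUT.part0001.nc, ... then concatenates them into OUT.
# NCRCAT (an assumption of this sketch) overrides the binary, e.g. NCRCAT=echo.
subset_in_chunks() {
  local var="$1" in="$2" out="$3" first="$4" last="$5" step="$6"
  local ncrcat="${NCRCAT:-ncrcat}" ranges range part i=0
  ranges=$(chunk_ranges "$first" "$last" "$step") || return 1
  set --                          # reuse "$@" as the list of chunk files
  for range in $ranges; do
    part=$(printf '%s.part%04d.nc' "$out" "$i")
    # -O overwrites an existing chunk file without prompting.
    "$ncrcat" -O -d "time,$range" -v "$var" "$in" "$part" || return 1
    set -- "$@" "$part"
    i=$(( i + 1 ))
  done
  [ "$#" -gt 0 ] || return 1
  # Concatenate along the record dimension, then remove the chunk files.
  "$ncrcat" -O "$@" "$out" || return 1
  rm -f "$@"
}
```

For the example in the question this would be `subset_in_chunks theta my_data.nc out_all.nc 1 300 100`; running it with `NCRCAT=echo` prints the commands instead of executing them.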
Hi,
I am glad you are trying to use NCO to hyperslab very large files,
because this is exactly what we are trying to optimize it for, and it's
always helpful to hear about real-life torture tests of your software.
Your report perplexes me because NCO should definitely NOT
run out of memory on an ncrcat job. It should use the same
sustained memory to ncrcat two timesteps as two million.
The following should work fine:
ncrcat -d time,1,300 -v theta my_data.nc out_all.nc
i.e., ask for the full hyperslab at one time.
If this does lead to a problem with NCO 3.0.0, then let us know
and we will try to fix it. Follow the error reporting procedures
at http://nco.sf.net/nco.html#bugs
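One rough way to check the expectation that sustained memory should not depend on the number of records is to compare the peak resident set size (RSS) of a small and a large request. This is a sketch, assuming a `ps` that accepts `-o rss= -p PID` (procps on Linux, or macOS); `peak_rss_of` is a hypothetical helper, and one-second polling can miss short spikes.

```shell
#!/bin/sh
# Hypothetical helper: run a command in the background, poll its resident
# set size about once per second, and print the largest value seen (kB).
# Assumes `ps -o rss= -p PID` is supported. Returns the command's status.
peak_rss_of() {
  local pid rss status peak=0
  "$@" &
  pid=$!
  while :; do
    rss=$(ps -o rss= -p "$pid" 2>/dev/null | tr -d ' ')
    # Empty output means the process is gone; 0 kB means it is a zombie.
    [ "${rss:-0}" -gt 0 ] || break
    if [ "$rss" -gt "$peak" ]; then peak=$rss; fi
    sleep 1
  done
  wait "$pid"
  status=$?
  echo "$peak"
  return "$status"
}
```

For example, compare `peak_rss_of ncrcat -O -d time,1,2 -v theta my_data.nc small.nc` with the same command using `time,1,300`; similar numbers would support the constant-memory expectation, while a large gap would be worth including in a bug report.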
> But, as stated in the NCO manual, large requests exhaust the machine's memory and the process crashes.
Send the requisite error data for us to understand this.
Better yet, post the my_data.nc file where we can get it to try
to reproduce the problem.
Thanks,
Charlie
Hi Charlie -
What I am seeing is that as the ncrcat process subsets, its memory size keeps growing, as if memory isn't being freed after a time step is written out. In fact, the process's memory grows roughly in step with the size of the .tmp file.
Here's an example that you should be able to try with a DODS-enabled ncrcat:
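To make the "memory grows with the .tmp file" observation easy to reproduce, the two quantities can be sampled side by side while ncrcat runs. This is a sketch, assuming a `ps` that supports `-o rss= -p PID`; `watch_rss` is a hypothetical helper, and because the exact name of NCO's temporary file is not given here, it takes a shell pattern and sums the sizes of whatever files match.

```shell
#!/bin/sh
# Hypothetical helper: print "elapsed_s rss_kB tmp_bytes" every INTERVAL
# seconds until PID exits. TMP_GLOB (optional) is a shell pattern for the
# temporary output file(s); sizes of all matches are summed (0 if none).
watch_rss() {
  local pid="$1" interval="$2" glob="${3:-}" t=0 rss size f
  while :; do
    rss=$(ps -o rss= -p "$pid" 2>/dev/null | tr -d ' ')
    [ "${rss:-0}" -gt 0 ] || break      # process exited (or is a zombie)
    size=0
    if [ -n "$glob" ]; then
      for f in $glob; do                # unquoted on purpose: expand the pattern
        if [ -f "$f" ]; then size=$(( size + $(wc -c < "$f") )); fi
      done
    fi
    printf '%s %s %s\n' "$t" "$rss" "$size"
    sleep "$interval"
    t=$(( t + interval ))
  done
}
```

Usage might look like `ncrcat -O ... ./out.nc & watch_rss $! 5 './out.nc*.tmp'`. If the second column keeps climbing in step with the third, memory is not being released after each record is written.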
> ncrcat -r
NCO netCDF Operators version "3.0.0" last modified 2005/02/26 built May 3 2005 on stout.pmel.noaa.gov by kobrien
Copyright (C) 1995--2005 Charlie Zender
ncrcat version 3.0.0
> ncrcat -D 5 -c -O -F -d time,1,200 -d lon,240,360 -d lat,120,180 -v thetao http://data1.gfdl.noaa.gov:8080/thredds/dodsC/ipcc_h2_ocean_tripolar_tracers ./out.nc
I'll be happy to submit a bug report if you'd still like. I signed up for an account on the site, but haven't received the password yet.
Let me know if there is anything else I can include that would be helpful.
Kevin
Hi Kevin,
I can reproduce the problem you describe with the development
NCO running the DODS-enabled command you supplied.
However, the source of the memory growth is not readily
apparent. Please put this file somewhere where I can grab it.
http://data1.gfdl.noaa.gov:8080/thredds/dodsC/ipcc_h2_ocean_tripolar_tracers
For some reason, grabbing it with NCO fails right now
and I need a copy I can test locally to see if this is DODS-related or not.
Thanks,
Charlie
Hi Charlie -
You can get similar individual netCDF files at:
ftp://data1.gfdl.noaa.gov/gfdl_cm2_1/CM2.1U_Control-1860_D4/pp/ocean_tripolar/ts/monthly/
The http reference in my earlier message is a DODS aggregation of such files. The files are on the order of 1.7 GB each.
Let me know if there's anything else I can help with.
Thanks -
Kevin
Hi Kevin,
I did a test with two different large files, yours and our benchmark
file ipcc_dly_T85.nc. Your file was on your DODS server and ours
is on ours. Using ncrcat as you did to access large numbers of
records clearly shows a memory leak on the local NCO client
which ultimately causes ncrcat to fail.
Then I tried the ncrcat command on the same file accessed locally.
There was no leak, and the task completed as expected.
This accords with valgrind, which shows no leaks in ncrcat.
Hence all signs point to a leak inside the OPeNDAP netcdf
client library, or in some aspect of using OPeNDAP which
I do not understand.
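For anyone repeating the valgrind check on the local case, a small helper that extracts the "definitely lost" totals from a memcheck log makes runs easier to compare. This is a sketch: `definitely_lost_bytes` is hypothetical, and it assumes memcheck's usual LEAK SUMMARY wording (`definitely lost: N bytes in M blocks`). The log itself would come from something like `valgrind --leak-check=full --log-file=vg.log ncrcat ...`.

```shell
#!/bin/sh
# Hypothetical helper: sum the "definitely lost" byte counts reported in a
# valgrind memcheck log (one LEAK SUMMARY per traced process). Thousands
# separators such as "1,234" are stripped before adding. Prints 0 if the
# log has no such line (e.g. "no leaks are possible").
definitely_lost_bytes() {
  local log="$1" total=0 n
  for n in $(sed -n 's/.*definitely lost: \([0-9,]*\) bytes.*/\1/p' "$log" | tr -d ,); do
    total=$(( total + n ))
  done
  echo "$total"
}
```

Comparing the result for a local file against the same command over DODS would mirror the local-versus-remote test described above, keeping in mind that valgrind on DODS-enabled builds may need the patches mentioned below.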
We have barely been able to run valgrind on the DODS-enabled
NCO programs. I think it requires some patches from valgrind
developers. Perhaps Harry will comment and post the patches.
In any case, I am curious to see where the leak arises as it
prevents NCO from completing large dataset manipulations across
OPeNDAP, which is exactly what we're trying to improve.
The upshot is there may be no short-term solution other than
running ncrcat locally.
Charlie
Hi Charlie -
Sounds good. I saw Harry's message to the DODS tech mailing list. I guess I'll wait and see what comes out of that.
Also, valgrind looks like a useful tool, so thanks for pointing me to it!
Thanks for your testing and help.
Kevin
Hi Kevin,
Charlie reminded me of this today. I thought I had responded to you, but I searched my local mail (I'm away from my main mail machine) and couldn't find anything, so let me apologize for the delay if I haven't communicated the update to you.
The upshot is that the DODS memory leak was fixed in the recent release (it is certainly fixed in the current release), so there should be no more memory leaks of the type that spawned this, and you should be safe trying a DODS-enabled retrieval.
Harry