-T, --chunksize  Chunk size for each dimension [t,z,y,x]
-N               Do not perform any output operation (read input file only)
-f, --fixed      Force output file with fixed dimensions (no UNLIMITED)
chunksize/read-only/fixed dimensions
Denis,
Thanks for your ncks patch.
I plan on looking at it in detail soon.
This arrived "out of the blue".
I think I understand the purpose/functionality of the -N and -f options.
Can you tell me the rationale/benefits/performance for the chunksize changes?
Thanks,
Charlie
Charlie,
Chunksize lets you "tile" each variable into multiple chunks within a netCDF4-format file. If you request a small portion of a variable, e.g. [1,1,10,10], only the chunks containing that portion are read. I am working on 1.8 GB files and need to benchmark the chunksize for optimal read/write performance.
In my case it seems that fixing the 3rd dimension to [1] and chunking xsize/2 and ysize/2 is optimal for writing. Reading seems to be different. (I used ncks with -N to benchmark the reading.)
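To make the tiling idea concrete, here is a minimal C sketch (not part of the patch) that counts how many chunks a hyperslab request overlaps. The function name `chunks_touched` and the `start`/`count`/`chunk` arrays are hypothetical, chosen only for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Count how many chunks a hyperslab request overlaps.
 * start[i] and count[i] give the first index and the number of elements
 * requested along dimension i; chunk[i] is the chunk length along that
 * dimension. All three arrays hold ndims entries.
 * Hypothetical helper for illustration, not an NCO or netCDF routine. */
size_t chunks_touched(size_t ndims, const size_t *start,
                      const size_t *count, const size_t *chunk)
{
    size_t total = 1;
    for (size_t i = 0; i < ndims; i++) {
        assert(chunk[i] > 0 && count[i] > 0);
        size_t first = start[i] / chunk[i];                 /* first chunk hit */
        size_t last = (start[i] + count[i] - 1) / chunk[i]; /* last chunk hit */
        total *= last - first + 1;
    }
    return total;
}
```

With a chunk shape of [1,1,180,360], a [1,1,10,10] request starting at the origin overlaps one chunk, while the same request starting at y=175, x=355 straddles a chunk boundary in both y and x and overlaps four.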
I thought it might be useful for other groups. We are using ncks quite a lot now with netcdf4.
Denis
Hi Denis,
Overall your patches seem reasonable and I intend to include
them upstream, especially since they (according to you) speed up
large file access/copying, which is where we want NCO to shine.
Questions:
1. I don't understand the purpose of EXCLUDE_OUTPUT_WRITE.
It seems to be mainly used in the form
if(!EXCLUDE_OUTPUT_WRITE) /* Do not perform output operation */
to prevent calling routines that require/assume the existence of an
output file. But such routines should never be called unless an output
file exists. All the uses that I see are already inside the
if(fl_out)
branch of ncks. Meaning an output file is intended to be written.
Please explain. What am I missing? Could the patch be re-written
without EXCLUDE_OUTPUT_WRITE?
2. The logic for -T handling seems flawed because fl_out_fmt is
not necessarily known until later.
3. nc_def_var_chunking() calls, ultimately the point of the chunking
patch, must truly work on netCDF4-format files and be no-ops
otherwise, i.e., follow the logic used for nc_def_var_deflate().
I just committed the wrapper and stub functions required for this in
nco_netcdf.[ch].
4. Do (you know if) the chunk size routines affect operators
besides ncks and thus will require providing/initializing the
chunk size options in the other operators?
5. Does your patch pass the regression tests?
cd ~/nco/bm;./nco_bm.pl --regress
and
ncap2 -O -v -S ~/nco/data/ncap2_tst.nco ~/nco/data/in.nc ~/foo.nc
6. How would you like to proceed? Please respond and send all followups to the
Developer's forum so others, especially Henry, learn of these plans.
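As a rough illustration of question 3, a runtime guard that makes chunking a no-op on non-netCDF4 output could look like the sketch below. This is not the wrapper committed to nco_netcdf.[ch]; the enum, the `chunk_setter` callback type, `def_var_chunking_safe()`, and the `demo_setter()` stand-in are all hypothetical names. It also assumes that both netCDF4 and netCDF4-classic output can be chunked, since both are stored as HDF5.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical format tags; a real build would use the netCDF library's constants. */
enum out_format { FMT_CLASSIC, FMT_64BIT, FMT_NETCDF4, FMT_NETCDF4_CLASSIC };

/* Signature of whatever routine really sets the chunk sizes.
 * In a real build this would wrap nc_def_var_chunking(). */
typedef int (*chunk_setter)(int file_id, int var_id, const size_t *chunk_sizes);

/* Returns 1 when the output format can hold chunked variables (assumption:
 * both netCDF4 flavors can, the classic formats cannot). */
int format_supports_chunking(enum out_format fmt)
{
    return fmt == FMT_NETCDF4 || fmt == FMT_NETCDF4_CLASSIC;
}

/* Apply chunking when supported; otherwise report success without doing anything. */
int def_var_chunking_safe(enum out_format fmt, int file_id, int var_id,
                          const size_t *chunk_sizes, chunk_setter setter)
{
    assert(setter != NULL);
    if (!format_supports_chunking(fmt))
        return 0; /* no-op for classic formats */
    return setter(file_id, var_id, chunk_sizes);
}

/* Stand-in backend for demonstration: counts calls instead of touching a file. */
int demo_calls = 0;
int demo_setter(int file_id, int var_id, const size_t *chunk_sizes)
{
    (void)file_id; (void)var_id; (void)chunk_sizes;
    demo_calls++;
    return 0;
}
```

Passing the setter as a callback keeps the sketch independent of the netCDF library, so the guard logic can be exercised without an actual file.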
Thanks,
Charlie
Charlie,
1. I set EXCLUDE_OUTPUT_WRITE to prevent calling the output routines, as you mentioned.
I used this option as a kind of "quiet mode", like in "cvs" or "make", so that I can benchmark the reading performance.
2. You are right, it is flawed. In my tests I assumed that -T would come after -3 or -4. I should set a boolean variable to true and move this code outside the while loop.
3. You are correct! Thanks for the patch.
4. Chunksize will not affect any reading; it is set for each variable the same way deflate is set.
Operators creating new variables could set chunksize. The logic is the same as for deflate.
5. I tried the regression test and got 2 failures. I did not have ncap2 (I don't know why, I might have an older version), so I used ncap. I did get some failures, but I am not sure why...
Test Results                       Seconds to complete
---------------------------------  -----------------------------------
Test      Success  Failure  Total  WallClock  Real  User  System  Diff
ncap2:          8        2     10       0.67  0.00  0.00    0.00  0.00
ncatted:        5               5       0.31  0.00  0.00    0.00  0.00
ncbo:          11        2     13       1.09  0.00  0.00    0.00  0.00
ncflint:        3               3       0.35  0.00  0.00    0.00  0.00
ncea:           6               6       0.42  0.00  0.00    0.00  0.00
ncecat:         1        1      2       0.21  0.00  0.00    0.00  0.00
ncks:          15        2     17       0.68  0.00  0.00    0.00  0.00
ncpdq:          9               9       0.49  0.00  0.00    0.00  0.00
ncra:          20              20       0.91  0.00  0.00    0.00  0.00
ncrcat:         4        3      7       0.48  0.00  0.00    0.00  0.00
ncwa:          37        1     38       1.61  0.00  0.00    0.00  0.00
net:            0        4      4       0.02  0.00  0.00    0.00  0.00
src/nco> ncap -O -v -S /gpfsm/dnb20/dnadeau/basedir/Baselibs/src/nco/data/ncap2_tst.nco /gpfsm/dnb20/dnadeau/basedir/Baselibs/src/nco/data/in.nc ~/foo.nc
ncap: variable nbr_err defined
ncap: variable nbr_err defined
ncap: variable nbr_err defined
ncap: variable nbr_err defined
ncap: variable nbr_err defined
ncap: variable nbr_err defined
WARNING unable to find hmask in /gpfsm/dnb20/dnadeau/basedir/Baselibs/src/nco/data/in.nc or /home/dnadeau/foo.nc
Segmentation fault
6. I think you can get rid of the -N option since this was only used to do reading tests.
If you change the Chunksize logic and set up the fixed dimensions that would be great.
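The fix described in item 2 (only record that -T was seen inside the option loop, then decide once the loop has finished) might look roughly like the sketch below. It is not the ncks code: `struct run_opts`, `parse_opts()`, and the hand-rolled argument scan are hypothetical, the comma-separated form of the -T argument is an assumption, and the sketch ignores the case where the output format is inherited from the input file.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical option record for illustration only. */
struct run_opts {
    int want_chunking;     /* set when -T is seen */
    const char *chunk_arg; /* raw -T argument, interpreted after the loop */
    int netcdf4_output;    /* set by -4, cleared by -3 */
    int chunking_enabled;  /* final decision, made after all options are read */
};

/* Returns 0 on success, -1 when -T is given without an argument. */
int parse_opts(int argc, char **argv, struct run_opts *o)
{
    assert(argv != NULL && o != NULL);
    memset(o, 0, sizeof *o);
    for (int i = 1; i < argc; i++) {
        if (strcmp(argv[i], "-T") == 0) {
            if (i + 1 >= argc)
                return -1;
            o->want_chunking = 1;      /* record the request, nothing more */
            o->chunk_arg = argv[++i];
        } else if (strcmp(argv[i], "-4") == 0) {
            o->netcdf4_output = 1;
        } else if (strcmp(argv[i], "-3") == 0) {
            o->netcdf4_output = 0;
        }
        /* all other arguments are ignored in this sketch */
    }
    /* The output format is final only here, so the decision is made here. */
    o->chunking_enabled = o->want_chunking && o->netcdf4_output;
    return 0;
}
```

Because the decision is taken after the loop, `-T 1,1,10,10 -4` and `-4 -T 1,1,10,10` give the same result, which is the order dependence the original placement suffered from.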
denis,
i am trying to tie up loose ends like this patch request. did i upstream the parts of this patch that we discussed? can i close this request?
thanks,
charlie
most of this seems to have been satisfactorily addressed