
#15 chunksize, contiguous and nowrite option patch

Status: closed-out-of-date
Labels: new feature (7)
Priority: 5
Updated: 2012-07-10
Created: 2009-04-09
Private: No

Three new options were added to ncks.

-T, --chunksize    Chunk size for each dimension [t,z,y,x]
-N                 Do not perform any output operation (read input file only)
-f, --fixed        Force output file with fixed dimensions (no UNLIMITED)

Discussion

  • Denis Nadeau

    Denis Nadeau - 2009-04-09

    chunksize/read-only/fixed dimensions

     
  • Charlie Zender

    Charlie Zender - 2009-04-14

    Denis,

    Thanks for your ncks patch.
    I plan on looking at it in detail soon.

    This arrived "out of the blue".
    I think I understand the purpose/functionality of the -N and -f options.
    Can you tell me the rationale/benefits/performance for the chunksize changes?

    Thanks,
    Charlie

     
  • Charlie Zender

    Charlie Zender - 2009-04-14
    • assigned_to: nobody --> zender
     
  • Denis Nadeau

    Denis Nadeau - 2009-04-14

    Charlie,

    Chunksize lets you "tile" each variable into multiple chunks within a netCDF4-format file. If you request a small portion of a variable, e.g. [1,1,10,10], only the chunks containing that portion are read. I am working on 1.8 GB files and need to benchmark the chunk size for optimal read/write performance.

    In my case it seems that fixing the 3rd dimension to [1] and chunking xsize/2 and ysize/2 is optimal for writing. Reading seems to be different. (I used ncks with -N to benchmark the reading.)
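
A minimal sketch of the tiling idea described above: given a chunk shape and a requested hyperslab, count how many chunks the request overlaps. The function names and the fixed four-dimension layout are assumptions made for illustration; nothing here is taken from the patch or from the netCDF library.

```c
#include <stddef.h>

#define NDIMS 4 /* [t,z,y,x], the dimension order used by the -T option */

/* Number of chunks along one dimension that overlap the index range
   [start, start + count). `chunk` is the chunk length on that dimension. */
static size_t chunks_hit_1d(size_t start, size_t count, size_t chunk)
{
  if (count == 0 || chunk == 0) return 0; /* empty request or invalid chunk */
  size_t first = start / chunk;               /* chunk holding the first index */
  size_t last = (start + count - 1) / chunk;  /* chunk holding the last index */
  return last - first + 1;
}

/* Chunks overlapped by a whole hyperslab: the product of the
   per-dimension counts, because chunks form a regular grid. */
static size_t chunks_hit(const size_t start[NDIMS],
                         const size_t count[NDIMS],
                         const size_t chunk[NDIMS])
{
  size_t n = 1;
  for (int i = 0; i < NDIMS; i++)
    n *= chunks_hit_1d(start[i], count[i], chunk[i]);
  return n;
}
```

For a hypothetical chunk shape of {1,1,360,720}, a request with start {0,0,0,0} and count {1,1,10,10} overlaps a single chunk, while the same count starting at {0,0,355,715} straddles a chunk boundary in both y and x and overlaps four.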

    I thought it might be useful for other groups. We are using ncks quite a lot now with netcdf4.

    Denis

     
  • Nobody/Anonymous

    Hi Denis,

    Overall your patches seem reasonable and I intend to include
    them upstream. Especially since they (according to you) speed-up
    large file access/copying, which is where we want NCO to shine.
    Questions:

    1. I don't understand the purpose of EXCLUDE_OUTPUT_WRITE.
    It seems to be mainly used in the form

    if(!EXCLUDE_OUTPUT_WRITE) /* Do not perform output operation */

    to prevent calling routines that require/assume the existence of an
    output file. But such routines should never be called unless an output
    file exists. All the uses that I see are already inside the

    if(fl_out)

    branch of ncks. Meaning an output file is intended to be written.
    Please explain. What am I missing? Could the patch be re-written
    without EXCLUDE_OUTPUT_WRITE?

    2. The logic for -T handling seems flawed because fl_out_fmt is
    not necessarily known until later.

    3. nc_def_var_chunking() calls, ultimately the point of the chunking
    patch, must truly work on netCDF4-format files and be no-ops
    otherwise, i.e., follow the logic used for nc_def_var_deflate().
    I just committed the wrapper and stub functions required for this in
    nco_netcdf.[ch].

    4. Do (you know if) the chunk size routines affect operators
    besides ncks and thus will require providing/initializing the
    chunk size options in the other operators?

    5. Does your patch pass the regression tests?

    cd ~/nco/bm;./nco_bm.pl --regress

    and

    ncap2 -O -v -S ~/nco/data/ncap2_tst.nco ~/nco/data/in.nc ~/foo.nc

    6. How would you like to proceed? Please respond and send all followups to the
    Developer's forum so others, especially Henry, learn of these plans.

    Thanks,
    Charlie
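
Point 3 above asks that chunking calls take effect only on netCDF4-format files and be no-ops otherwise. A rough sketch of that gating follows. The enum values mirror the netCDF format codes but are redefined locally so the sketch compiles without the library, and `def_var_chunking_sketch` and `chunk_result` are hypothetical names, not the wrapper committed to nco_netcdf.[ch].

```c
#include <stdbool.h>
#include <stddef.h>

/* Local stand-ins for the format codes in netcdf.h (assumption: redefined
   here only so the sketch builds without linking against netCDF). */
enum fl_fmt {
  FMT_CLASSIC = 1,
  FMT_64BIT = 2,
  FMT_NETCDF4 = 3,
  FMT_NETCDF4_CLASSIC = 4
};

/* Hypothetical record of whether a chunking request was applied */
typedef struct {
  bool applied;  /* true if chunk sizes would be passed to the library */
  size_t ndims;  /* number of chunk sizes that would be passed */
} chunk_result;

/* Hypothetical wrapper: apply chunking only to netCDF4-type output and
   silently succeed otherwise. Always returns 0 (success). */
static int def_var_chunking_sketch(enum fl_fmt fmt, const size_t *cnk_sz,
                                   size_t ndims, chunk_result *res)
{
  res->applied = false;
  res->ndims = 0;
  if (fmt != FMT_NETCDF4 && fmt != FMT_NETCDF4_CLASSIC)
    return 0; /* classic formats: no-op, not an error */
  if (cnk_sz == NULL || ndims == 0)
    return 0; /* nothing requested */
  /* Real code would call
     nc_def_var_chunking(nc_id, var_id, NC_CHUNKED, cnk_sz) here. */
  res->applied = true;
  res->ndims = ndims;
  return 0;
}
```

Returning success in the no-op case lets callers invoke the wrapper unconditionally for every variable, regardless of the output format.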

     
  • Denis Nadeau

    Denis Nadeau - 2009-04-29

    Charlie,

    1. I set EXCLUDE_OUTPUT_WRITE to prevent calling output routines, as you mentioned.
    I used this option as a kind of "quiet mode", like in "cvs" or "make", so that I can benchmark the reading performance.

    2. You are right, it is flawed. In my test I assumed that -T will happen after -3 or -4. I should set up a boolean variable to True and put this code outside the while loop.

    3. You are correct! Thanks for the patch.

    4. Chunk size will not affect any reading; it is set for each variable the same way deflate is set.
    Operators creating new variables could set the chunk size. The logic is the same as for deflate...

    5. I tried the regression test and got 2 failures. I did not have ncap2 and used ncap instead; I don't know why, I might have an older version. I did get some failures, but I am not sure why...

    Test Results                        Seconds to complete
    --------------------------  ----------------------------------------
    Test      Success  Failure  Total  WallClock  Real  User  System  Diff
    ncap2:          8        2     10       0.67  0.00  0.00    0.00  0.00
    ncatted:        5                5       0.31  0.00  0.00    0.00  0.00
    ncbo:          11        2     13       1.09  0.00  0.00    0.00  0.00
    ncflint:        3                3       0.35  0.00  0.00    0.00  0.00
    ncea:           6                6       0.42  0.00  0.00    0.00  0.00
    ncecat:         1        1      2       0.21  0.00  0.00    0.00  0.00
    ncks:          15        2     17       0.68  0.00  0.00    0.00  0.00
    ncpdq:          9                9       0.49  0.00  0.00    0.00  0.00
    ncra:          20               20       0.91  0.00  0.00    0.00  0.00
    ncrcat:         4        3      7       0.48  0.00  0.00    0.00  0.00
    ncwa:          37        1     38       1.61  0.00  0.00    0.00  0.00
    net:            0        4      4       0.02  0.00  0.00    0.00  0.00

    src/nco> ncap -O -v -S /gpfsm/dnb20/dnadeau/basedir/Baselibs/src/nco/data/ncap2_tst.nco /gpfsm/dnb20/dnadeau/basedir/Baselibs/src/nco/data/in.nc ~/foo.nc
    ncap: variable nbr_err defined
    ncap: variable nbr_err defined
    ncap: variable nbr_err defined
    ncap: variable nbr_err defined
    ncap: variable nbr_err defined
    ncap: variable nbr_err defined
    WARNING unable to find hmask in /gpfsm/dnb20/dnadeau/basedir/Baselibs/src/nco/data/in.nc or /home/dnadeau/foo.nc
    Segmentation fault

    6. I think you can get rid of the -N option since this was only used to do reading tests.
    If you change the Chunksize logic and set up the fixed dimensions that would be great.

     
  • Charlie Zender

    Charlie Zender - 2011-06-26

    denis,
    i am trying to tie up loose ends like this patch request. did i upstream the parts of this patch that we discussed? can i close this request?
    thanks,
    charlie

     
  • Charlie Zender

    Charlie Zender - 2012-07-10

    most of this seems to have been satisfactorily addressed

     
  • Charlie Zender

    Charlie Zender - 2012-07-10
    • status: open --> closed-out-of-date
     
