NCO netCDF Operators / Discussion / Developers: Multislabbing discussions archive

Charlie Zender - 2002-12-28

For those interested in the new "multislabbing" feature, here
is a thread where I hope to archive the offline discussions
between Henry and myself that led up to this.

I did not save all the messages, but I just found one that
will be useful reference material, and will post that first.
Perhaps Henry will post what he has. Note that multi-slabbing,
loosely defined, is simply a set of hyperslabbing operations.
By collecting the operations into a set, rather than performing
them sequentially, advantages in speed can sometimes be gained. Next post will discuss two multislabbing algorithms.

If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
- Charlie Zender - 2002-12-28
  
  Sometime about November 16, 2002, I sent Henry this:
  
  Hi Henry,
  
  I was trying to figure out your new algorithm and I couldn't
  so I wrote down my algorithm below. After doing that then I
  figured out what you were saying. So here first is my algorithm,
  followed by some comparisons between mine and yours.
  
  The following multislabs are legal:
  
  #1. -d lon,0,360,10 -d lon,1,360
  #2. -d lon,0,360,10 -d lon,1,360 -d lat,0,180,10 -d lat,90,180,5
  
  In #1 and #2, the indices grabbed should be the union of the indices
  specified in the individual -d commands.
  It is important to know in advance the size of the full extracted
  multi-slab so that memory can be allocated and it can be written
  into memory as it is read.
  The method I proposed for reading in a previous email does not rely
  on any complicated algorithms for obtaining indices of elements.
  The whole procedure is doable in a simple loop over the existing
  retrieval algorithm.
  What is needed is a new algorithm that maps the hyperslabs into
  the continuous buffer from which they will be output.
  Here is the generic request to get a multislab of an N-dimensional
  variable, which is to be hyperslabbed along two of its N dimensions.
  
  ncks -d dim1,lmt11 ... -d dim1,lmt12 -d dim2,lmt21 -ddim2,lmt22 in.nc out.nc
  
  The read loop reads 4 slabs into memory where the 4 slabs are
  
  [lmt11,lmt21] = slab1
  [lmt11,lmt22] = slab2
  [lmt12,lmt21] = slab3
  [lmt12,lmt22] = slab4
  
  Existing infracture can be used for this. The new algorithm
  must decide how to map these 4 slabs into the final, contiguous
  buffer which will be used for output.
  The contiguous output buffer is
  sizeof(union(slab1,slab2,slab3,slab4)).
  Thus the algorithm must loop over each of slabs 1--4 and, for
  each element of the slab, determine what its location in the
  final output buffer will be.
  The location in the final output buffer is uniquely specifiable
  as an N element vector.
  Computing the values of each element of this vector is the tricky part.
  It will involve creating map vectors as is already done in
  var_avg and in prn_var_val_lmt.
  In base-10 counting map vectors are the digits offset, e.g.,
  [1,10,100,...] and a number is specified as a list of coefficients
  times the digits offset, so that 737 = 7*100+3*10+1*1.
  The output hyperslabs needs a similar base, [C1,C2,C3,...CN] where
  C1=count{dim1},C2=count{dim2},....
  Once the base [C1,C2,...CN] is known the problem reduces to determining
  the coefficients of each base element [i.e., 7,3,7].
  These coefficients are what I call the map vector, [M1,M2,...MN].
  The M_o (where _o denotes the index in the output file) can be
  determined for each element by counting the position of the current
  dim_i (where _i denotes the index in the input file) in the output slab.
  Thus if the multislab specifies that K elements of dim_n should
  be extracted from the input file, then 0 <= M_o < K.
  These K elements are indexed M_i1,M_i2,M_i3,...M_ik,...M_iK in the
  input file.
  Each output index M_ok uniquely corresponds to an input index M_ik,
  and, in fact, M_ok=k (whereas M_ik is the k'th index of that dimension
  specified by the multi-hyperslab).
  
  For efficiency the algorithm can do this after each slab is read,
  map that slab into the output buffer, then free the memory for the
  slab before the next slab is read. It does not matter if a subsequent
  slab is read and some of its elements overlap (map directly onto)
  elements of a previous slab already stored in the output buffer.
  After all input slabs have been read and mapped to the output
  buffer, the output buffer will be completely full and ready to
  write to the output file (or to screen).
  If it's being written to an output file, it can be written with
  a single write command, since its dimensions and count vectors are
  known.
  
  I've read your algorithm twice and it will give the same results
  as the algorithm that I just described.
  The basic difference is that your algorithm ensures that each
  value is read from disk at most one time, whereas, if the
  user specifies overlapping hyperslabs, my algorithm will retrieve
  the same value from disk as many times as the user specifies.
  I understand the bookkeeping required for my algorithm better
  than I do the bookkeeping required for your algorithm.
  Probably the algorithm that requires the fewest disk reads
  is superior in a time-of-execution sense.
  It is clear that my algorithm requires as many reads as the
  product of the number of hyperslabs specified for each dimension,
  e.g., 1*1*2*1*2=4 reads for the above example.
  
  It is possible that your algorithm _could_ reduce the total number
  of reads if the hyperslabs specified by the user contain a
  completely redundant specification, e.g., -d dim1,1,2 -d dim1,1,1
  would be reduced to one read by your algorithm but would be
  two reads with my algorithm.
  I think the chances of a user actually specifying a redundant
  hyperslab are very small.
  In any case, we should weigh the difficulty of writing your
  nco_calc_indices() algorithm against the difficulty of writing a loop
  around the current ncks algorithm which simply uses the user-specified
  hyperslab limits (and already does the appropriate recomputations
  when a hyperslab is wrapped).
  
  > I'm not sure yet how to deal with wrapped hyperslabs, If they are on their
  > own then their treatment should be as is, If they are with other limits then
  > break them into two separate ranges ????
  
  That is what the current lmt_evl() imlementation does: What is
  specified as a single hypslab is automatically broken into two
  separate reads.
  
  I havn't really considered wrapped multi-slabs because they bend my
  brain. All I know is that the current infrastructure handles wrapping
  correctly, and I _think_ that my algorithm, which simply loops over
  the current infrastructure, might therefore behave correctly with
  wrapped hyperslabs but I'm not sure...
  
  > Have to make sure that nco_calc_indicies() can cope with repeated dims in a
  > var
  > e.g bigvar[ time,lat,lat,lon,lon] or other such stuff
  
  In all the time I've written scientific software for netCDF this
  situation has never arisen. Is this even allowed by the netCDF
  standard? There are good physical grounds for hoping that it never
  does arise, but I can see situations where an array dimensioned, e.g.,
  var[dim_lon,dim_lon] would be useful, e.g., for cross-correlations
  of two variables that are functions of longitude.
  I have a feeling ncwa would misbehave if it ever encountered such a
  variable, because it's algorithms are based on comparing names of
  dimensions in variables to names of dimensions in weights...
  In any case, both your algorithm and my algorithm could be altered
  to handle such situations, but I would not worry about such
  variables because they have never been encountered.
  
  > For printing what about keeping the existing routine and calling it with
  > each of the hyperslab args ?
  
  Yes, something like that.
  
  Would you please post the full archive of our private discussions
  of this subject and any future followups to NCO developers so we have
  an archive of this discussion that others can refer to?
  BTW Sourceforge appears to be down at the present moment.
  
  Thanks,
  Charlie
  
  If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
  - Charlie Zender - 2003-01-04
    
    Hi Henry,
    
    I was trying to figure out your new algorithm and I couldn't
    so I wrote down my algorithm below. After doing that then I
    figured out what you were saying. So here first is my algorithm,
    followed by some comparisons between mine and yours.
    
    The following multislabs are legal:
    
    #1. -d lon,0,360,10 -d lon,1,360
    #2. -d lon,0,360,10 -d lon,1,360 -d lat,0,180,10 -d lat,90,180,5
    
    In #1 and #2, the indices grabbed should be the union of the indices
    specified in the individual -d commands.
    It is important to know in advance the size of the full extracted
    multi-slab so that memory can be allocated and it can be written
    into memory as it is read.
    The method I proposed for reading in a previous email does not rely
    on any complicated algorithms for obtaining indices of elements.
    The whole procedure is doable in a simple loop over the existing
    retrieval algorithm.
    What is needed is a new algorithm that maps the hyperslabs into
    the continuous buffer from which they will be output.
    Here is the generic request to get a multislab of an N-dimensional
    variable, which is to be hyperslabbed along two of its N dimensions.
    
    ncks -d dim1,lmt11 ... -d dim1,lmt12 -d dim2,lmt21 -ddim2,lmt22 in.nc out.nc
    
    The read loop reads 4 slabs into memory where the 4 slabs are
    
    [lmt11,lmt21] = slab1
    [lmt11,lmt22] = slab2
    [lmt12,lmt21] = slab3
    [lmt12,lmt22] = slab4
    
    Existing infracture can be used for this. The new algorithm
    must decide how to map these 4 slabs into the final, contiguous
    buffer which will be used for output.
    The contiguous output buffer is
    sizeof(union(slab1,slab2,slab3,slab4)).
    Thus the algorithm must loop over each of slabs 1--4 and, for
    each element of the slab, determine what its location in the
    final output buffer will be.
    The location in the final output buffer is uniquely specifiable
    as an N element vector.
    Computing the values of each element of this vector is the tricky part.
    It will involve creating map vectors as is already done in
    var_avg and in prn_var_val_lmt.
    In base-10 counting map vectors are the digits offset, e.g.,
    [1,10,100,...] and a number is specified as a list of coefficients
    times the digits offset, so that 737 = 7*100+3*10+1*1.
    The output hyperslabs needs a similar base, [C1,C2,C3,...CN] where
    C1=count{dim1},C2=count{dim2},....
    Once the base [C1,C2,...CN] is known the problem reduces to determining
    the coefficients of each base element [i.e., 7,3,7].
    These coefficients are what I call the map vector, [M1,M2,...MN].
    The M_o (where _o denotes the index in the output file) can be
    determined for each element by counting the position of the current
    dim_i (where _i denotes the index in the input file) in the output slab.
    Thus if the multislab specifies that K elements of dim_n should
    be extracted from the input file, then 0 <= M_o < K.
    These K elements are indexed M_i1,M_i2,M_i3,...M_ik,...M_iK in the
    input file.
    Each output index M_ok uniquely corresponds to an input index M_ik,
    and, in fact, M_ok=k (whereas M_ik is the k'th index of that dimension
    specified by the multi-hyperslab).
    
    For efficiency the algorithm can do this after each slab is read,
    map that slab into the output buffer, then free the memory for the
    slab before the next slab is read. It does not matter if a subsequent
    slab is read and some of its elements overlap (map directly onto)
    elements of a previous slab already stored in the output buffer.
    After all input slabs have been read and mapped to the output
    buffer, the output buffer will be completely full and ready to
    write to the output file (or to screen).
    If it's being written to an output file, it can be written with
    a single write command, since its dimensions and count vectors are
    known.
    
    I've read your algorithm twice and it will give the same results
    as the algorithm that I just described.
    The basic difference is that your algorithm ensures that each
    value is read from disk at most one time, whereas, if the
    user specifies overlapping hyperslabs, my algorithm will retrieve
    the same value from disk as many times as the user specifies.
    I understand the bookkeeping required for my algorithm better
    than I do the bookkeeping required for your algorithm.
    Probably the algorithm that requires the fewest disk reads
    is superior in a time-of-execution sense.
    It is clear that my algorithm requires as many reads as the
    product of the number of hyperslabs specified for each dimension,
    e.g., 1*1*2*1*2=4 reads for the above example.
    
    It is possible that your algorithm _could_ reduce the total number
    of reads if the hyperslabs specified by the user contain a
    completely redundant specification, e.g., -d dim1,1,2 -d dim1,1,1
    would be reduced to one read by your algorithm but would be
    two reads with my algorithm.
    I think the chances of a user actually specifying a redundant
    hyperslab are very small.
    In any case, we should weigh the difficulty of writing your
    nco_calc_indices() algorithm against the difficulty of writing a loop
    around the current ncks algorithm which simply uses the user-specified
    hyperslab limits (and already does the appropriate recomputations
    when a hyperslab is wrapped).
    
    > I'm not sure yet how to deal with wrapped hyperslabs, If they are on their
    > own then their treatment should be as is, If they are with other limits then
    > break them into two separate ranges ????
    
    That is what the current lmt_evl() imlementation does: What is
    specified as a single hypslab is automatically broken into two
    separate reads.
    
    I havn't really considered wrapped multi-slabs because they bend my
    brain. All I know is that the current infrastructure handles wrapping
    correctly, and I _think_ that my algorithm, which simply loops over
    the current infrastructure, might therefore behave correctly with
    wrapped hyperslabs but I'm not sure...
    
    > Have to make sure that nco_calc_indicies() can cope with repeated dims in a
    > var
    > e.g bigvar[ time,lat,lat,lon,lon] or other such stuff
    
    In all the time I've written scientific software for netCDF this
    situation has never arisen. Is this even allowed by the netCDF
    standard? There are good physical grounds for hoping that it never
    does arise, but I can see situations where an array dimensioned, e.g.,
    var[dim_lon,dim_lon] would be useful, e.g., for cross-correlations
    of two variables that are functions of longitude.
    I have a feeling ncwa would misbehave if it ever encountered such a
    variable, because it's algorithms are based on comparing names of
    dimensions in variables to names of dimensions in weights...
    In any case, both your algorithm and my algorithm could be altered
    to handle such situations, but I would not worry about such
    variables because they have never been encountered.
    
    > For printing what about keeping the existing routine and calling it with
    > each of the hyperslab args ?
    
    Yes, something like that.
    
    Would you please post the full archive of our private discussions
    of this subject and any future followups to NCO developers so we have
    an archive of this discussion that others can refer to?
    BTW Sourceforge appears to be down at the present moment.
    
    Thanks,
    Charlie
    
    If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
- Charlie Zender - 2003-01-04
  
  Hi Charlie,
  
  Have thought up a neat method of calculating the indices of dimensions with
  multiple hyperslabs.
  Basically we have a function nco_calc_indices(lmt_all lmt_a, long srt,long
  cnt,long srd, bool end)
  The function has static variables , and returns hyperslab indices untill
  bool end is true.
  
  E.g suppose we have ncks -d time,0,4,2 -d time,3,3,1 -v time_test in.nc
  foo.nc
  
  The first hyperslab returned is 0,2,2
  second hyperslab                     3,4,1       ( end is true )
  
  The function nco_calc_indices() keeps an index for each hyperslap argument
  And searches for the lowest index from all the hyperslab arguments, it also
  keeps track of the difference between sucessive values.
  Another example
  
  ncks -d time 0,6,2 -d time 1,5,2
  
  The single hyperslab returned is 0,6,1      (end is true)
  
  Other combinations may not be so neat
  ncks -d time,0,10,3 ncks -d time 0,10,4
  
  First hyperslab --- 0,0,1
  second              --- 3,4,1
  third                      6,6,1
  four                       8,9,1        (end is true)
  
  I'm not sure yet how to deal with wrapped hyperslabs, If they are on their
  own then their treatment should be as is, If they are with other limits then
  break them into two separate ranges ????
  
  Have to make sure that nco_calc_indicies() can cope with repeated dims in a
  var
  e.g bigvar[ time,lat,lat,lon,lon] or other such stuff
  
  For printing what about keeping the existing routine and calling it with
  each of the hyperslab args ?
  
  Regards Henry
  
  Thanks for looking into this. I agree with your analysis in general.
  Specific comments below.
  
  Henry Butowsky wrote:
  
  > Hi Charlie,
  >
  > Have had a look at implementing hyperslab args with multiple args for a
  > single
  > dimension.
  >
  > e,g. ncks -d time,1,5,2 -d time,7,9 -d lon,90,180 in.nc foo.nc
  >
  > The task is certainly non-trivial. ncks is like the jewel in the crown of
  > the nco software, so the mods should not compromise existing
  
  functionality.
  
  >
  > I envisige the following
  >
  > Use nco_lmt_evl() to create multiple lmt_sct for each dimension and hold
  > them in
  > the following structure :-
  >
  > struct lmt_all {
  > char *dmn_nm; /* dimension name */
  > int dmn_cnt; /* total number of hyperslabs to extract */
  > int lmt_nbr; /* number of lmt args */
  > lmt_sct lmt*;
  > }
  >
  > Hold all lmt_sct for each dimension in lmt_all.
  
  Right, something like this is mandatory to solve the problem.
  
  >
  > We now require a function which can calculate the total number of
  
  hyberslabs
  
  > to extract
  > for each dimesion - Bear in mind that hyperslabs in lmt_all may overlap
  
  e.g
  
  > ncks -d time,1,6 -d time,3,10,4.
  > There are two types of hyperlab args those with stride and those with
  
  simply
  
  > a min and max index.
  >
  > With dmn_cnt defined we can call nco_cpy_var_dfn_lmt() to define the
  > variable and the
  > modified dimensions in the output.
  >
  >
  > If the variable contains only dimensions with O or 1 hyperslab args then
  
  use
  
  > the
  > existing nco_cpy_var_val_lmt() function to read/write the variable.
  
  Yes, nco_cpy_var_val_lmt() is already very complicated, and adding
  the ability to handle lists of dimensions should be done in a new
  routine.
  
  >
  >
  > Writing var with multiple hyperslap indices.
  > nco_cpy_var_val_lmt_all()
  >
  > Because of the possiblility of overlapping or multiple indices we cannot
  
  use
  
  > nco_get_varm () and nco_put_vara() to read/write.
  >
  > We have to read the whole variable into memory and manually loop though
  
  each
  
  > hyperslap.
  
  This conclusion should be re-examined because reading entire
  variables into memory should be avoided for large datasets.
  I envisage that reads could/should be done in a loop requiring
  the minimal number, which would be the product of the number of
  limits specified for each dimension, so if only one limit specified
  for each dimension, it should be done in one read, as currently,
  but if limits per dimension were, say, (1,1,2,1,3) for a five
  dimensional variable then it would require 1*1*2*1*3=6 reads.
  Each read does essentially what is done right now by ncks.
  
  The problem is knowing where to store the results in a contin
  but that's a problem that has to be tackled regardless of the read-in
  method. The algorithm to solve it would be similar to what is done in
  var_avg() already.
  
  >
  > We need a function: bool is_valid_index(lmt_all *lmt_1, int index)
  > which returns true if the index is in the hyperslab. ( this is fairly
  
  crude
  
  > and more
  > sophisticated optimizations may be possible by merging hyperslab info ,
  > or if the dimension is fairly small e.g < 1000 , producing a complete
  
  table
  
  > of valid
  > indices ) ( c.f What is the upper limit on a dimension ?)
  > I guess some kind of clever recursive function is called for. !
  > For a large multi-dimension variable there is gonna be a big performance
  
  hit
  
  > !
  
  I think all of that can be avoided.
  
  >
  > I have not yet thought about the special treatment of the record dimension
  > or the
  > printing of these hyperslabs.
  
  Yes, printing would be a real bear.
  
  I gather from today's message that you have had an inspired idea on
  how this all might be done in a cleaner way. I am interested to hear
  more.
  
  Thanks,
  Charlie
  
  >
  > Regards Henry
  >
  > .
  >
  
  If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
- Charlie Zender - 2003-01-04
  
  Hi Charlie,
  I have come up with a multi-slabbing algorithm that is a combination of mine
  and your's.
  
  My algorithim will be very inefficient for things like
  
  #1. -d lon,0,360,10 -d lon,0,360,6
  
  where the hyperslabs are interleaved. Your algorithm is superior when their
  are only a few overlapping indices.
  I could not fully comprehend your "book-keeping" ---- But realised that can
  I use my nco_calc_indices() function to merge the output hyperlab info for
  each dimension
  
  We can also use nco_calc_indices to find the size of the output dimensions ,
  so that we can define the variables before writing out the data
  
  FYI
  I have noticed that tha ncks doesn't handle correctly hyperslabs with two or
  more wrapped hyperslabs
  
  Do you want me to preceed with coding ?
  I thing the entire project will require less than 2000 lines of code. :-)
  
  Regards Henry
  
  If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
- Charlie Zender - 2003-01-04
  
  Hi Henry,
  
  I was trying to figure out your new algorithm and I couldn't
  so I wrote down my algorithm below. After doing that then I
  figured out what you were saying. So here first is my algorithm,
  followed by some comparisons between mine and yours.
  
  The following multislabs are legal:
  
  #1. -d lon,0,360,10 -d lon,1,360
  #2. -d lon,0,360,10 -d lon,1,360 -d lat,0,180,10 -d lat,90,180,5
  
  In #1 and #2, the indices grabbed should be the union of the indices
  specified in the individual -d commands.
  It is important to know in advance the size of the full extracted
  multi-slab so that memory can be allocated and it can be written
  into memory as it is read.
  The method I proposed for reading in a previous email does not rely
  on any complicated algorithms for obtaining indices of elements.
  The whole procedure is doable in a simple loop over the existing
  retrieval algorithm.
  What is needed is a new algorithm that maps the hyperslabs into
  the continuous buffer from which they will be output.
  Here is the generic request to get a multislab of an N-dimensional
  variable, which is to be hyperslabbed along two of its N dimensions.
  
  ncks -d dim1,lmt11 ... -d dim1,lmt12 -d dim2,lmt21 -ddim2,lmt22 in.nc out.nc
  
  The read loop reads 4 slabs into memory where the 4 slabs are
  
  [lmt11,lmt21] = slab1
  [lmt11,lmt22] = slab2
  [lmt12,lmt21] = slab3
  [lmt12,lmt22] = slab4
  
  Existing infracture can be used for this. The new algorithm
  must decide how to map these 4 slabs into the final, contiguous
  buffer which will be used for output.
  The contiguous output buffer is
  sizeof(union(slab1,slab2,slab3,slab4)).
  Thus the algorithm must loop over each of slabs 1--4 and, for
  each element of the slab, determine what its location in the
  final output buffer will be.
  The location in the final output buffer is uniquely specifiable
  as an N element vector.
  Computing the values of each element of this vector is the tricky part.
  It will involve creating map vectors as is already done in
  var_avg and in prn_var_val_lmt.
  In base-10 counting map vectors are the digits offset, e.g.,
  [1,10,100,...] and a number is specified as a list of coefficients
  times the digits offset, so that 737 = 7*100+3*10+1*1.
  The output hyperslabs needs a similar base, [C1,C2,C3,...CN] where
  C1=count{dim1},C2=count{dim2},....
  Once the base [C1,C2,...CN] is known the problem reduces to determining
  the coefficients of each base element [i.e., 7,3,7].
  These coefficients are what I call the map vector, [M1,M2,...MN].
  The M_o (where _o denotes the index in the output file) can be
  determined for each element by counting the position of the current
  dim_i (where _i denotes the index in the input file) in the output slab.
  Thus if the multislab specifies that K elements of dim_n should
  be extracted from the input file, then 0 <= M_o < K.
  These K elements are indexed M_i1,M_i2,M_i3,...M_ik,...M_iK in the
  input file.
  Each output index M_ok uniquely corresponds to an input index M_ik,
  and, in fact, M_ok=k (whereas M_ik is the k'th index of that dimension
  specified by the multi-hyperslab).
  
  For efficiency the algorithm can do this after each slab is read,
  map that slab into the output buffer, then free the memory for the
  slab before the next slab is read. It does not matter if a subsequent
  slab is read and some of its elements overlap (map directly onto)
  elements of a previous slab already stored in the output buffer.
  After all input slabs have been read and mapped to the output
  buffer, the output buffer will be completely full and ready to
  write to the output file (or to screen).
  If it's being written to an output file, it can be written with
  a single write command, since its dimensions and count vectors are
  known.
  
  I've read your algorithm twice and it will give the same results
  as the algorithm that I just described.
  The basic difference is that your algorithm ensures that each
  value is read from disk at most one time, whereas, if the
  user specifies overlapping hyperslabs, my algorithm will retrieve
  the same value from disk as many times as the user specifies.
  I understand the bookkeeping required for my algorithm better
  than I do the bookkeeping required for your algorithm.
  Probably the algorithm that requires the fewest disk reads
  is superior in a time-of-execution sense.
  It is clear that my algorithm requires as many reads as the
  product of the number of hyperslabs specified for each dimension,
  e.g., 1*1*2*1*2=4 reads for the above example.
  
  It is possible that your algorithm _could_ reduce the total number
  of reads if the hyperslabs specified by the user contain a
  completely redundant specification, e.g., -d dim1,1,2 -d dim1,1,1
  would be reduced to one read by your algorithm but would be
  two reads with my algorithm.
  I think the chances of a user actually specifying a redundant
  hyperslab are very small.
  In any case, we should weigh the difficulty of writing your
  nco_calc_indices() algorithm against the difficulty of writing a loop
  around the current ncks algorithm which simply uses the user-specified
  hyperslab limits (and already does the appropriate recomputations
  when a hyperslab is wrapped).
  
  > I'm not sure yet how to deal with wrapped hyperslabs, If they are on their
  > own then their treatment should be as is, If they are with other limits then
  > break them into two separate ranges ????
  
  That is what the current lmt_evl() imlementation does: What is
  specified as a single hypslab is automatically broken into two
  separate reads.
  
  I havn't really considered wrapped multi-slabs because they bend my
  brain. All I know is that the current infrastructure handles wrapping
  correctly, and I _think_ that my algorithm, which simply loops over
  the current infrastructure, might therefore behave correctly with
  wrapped hyperslabs but I'm not sure...
  
  > Have to make sure that nco_calc_indicies() can cope with repeated dims in a
  > var
  > e.g bigvar[ time,lat,lat,lon,lon] or other such stuff
  
  In all the time I've written scientific software for netCDF this
  situation has never arisen. Is this even allowed by the netCDF
  standard? There are good physical grounds for hoping that it never
  does arise, but I can see situations where an array dimensioned, e.g.,
  var[dim_lon,dim_lon] would be useful, e.g., for cross-correlations
  of two variables that are functions of longitude.
  I have a feeling ncwa would misbehave if it ever encountered such a
  variable, because it's algorithms are based on comparing names of
  dimensions in variables to names of dimensions in weights...
  In any case, both your algorithm and my algorithm could be altered
  to handle such situations, but I would not worry about such
  variables because they have never been encountered.
  
  > For printing what about keeping the existing routine and calling it with
  > each of the hyperslab args ?
  
  Yes, something like that.
  
  Would you please post the full archive of our private discussions
  of this subject and any future followups to NCO developers so we have
  an archive of this discussion that others can refer to?
  BTW Sourceforge appears to be down at the present moment.
  
  Thanks,
  Charlie
  
  If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
- henry Butowsky - 2003-01-06
  
  Hi Charlie ,
  below is an explanation of how nco_msa_rec_clc() works.        nco_msa_rec_clc
          ---------------
  
  void *
  nco_msa_rec_clc /* [fnc] Multi slab algorithm (recursive routine, returns a single slab pointer */
  (int dpt_crr, /* current depth, we start at 0 */
  int dpt_crr_max, /* maximium depth (i.e the number of dims in variable (does not change)*/
  lmt_sct **lmt, /* limits of the current hyperslabs these change as we recurse */
  lmt_all **lmt_lst, /* list of limits in each dimension (this remains STATIC as we recurse) */
  var_sct *vara) /* Info for routine to read var info and pass info between calls */
  
  Consider the example
  
  ncks -v three_dmn_var_dbl -d time,1,3,1 -d time,2,8,2 -d lon,3,1,1 in.nc foo3.nc
  
  where
  
  three_dmn_var_dbl(time, lat,lon)=
                   1, 2, 3, 4, 5, 6, 7, 8,
                   9,10,11,12,13,14,15,16,
                  17,18,19,20,21,22,23,24,
                  -99,-99,-99,-99,-99,-99,-99,-99,
                  33,34,35,36,37,38,39,40,
                  41,42,43,44,45,46,47,48,
                  49,50,51,52,53,54,55,56,
                  -99,58,59,60,61,62,63,64,
                  65,66,67,68,69,70,71,72,
                  -99,74,75,76,77,78,79,-99;
  
  In nco_cpy_var_val_mlt_lmt() , preparations are made for the call to
  call nco_msa_rec_clc()
  
  lmt_mult[] is populated with the approriate dimension limits.
  
  lmt_mult[0] ---- time limits {1,3,1} {2,8,2}
  lmt_mult[1] ---- lat limit   {0,1,1}        ( This is the same as input dim )
  lmt_mult[2] ---- lon limit   {3,1,1}        ( A wrapped limit )
  
  Now call recursive routine :-
  void_ptr=nco_msa_rec_clc(0,nbr_dim,lmt,lmt_mult,&vara);
  
  in this case nbr_dmn=3, lmt= (0,0,0) .
  
  TIME LIMIT
  Since lmt_lst[0]->lmt_dmn_nbr == 2 (or lmt_mult[0]->lmt_dmn_nbr==2 ) we move
  to the hyperslab section of the function.
  since we have 2 limits we require 2 hyperslabs. So we make the following calls to
  get them:-
  
  cp_wrp[0] = nco_msa_rec_clc(1,3,lmt,lmt_lst,vara) ( lmt = {1,3,1},0,0 )
  cp_wrp[1] = nco_msa_rec_clc(1,3,lmt,lmt_lst,vara) ( lmt = {2,8,2},0,0 )
  
  We now have two strands. Lets follow cp_wrp[0]
  
  LAT LIMIT
  Since the depth = 1 we are considering the lat limit.
  In this case lmt_lst[1]->lmt_dmn_nbr ==1 (lmt_mult[1]->lmt_dmn_nbr ==1)
  vp=nco_msa_rec_clc(2,3,lmt,lmt_lst,vara)    (lmt= {1,3,1},{0,1,1}, 0 )
  There is no branching or wrapping here, we simply add the lat limit to the list and move on
  
  cp_wrp[0] = nco_msa_rec_clc(1,3,lmt,lmt_lst,vara) ( lmt = {1,3,1},0,0 )
  cp_wrp[1] = nco_msa_rec_clc(1,3,lmt,lmt_lst,vara) ( lmt = {2,8,2},0,0 )
  
  LON LIMIT
  depth = 2 so we are at lon limit
  This is a wrapped dimension . It broken into two pieces
  {3,1,1} -> {3,3,1} and {0,1,1}
  cp_wrp[0] = nco_msa_rec_clc(3,3,lmt,lmt_lst,vara) ( lmt={1,3,1},{0,1,1},{3,3,1} )
  cp_wrp[1] = nco_msa_rec_clc(3,3,lmt,lmt_lst,vara) ( lmt={1,3,1},{0,1,1},{0,1,1} )
  
  READ SECTION
  Since depth =3 , this means lmt is completly populated and so hyperslabs are read from the
  file and pointers returned. This read section is called 4 times.
  
  Below are all the calls to nco_msa_rec_clc in the order they are called.
  
  void_ptr=nco_msa_rec_clc(0,3,lmt,lmt_mult,&vara) (lmt= 0,0,0)
  
      cp_wrp[0] = nco_msa_rec_clc(1,3,lmt,lmt_lst,vara) ( lmt = {1,3,1},0,0 )
          vp=nco_msa_rec_clc(2,3,lmt,lmt_lst,vara)    (lmt= {1,3,1},{0,1,1}, 0 )
              cp_wrp[0] = nco_msa_rec_clc(3,3,lmt,lmt_lst,vara) ( lmt={1,3,1},{0,1,1},{3,3,1} ) READ
              cp_wrp[1] = nco_msa_rec_clc(3,3,lmt,lmt_lst,vara) ( lmt={1,3,1},{0,1,1},{0,1,1} ) READ
  
      cp_wrp[1] = nco_msa_rec_clc(1,3,lmt,lmt_lst,vara) ( lmt = {2,8,2},0,0 )
          vp=nco_msa_rec_clc(2,3,lmt,lmt_lst,vara)    (lmt= {2,8,2},{0,1,1}, 0 )
              cp_wrp[0] = nco_msa_rec_clc(3,3,lmt,lmt_lst,vara) ( lmt={2,8,2},{0,1,1},{3,3,1} ) READ
              cp_wrp[1] = nco_msa_rec_clc(3,3,lmt,lmt_lst,vara) ( lmt={2,8,2},{0,1,1},{0,1,1} ) READ
  
  Merging Hyperlabs
  -----------------
  In the above example, at depth 0 we need to merge the two TIME hyperslabs. The function nco_msa_clc_idx() Provides
  us with the recipe to merge the hyperslabs:- This is
  
  slb_nbr=0 srt=1 end=3 cnt=3 srd=1
  slb_nbr=1 srt=4 end=8 cnt=3 srd=2
  
  Because the depth is 0 we simply concatenate the two slabs:
  
  12, 9, 10,
  16, 13, 14,
  20, 17, 18,
  24, 21, 22,
  -99, -99, -99,
  -99, -99, -99,
  
  and
  
  36, 33, 34,
  40, 37, 38,
  52, 49, 50,
  56, 53, 54,
  68, 65, 66,
  72, 69, 70 ;
  
  Consider a more complex example e.g -d time,0,,2 -d time,0,3,5 -d time,3,,3 in.nc foo3.nc
  The recipe is now:-
  
  slb_nbr=0 srt=0 end=2 cnt=2 srd=2
  slb_nbr=2 srt=3 end=3 cnt=1 srd=1
  slb_nbr=0 srt=4 end=8 cnt=3 srd=2
  slb_nbr=2 srt=9 end=9 cnt=1 srd=1
  
  ( Note that because of overlapping the 3rd slab isn't used)
  
  Interleaved slabs with large dims can rapidly expand the recipe. e.g -d time,,,2 -d time,,,3
  
  slb_nbr=0 srt=0 end=2 cnt=2 srd=2
  slb_nbr=1 srt=3 end=3 cnt=1 srd=1
  slb_nbr=0 srt=4 end=8 cnt=3 srd=2
  slb_nbr=1 srt=9 end=9 cnt=1 srd=1
  
  ---------------------------------------------------------------------------------------------------------
  
  Lets look at the nco_msa_rec_clc call structure for:-
  
  ncks -v four_dmn_rec_var -d time,9,2,1 -d lev,0,0,1 -d lev,2,2,1 -d lon,2,3,1 in.nc foo3.nc
  
  ( float four_dmn_rec_var(time,lat,lev,lon); )
  
  void_ptr=nco_msa_rec_clc(0,4,...) (lmt= 0,0,0,0)
  
  cp_wrp[0]=nco_msa_rec_clc(1,4,..)                       lmt = {9,9,1},0,0,0
      vp =nco_msa_rec_clc(2,4,..)                     lmt = {9,9,1},{0,1,1},0,0
          cp_wrp[0]= nco_msa_rec_clc(3,4,...)     lmt = {9,9,1},{0,1,1},{0,0,1},0
              vp = nco_msa_rec_clc(4,4,...)   lmt = {9,9,1},{0,1,1},{0,0,1},{2,3,1} READ
          cp_wrp[1]= nco_msa_rec_clc(3,4,...)     lmt = {9,9,1},{0,1,1},{2,2,1},0
              vp = nco_msa_rec_clc(4,4,...)   lmt = {9,9,1},{0,1,1},{2,2,1},{2,3,1} READ
  cp_wrp[1]=nco_msa_rec_clc(1,4,..)                       lmt= {0,2,1},0,0,0
      vp =nco_msa_rec_clc(2,4,..)                     lmt = {0,2,1},{0,1,1},0,0
          cp_wrp[0]= nco_msa_rec_clc(3,4,...)     lmt = {0,2,1},{0,1,1},{0,0,1},0
              vp = nco_msa_rec_clc(4,4,...)   lmt = {0,2,1},{0,1,1},{0,0,1},{2,3,1} READ
          cp_wrp[1]= nco_msa_rec_clc(3,4,...)     lmt = {9,9,1},{0,1,1},{2,2,1},0
              vp = nco_msa_rec_clc(4,4,...)   lmt = {0,2,1},{0,1,1},{2,2,1},{2,3,1} READ
  
  The number of reads a var requires is simply the multiple of the number of limits for each dimension.
  
  I hope this explanation is usefull
  
  If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
- Nobody/Anonymous - 2003-01-22
  
  Hi Charlie,
  The print routine in ncks needs re-writing to take account of multi-slabbing etc -- Do you want to tackle it or shall I ? It is quite a complex routine and has many inputs.
  
  Regards Henry
  
  If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
  - Charlie Zender - 2003-01-22
    
    Hi Henry,
    
    Yes, please! It would be great if you would do this.
    Let's keep prn_var_val_lmt() as is and use it as
    the basis for prn_var_val_msa() and then switch
    to the latter when you're done with it.
    
    Thanks,
    Charlie
    
    If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
- henry Butowsky - 2003-02-03
  
  Hi Charlie,
  Have started thinking about the printing problem.
  I am at a loss with the variable indicies.
  
  i.e
  ncks -H -C -v three_dmn_var_dbl -d time,0,1 -d lon,0,,2 in.nc
  time[0]=1 lat[0]=-90 lon[0]=0 three_dmn_var_dbl[0]=1
  time[0]=1 lat[0]=-90 lon[2]=180 three_dmn_var_dbl[1]=3
  time[0]=1 lat[1]=90 lon[0]=0 three_dmn_var_dbl[2]=5
  time[0]=1 lat[1]=90 lon[2]=180 three_dmn_var_dbl[3]=7
  time[1]=2 lat[0]=-90 lon[0]=0 three_dmn_var_dbl[4]=9
  time[1]=2 lat[0]=-90 lon[2]=180 three_dmn_var_dbl[5]=11
  time[1]=2 lat[1]=90 lon[0]=0 three_dmn_var_dbl[6]=13
  time[1]=2 lat[1]=90 lon[2]=180 three_dmn_var_dbl[7]=15
  
  What do the variable indices( 0-7 ) refer to ? since lon has stride ,
  these values are not coninuous in the input file ?
  
  Another example ?
  ncks -H -C -v three_dmn_var_dbl -d time,8,9 -d lat,0,0 -d lon,0,,2 in.nc
  time[8]=9 lat[0]=-90 lon[0]=0 three_dmn_var_dbl[64]=65
  time[8]=9 lat[0]=-90 lon[2]=180 three_dmn_var_dbl[65]=67
  time[9]=10 lat[0]=-90 lon[0]=0 three_dmn_var_dbl[66]=-99
  time[9]=10 lat[0]=-90 lon[2]=180 three_dmn_var_dbl[67]=75
  
  Obviously my routine needs to backwardly compatible ?
  
  In order to get the correct indices for the dimensions I shall add a
  "mapping" array to the lmt_all structure. and a routine nco_msa_org_inx()
  to populate this array (what fun!)
  
  /* Container for limit structures */
  typedef struct {
  char *dmn_nm;        /* dimension name */
  long dmn_cnt;       /* total number of hyperslabs to extract */
  long *dmn_map       /* will be initialized to size dmn_cnt   */
                        /* dmn_map[i] will contain original index */
  int lmt_dmn_nbr;     /* number of lmt args */
  lmt_sct **lmt_dmn;   /* list of limit structures associated with each dimension */
  } lmt_all
  
  Have you informed Al Franz of my new algorithm ? It would be nice for it to
  checked out with real data ? If you want I can e-mail him !
  
  Regards Henry
  
  If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
  - Charlie Zender - 2003-02-03
    
    > Have started thinking about the printing problem.
    > I am at a loss with the variable indicies.
    >
    > i.e
    > ncks -H -C -v three_dmn_var_dbl -d time,0,1 -d lon,0,,2 in.nc
    > time[0]=1 lat[0]=-90 lon[0]=0 three_dmn_var_dbl[0]=1
    > time[0]=1 lat[0]=-90 lon[2]=180 three_dmn_var_dbl[1]=3
    > time[0]=1 lat[1]=90 lon[0]=0 three_dmn_var_dbl[2]=5
    > time[0]=1 lat[1]=90 lon[2]=180 three_dmn_var_dbl[3]=7
    > time[1]=2 lat[0]=-90 lon[0]=0 three_dmn_var_dbl[4]=9
    > time[1]=2 lat[0]=-90 lon[2]=180 three_dmn_var_dbl[5]=11
    > time[1]=2 lat[1]=90 lon[0]=0 three_dmn_var_dbl[6]=13
    > time[1]=2 lat[1]=90 lon[2]=180 three_dmn_var_dbl[7]=15
    >
    > What do the variable indices( 0-7 ) refer to ? since lon has stride ,
    > these values are not coninuous in the input file ?
    
    > Another example ?
    > ncks -H -C -v three_dmn_var_dbl -d time,8,9 -d lat,0,0 -d lon,0,,2 in.nc
    > time[8]=9 lat[0]=-90 lon[0]=0 three_dmn_var_dbl[64]=65
    > time[8]=9 lat[0]=-90 lon[2]=180 three_dmn_var_dbl[65]=67
    > time[9]=10 lat[0]=-90 lon[0]=0 three_dmn_var_dbl[66]=-99
    > time[9]=10 lat[0]=-90 lon[2]=180 three_dmn_var_dbl[67]=75
    
    ncks index printing only works as you expect it to when strides and
    wraps are not involved. The values printed are correct but the element
    number (the subscript on the variable whose values are being dumped)
    is an artifact of the simplicity of the current printing algorithm.
    I think the algorithm you are proposing has all the information
    necessary to print exactly what you want it to even when wraps and
    strides are involved.
    
    The question is, what should the algorithm print?
    It could either print the element number in the input file,
    or the element number in the hyperslab (output file).
    The dimension subscripts do refer to the offset in the input file,
    so to be most consistent, the element subscript should also refer
    to the element number in the output file.
    If the user wishes to know the element number in the hyperslab,
    they can simply run ncks -H on the output file.
    
    > Obviously my routine needs to backwardly compatible ?
    
    Backwards compatibility is not necessary in this case for a variety
    of reasons. Your new algorithm will be more flexible, capable, and
    consistent than the old algorithm, which should just go away
    eventually.
    
    > In order to get the correct indices for the dimensions I shall add a
    > "mapping" array to the lmt_all structure. and a routine nco_msa_org_inx()
    > to populate this array (what fun!)
    >
    >
    > /* Container for limit structures */
    > typedef struct {
    > char *dmn_nm;        /* dimension name */
    > long dmn_cnt;       /* total number of hyperslabs to extract */
    > long *dmn_map       /* will be initialized to size dmn_cnt   */
    >                       /* dmn_map[i] will contain original index */
    > int lmt_dmn_nbr;     /* number of lmt args */
    > lmt_sct **lmt_dmn;   /* list of limit structures associated with each dimension
    > */
    > } lmt_all
    
    Sounds good.
    
    > Have you informed Al Franz of my new algorithm ? It would be nice for it to
    > checked out with real data ? If you want I can e-mail him !
    
    Yes, I think you should e-mail him so he can test your stuff out
    and so he does not waste his time making an obsolete NCO contribution.
    
    Thanks,
    Charlie
    
    If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
- henry Butowsky - 2003-02-04
  
  hi Charlie,
  Have you got Al Franz's email ? My computer had a "meltdown" and I lost all my old e-mails.
  I am a bit confused - you want the element subscript for the vars to be based on the output file -even though the dim subscripts are based on the input file ?
  
  Regards Henry
  
  If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
  - Charlie Zender - 2003-02-04
    
    Hi Henry,
    
    You misunderstood me. I think the indices from the
    input file should be printed by default. It might be
    nice to have a (long) option to print indices from
    the ouput hyperslab, but not necessary. Here is
    the relevant portion of what I said:
    
    The question is, what should the algorithm print?
    It could either print the element number in the input file,
    or the element number in the hyperslab (output file).
    The dimension subscripts do refer to the offset in the input file,
    so to be most consistent, the element subscript should also refer
    to the element number in the output file.
    If the user wishes to know the element number in the hyperslab,
    they can simply run ncks -H on the output file.
    
    So, despite what ncks does now, I think printing
    the element numbers from the input file is the way
    to go.
    
    Al Franz is catburglar at llnl dot gov
    
    Regards,
    Charlie
    
    If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
- henry Butowsky - 2003-02-18
  
  Hi Charlie,
  Have commited a replacement for nco_prn_var_val_lmt , its in
  nco_msa.c and is nco_msa_prn_var_val.
  Beware its still Beta !
  
  At the moment it doesn't do anything special with NC_CHAR variables.
  It would be great if you could summarise for me, what it should be doing. I look at the TODO list then your old code and I am lost !
  
  regards Henry
  
  If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
  - Charlie Zender - 2003-02-21
    
    Hi Henry,
    
    I read through the TODO and found some contradictory
    statements concerning NC_CHAR which I attempted to
    reconcile. Basically, we want to ncks to be able to
    print out strings stored as NC_CHAR's. Since a lot
    of people NUL-terminate their strings, we want to take
    that as an indication that the string is ended, and not
    print and ugly stuff in the remainder of the variable.
    Since netCDF does not support a true string type,
    many files are forced to declare dimensions for NC_CHAR variables that are waaaaay too long,
    just to be sure they have enough room to store whatever comes along at runtime.
    
    The existing algorithm tries to do all this.
    However, I think you will find that it also tries
    to print the offset relative to the begining of the
    string, rather than the beginning of the variable.
    Imagine a 2-D NC_CHAR variable, e.g., a list of
    filenames. Ask yourself what you as a user would
    like to see if you did ncks -H on the file. I think you
    will agree that the answer is slightly different than
    what you would want to see for a two dimensional
    numeric array. This has been my thinking anyway.
    
    I know I havn't written down an exact policy here,
    but I do not want to contrain you too much.
    Try and stay backwards compatible where possible,
    but if it does not make sense, then break compatibility. I'm going away for the weekend soon.
    If you find any specific examples of problematic
    NC_CHAR printing, you might post them for discussion and votes as to what the users would prefer.
    
    Thanks!
    Charlie
    
    If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
    - Charlie Zender - 2003-02-24
      
      Hi Henry,
      
      In preparation for 2.7.2, which will mainly be your
      ncks printing mods, would you please clean up the
      compiler warning from -ansi -pedantic?
      
      Thanks!
      Charlie
      
      gcc -std=c99 -ansi -pedantic -D_BSD_SOURCE -DLINUX -DNO_NETCDF_2 -DVERSION='20030224' -DHOSTNAME='dust.ps.uci.edu' -DUSER='zender' -DHAVE_GETOPT_H -DHAVE_MKSTEMP -DHAVE_STRCASECMP -DHAVE_STRDUP -Di386 -I./ -I../src/nco -I/usr/local/include -U_OPENMP -Wall -O -c ../src/nco/nco_msa.c -o /home/zender/obj/LINUX/nco_msa.o
      ../src/nco/nco_msa.c: In function `nco_msa_wrp_splt':
      ../src/nco/nco_msa.c:394: warning: `index' might be used uninitialized in this function
      ../src/nco/nco_msa.c: In function `nco_msa_prn_var_val':
      ../src/nco/nco_msa.c:667: warning: assignment discards qualifiers from pointer target type
      ../src/nco/nco_msa.c:681: warning: assignment of read-only location
      ../src/nco/nco_msa.c:683: warning: implicit declaration of function `sng_ascii_trn'
      ../src/nco/nco_msa.c:732: warning: implicit declaration of function `nco_typ_fmt_sng'
      ../src/nco/nco_msa.c:732: warning: format argument is not a pointer (arg 3)
      ../src/nco/nco_msa.c:739: warning: format argument is not a pointer (arg 3)
      ../src/nco/nco_msa.c:800: warning: implicit declaration of function `cast_void_nctype'
      ../src/nco/nco_msa.c:842: warning: format argument is not a pointer (arg 3)
      ../src/nco/nco_msa.c:865: warning: format argument is not a pointer (arg 3)
      ar rv /home/zender/lib/LINUX/libnco.a /home/zender/obj/LINUX/nco_msa.o
      
      If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
- henry Butowsky - 2003-02-25
  
  Hi Charlie,
  Have done a tidy up of nco_msa.c.
  I have added some basic string handling in nxo_msa_prn_var_val .
  
  For string NOT null terminated I was thinking of displaying the actual escape sequences rather than printing the char. i.e will ned a function sng_trn_asci
  
  regards Henryi
  
  If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
  - Charlie Zender - 2003-03-05
    
    Thanks. I also did some more tidying to make the builds cleaner.
    It now builds as clean as possible on IRIX, AIX, and LINUX.
    
    >For string NOT null terminated I was thinking of displaying the
    >actual escape sequences rather than printing the char. i.e will ned
    >a function sng_trn_asci
    
    We should be careful here. If the variable is type NC_CHAR, then I
    would assume that the information content of the variable is strings,
    i.e., words. Fortran is an example of a language with no NUL
    terminator in its strings, so Fortran apllications usually do not end
    their strings with NUL. Instead is garbage after the last valid
    character. Without NUL, there is no way to know the intended last
    valid character in a NC_CHAR variable (although you can guess that
    atoi(*cp) > 127 signifies you're in garbage-land.
    
    Here is the current ouput of ncks -H on the character and byte variables.
    I inserted some comments on formatting...
    
    zender@dust:~/nco/data$ ncks -C -H -v char_var,char_var_space,char_var_nul,char_var_multinul,fl_nm,fl_nm_arr,non_nul_trm_char_one_dim,non_nul_trm_char_two_dim,byte_var,one_dmn_rec_var_sng,two_dmn_rec_var_sng
    ~/nco/data/in.nc
    byte_var = 122
    char_var='z'
    I like that non-NUL-terminated characters are enclosed in '' rather
    than "". To me '' means non-string, and "" means string (i.e.,
    NUL-terminated).
    lev(0)=100 char_var_multinul(0--2)="
    "
    Parentheses for index subscripts should be square brackets [] when
    they adhere to the C (0-based) formatting convention. If user
    specifies --fortran then offset indices by +1 and use (). This problem
    occurs in many of the following examples
    
    char_var_nul=''
    I think char_var_nul is a valid 0-character string and so should be printed ""
    char_var_space=' '
    char_dmn_lng(0) fl_nm(0--28)="/home/zender/nco/data/in.cdl"
    
    fl_dim(0)=a char_dmn_lng(0) fl_nm_arr(0--41)="/data/zender/dstccm04/dstccm04_8589_01.nc"
    fl_dim(1)=b char_dmn_lng(0) fl_nm_arr(0--41)="/data/zender/dstccm04/dstccm04_8589_02.nc"
    fl_dim(2)=3 char_dmn_lng(0) fl_nm_arr(0--41)="/data/zender/dstccm04/dstccm04_8589_03.nc"
    Are you happy with this? Should fl_nm_arr use the total offsets or just the string length?
    I know we discussed this earlier, but what do you think is more useful?
    I think I like the following better:
    fl_dim[0]=a char_dmn_lng[0--41] fl_nm_arr[0--41]="/data/zender/dstccm04/dstccm04_8589_01.nc"
    fl_dim[1]=b char_dmn_lng[0--41] fl_nm_arr[42--81]="/data/zender/dstccm04/dstccm04_8589_02.nc"
    
    char_dmn_sml(0) non_nul_trm_char_one_dim(0--2)="ab"
    I would prefer that this be 'ab' (unless somehow netCDF wrote a NUL
    terminator into position 3, e.g., when it pre-filled the variable)
    
    fl_dim(0)=a char_dmn_sml(0) DMN SIZE= 4
    non_nul_trm_char_two_dim(0--3)=abcd
    fl_dim(1)=b char_dmn_sml(0) DMN SIZE= 4
    non_nul_trm_char_two_dim(0--3)=efgh
    fl_dim(2)=3 char_dmn_sml(0) DMN SIZE= 4
    non_nul_trm_char_two_dim(0--3)=ijkm
    I think these should have single quotes, e.g., 'abcd'
    Also, what's with the "DMN SIZE = X" portion?
    Do you intend to keep that?
    I like the following output format better
    fl_dim[0]=a char_dmn_sml[0--3] non_nul_trm_char_two_dim[0--3]='abcd'
    fl_dim[1]=a char_dmn_sml[0--3] non_nul_trm_char_two_dim[4--7]='efgh'
    What do you think?
    
    time(0)=1 one_dmn_rec_var_sng(0--9)="Hello Wor"
    
    time(0)=1 lev(0)=100 DMN SIZE= 3
    two_dmn_rec_var_sng(0--2)=abc
    time(1)=2 lev(0)=100 DMN SIZE= 3
    two_dmn_rec_var_sng(0--2)=bcd
    time(2)=3 lev(0)=100 DMN SIZE= 3
    two_dmn_rec_var_sng(0--2)=cde
    time(3)=4 lev(0)=100 DMN SIZE= 3
    two_dmn_rec_var_sng(0--2)=def
    time(4)=5 lev(0)=100 DMN SIZE= 3
    two_dmn_rec_var_sng(0--2)=efg
    time(5)=6 lev(0)=100 DMN SIZE= 3
    two_dmn_rec_var_sng(0--2)=fgh
    time(6)=7 lev(0)=100 DMN SIZE= 3
    two_dmn_rec_var_sng(0--2)=ghi
    time(7)=8 lev(0)=100 DMN SIZE= 3
    two_dmn_rec_var_sng(0--2)=hij
    time(8)=9 lev(0)=100 DMN SIZE= 3
    two_dmn_rec_var_sng(0--2)=jkl
    time(9)=10 lev(0)=100 DMN SIZE= 3
    two_dmn_rec_var_sng(0--2)=klm
    Same comments as above, I think I would prefer, e.g.,
    time[1]=2 lev[0]=100 two_dmn_rec_var_sng(3--5)='bcd'
    
    Charlie
    
    If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
    - Nobody/Anonymous - 2003-03-05
      
      Hi Charlie,
      II've digested your comments. - I've got quite a bit of paid work on at the moment and shall not get a chance to do the mods untill possibly the end of next week. The printing of "DMN_SIZE" is just for debugging and can safely be removed.
      
      Regards Henry
      
      If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
- henry Butowsky - 2003-03-14
  
  Hi Charlie,
  Have sorted out the FORTRAN/C indices printing. Sorry for the mix up.
  Non strings are now enclosed within ' ' ,
  Dimensionless nulls and variables hyperslabed to null are printed ""
  
  Non strings indices are offests from lmn i.el_dim[0]=a char_dmn_sml[0--3] non_nul_trm_char_two_dim[0--3]='abcd'
  fl_dim[1]=a char_dmn_sml[0--3] non_nul_trm_char_two_dim[4--7]='efgh'
  
  I have done the same with strings the offset size is the dmn_sz
  If we used your method then the indices could end up being identical to the form when the nc_char dimension has limits
  
  i.e
  ncks -H -C -v fl_nm_arr in.nc would be the same as
  
  nkcs -H -C -v fl_nm_arr -d char_dmn_lng,0,41 in.nc
  
  Regards Henry
  
  If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
  - Charlie Zender - 2003-03-14
    
    Hi Henry,
    
    Thanks. I agree with all of the mods you've made except the
    following one, which I'm not sure about. The current behavior seems to
    be keying the indices XXX--YYY in fl_nm_arr[XXX--YYY] to the buffer
    in memory rather than to the file on disk. Is this intentional?
    It is what ncks used to do, but I thought we discussed this and ended
    up agreeing that XXX--YYY should refer to the variable on disk. If the
    user wants to see the XXX--YYY refer to the slab in memory, they can
    use ncks twice, once to put the slab in a new file, and the second
    time to print the contents of the new file. To me this makes more
    sense. What do you think?
    
    zender@dust:~/nco/data$ ncks -H -C -v fl_nm_arr -d char_dmn_lng,0,41 in.nc
    fl_dmn[0]=a char_dmn_lng[0] fl_nm_arr[0--41]="/data/zender/dstccm04/dstccm04_8589_01.nc"
    fl_dmn[1]=b char_dmn_lng[0] fl_nm_arr[42--83]="/data/zender/dstccm04/dstccm04_8589_02.nc"
    fl_dmn[2]=3 char_dmn_lng[0] fl_nm_arr[84--125]="/data/zender/dstccm04/dstccm04_8589_03.nc"
    
    zender@dust:~/nco/data$ ncks -H -C -v fl_nm_arr -d char_dmn_lng,1,41 in.nc
    fl_dmn[0]=a char_dmn_lng[1] fl_nm_arr[0--40]="data/zender/dstccm04/dstccm04_8589_01.nc"
    fl_dmn[1]=b char_dmn_lng[1] fl_nm_arr[41--81]="data/zender/dstccm04/dstccm04_8589_02.nc"
    fl_dmn[2]=3 char_dmn_lng[1] fl_nm_arr[82--122]="data/zender/dstccm04/dstccm04_8589_03.nc"
    
    I suggest that the latter command print the following XXX--YYY instead:
    
    zender@dust:~/nco/data$ ncks -H -C -v fl_nm_arr -d char_dmn_lng,1,41 in.nc
    fl_dmn[0]=a char_dmn_lng[1] fl_nm_arr[1--41]="data/zender/dstccm04/dstccm04_8589_01.nc"
    fl_dmn[1]=b char_dmn_lng[1] fl_nm_arr[43--83]="data/zender/dstccm04/dstccm04_8589_02.nc"
    fl_dmn[2]=3 char_dmn_lng[1] fl_nm_arr[85--125]="data/zender/dstccm04/dstccm04_8589_03.nc"
    
    Note that the behavior (XXX--YYY referring to on disk values)
    should be the same for in this regard for floats, doubles, etc.
    
    Your printing mods are the only new feature for 2.7.2.
    If you agree with my suggested change, then we will release 2.7.2
    after you supply the patch. Otherwise, I'll ready 2.7.2 for release now.
    
    It's great to hear that you're getting some consulting work!
    I will try to get an "SDO" proposal submitted to NASA this year.
    
    Thanks,
    Charlie
    
    If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
- henry Butowsky - 2003-03-17
  
  Hi Charlie, sorry about the confusion over the indices for char
  Have commited a new nco_msa.c
  Indices are now relative to disk.
  
  Regards Henry
  
  If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
  - Charlie Zender - 2003-03-17
    
    Great, I'll make a 2.7.2 release out of this soon.
    
    Thanks!
    Charlie
    
    If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Multislabbing discussions archive

Command-line operators for netCDF and HDF files

Forums

Help

Multislabbing discussions archive

Multislabbing discussions archive

Command-line operators for netCDF and HDF files

Forums

Help

Multislabbing discussions archive document.SUBSCRIPTION_OPTIONS = { "thing": "topic", "subscribed": false, "url": "subscribe", "icon": { "css": "fa fa-envelope-o" } };

Multislabbing discussions archive