
TODO #52: adding min,max,sdn,ttl

Developers
2000-06-15
2013-10-17
  • Charlie Zender

    Charlie Zender - 2000-06-15

    > Was thinking of tackling No 52 on your TODO list ( adding min,max,
    > sdn,ttl for ncra,ncea,ncwa ). What are your thoughts on this ?

    There are a number of ways this could be done.  I think it would be
    best to do these operations instead of, rather than in addition to,
    averaging, at least at first. You will need to add a command line
    switch to allow the user to pick which operation the operator should
    perform. Assume the default operation is averaging. So, e.g., modify
    ncra (whose name may have to change eventually) to do the following:

    avg (current default) -- returns time average
    min -- returns time minimum
    max -- returns time maximum
    ttl -- temporal sum
    sdn -- temporal standard deviation

    Currently the variable structure carries a buffer which accumulates
    a running total, a buffer which contains a tally of the number of
    entries (i.e., records, timesteps) at each gridpoint in the running
    total buffer, and the averaging is done once the final record is
    read. The min, max, and ttl operations should just use the running
    total buffer for their specific purposes (e.g., running minimum);
    there is no need to add new buffers yet.
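
    A minimal sketch of that accumulation scheme, not NCO's actual code:
    the names op_typ, accumulate, and finalize are hypothetical, and
    values are assumed to be doubles with no missing-value handling. The
    running buffer is reused by every operation and only averaging needs
    the tally at the end.

```c
#include <stddef.h>

/* Hypothetical operation codes; NCO's real identifiers may differ. */
typedef enum { OP_AVG, OP_MIN, OP_MAX, OP_TTL } op_typ;

/* Fold one record (timestep) into the running buffer.
   tally counts how many records each gridpoint has received so far. */
static void accumulate(op_typ op, size_t sz, const double *rec,
                       double *run, long *tally)
{
  for (size_t idx = 0; idx < sz; idx++) {
    if (tally[idx] == 0) {
      /* The first record seeds the buffer for every operation. */
      run[idx] = rec[idx];
    } else {
      switch (op) {
      case OP_MIN: if (rec[idx] < run[idx]) run[idx] = rec[idx]; break;
      case OP_MAX: if (rec[idx] > run[idx]) run[idx] = rec[idx]; break;
      case OP_AVG: /* averaging accumulates a sum, normalized later */
      case OP_TTL: run[idx] += rec[idx]; break;
      }
    }
    tally[idx]++;
  }
}

/* Called once after the final record: only averaging divides by the tally. */
static void finalize(op_typ op, size_t sz, double *run, const long *tally)
{
  if (op != OP_AVG) return;
  for (size_t idx = 0; idx < sz; idx++)
    if (tally[idx] > 0) run[idx] /= (double)tally[idx];
}
```

    Calling accumulate once per record and finalize once at the end
    leaves the result in the running buffer, so no extra buffers are
    needed for min, max, or ttl.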

    The sdn operation requires either

    1. Two passes through the files:
    Generate the mean on the first pass, then do the standard deviation
    on the second pass. This requires adding a new single-timestep buffer
    or two to be used on the second pass (one to hold the instantaneous
    value and one to accumulate the running sum of the squares of the
    differences between the current timestep and the mean).

    or:

    2. One pass through the files with one single timestep buffer (to
    create the mean) and a new multi-timestep buffer (to hold the entire
    time series of all variables) which will be a huge memory hog.

    Either one is a fairly hefty change, but I would prefer option 1
    because many users operate on GB-size files, so option 2 is completely
    unrealistic (at least for ncra/ncea; #2 is fine for ncwa).

    I recommend implementing min, max, and total first to get the hang of
    the NCO API. Once those are working, you will no doubt have a better
    idea of how to do the sdn. Note that the sdn memory usage is only an
    issue for multi-file operators; putting sdn in a single-file averager
    like ncwa (e.g., to do spatial standard deviations) does not increase
    memory usage unacceptably.

    Thus the approach to take for sdn should probably differ between
    ncra and ncea (use #1 above, two passes through files) and ncwa
    (use #2 above, make additional copy of entire hyperslab). It would
    be nice to keep the two approaches sharing as much common code as
    possible.

    How's all that sound to you?
    Charlie

     
    • henry Butowsky

      henry Butowsky - 2000-06-20

      Hi zender, have completed min, max, ttl for ncra.c; going on to do it for ncwa. How are we gonna deal with new source? Use CVS (I need the login password)? Have only used SCCS before, so watch out!

       
      • henry Butowsky

        henry Butowsky - 2000-06-23

        Hello, have made a patch for min, max, total for ncra, ncea. The command line option is -y (min, max, total). I can change this if necessary. Code's a bit rough because I haven't programmed in C for a while! Just realised var_copy is redundant; could have used memcpy instead. Could add a few more summation types, e.g., sum of squares, average squared, then use ncdiff to find the sd.

         
        • henry Butowsky

          henry Butowsky - 2000-06-26

          Have added two more functions: avgsqr, which squares the average, and avgsumsqr, which is the sum of the squares over n.

          After doing something like

          ncra -yavgsumsqr in1.nc out1.nc
          ncra -yavgsqr in1.nc out2.nc
          ncdiff out1.nc out2.nc out3.nc

          the square of the sdn (i.e., the variance) will be in out3.nc.
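
          The identity behind this is that the mean of the squares minus the
          square of the mean equals the (divide-by-N) variance. A small
          sketch for a single gridpoint follows; the function names are
          hypothetical stand-ins for what the avgsumsqr and avgsqr options
          compute, not the patch's actual code.

```c
#include <stddef.h>

/* Mean of the squares, (1/n) * sum(x_i^2): what avgsumsqr is described as. */
static double avg_sum_sqr(const double *x, size_t n)
{
  double sum = 0.0;
  for (size_t i = 0; i < n; i++) sum += x[i] * x[i];
  return sum / (double)n;
}

/* Square of the mean, ((1/n) * sum(x_i))^2: what avgsqr is described as. */
static double avg_sqr(const double *x, size_t n)
{
  double sum = 0.0;
  for (size_t i = 0; i < n; i++) sum += x[i];
  sum /= (double)n;
  return sum * sum;
}

/* The difference of the two (the ncdiff step) is the population variance. */
static double variance_by_difference(const double *x, size_t n)
{
  return avg_sum_sqr(x, n) - avg_sqr(x, n);
}
```

          This needs only one pass per output file, but subtracting two
          nearly equal numbers can lose precision when the mean is large
          compared with the spread of the data.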

                     

           
          • Charlie Zender

            Charlie Zender - 2000-06-30

            I've applied this patch and cleaned it up a bit.
            I like the implementation of the operators as a case
            statement in a function. This allows for future expandability.
            It appears not to break existing averaging capabilities but
            does not work as advertised for some cases. Please
            submit a new patch against the current code
            which fixes the following:

            min and total do not appear to work, e.g.:
            ncra -C -O -y total -v time in.nc foo.nc; ncks -H foo.nc

            add simple test cases like the above to nco_tst.sh

            add documentation of the feature to the manual.

            Thanks!
            Charlie

             
            • Charlie Zender

              Charlie Zender - 2000-07-08

              min/max/ttl now appear to be working.
              I added test cases to nco_tst.sh.
              They still need to be documented, though.
              Please stress test this new feature, everybody.

               
