> Was thinking of tackling No 52 on your TODO list ( adding min,max,
> sdn,ttl for ncra,ncea,ncwa ). What are your thoughts on this ?
There are a number of ways this could be done. I think it would be
best to do these operations instead of, rather than in addition to,
averaging, at least at first. You will need to add a command line
switch to allow the user to pick which operation the operator should
perform. Assume the default operation is averaging. So, e.g., modify
ncra (whose name may have to change eventually) to do the following:
avg (current default) -- returns time average
min -- returns time minimum
max -- returns time maximum
ttl -- temporal sum
sdn -- temporal std dvn
Currently the variable structure carries a buffer which accumulates
a running total, a buffer which contains a tally of the number of
entries (i.e., records, timesteps) at each gridpoint in the running
total buffer, and the averaging is done once the final record is
read. The min, max, and ttl operations should just use the running
total buffer for their specific purposes (e.g., running minimum);
there is no need to add new buffers yet.
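As a rough illustration of reusing the running-total buffer, here is a minimal sketch in C. The names (rec_op, rec_accumulate, rec_finalize) are hypothetical and are not taken from the NCO source; it also assumes plain double data with no missing values.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical operation codes, not the identifiers used in NCO */
typedef enum { REC_AVG, REC_MIN, REC_MAX, REC_TTL } rec_op;

/* Fold one record into the running buffer.
   rec_idx is the zero-based index of the record being folded in. */
void rec_accumulate(rec_op op, size_t rec_idx, size_t sz,
                    const double *rec, double *run, long *tally)
{
  size_t i;
  assert(rec != NULL && run != NULL && tally != NULL);
  for (i = 0; i < sz; i++) {
    if (rec_idx == 0) { /* First record initializes the buffer */
      run[i] = rec[i];
      tally[i] = 1L;
      continue;
    }
    switch (op) {
    case REC_MIN: if (rec[i] < run[i]) run[i] = rec[i]; break; /* running minimum */
    case REC_MAX: if (rec[i] > run[i]) run[i] = rec[i]; break; /* running maximum */
    case REC_AVG: /* averaging and total both accumulate a sum */
    case REC_TTL: run[i] += rec[i]; break;
    }
    tally[i]++;
  }
}

/* Called once after the final record: only averaging needs normalization */
void rec_finalize(rec_op op, size_t sz, double *run, const long *tally)
{
  size_t i;
  if (op != REC_AVG) return;
  for (i = 0; i < sz; i++)
    if (tally[i] > 0L) run[i] /= (double)tally[i];
}
```

The same buffer holds a running sum, minimum, or maximum depending on the operation, so min, max, and ttl need no storage beyond what averaging already uses.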
The sdn operation requires either
1. Two passes through the files:
Generate the mean on the first pass, do the standard deviation on
the second pass. This requires adding a new single timestep buffer
or two to be used on the second pass (one to hold the instantaneous
value and one to accumulate the running sum of the squares of the
differences between the current timestep and the mean).
or:
2. One pass through the files with one single timestep buffer (to
create the mean) and a new multi-timestep buffer (to hold the entire
time series of all variables) which will be a huge memory hog.
Either one is a fairly hefty change, but I would prefer option 1
because many users operate on GB-size files, so option 2 is completely
unrealistic (at least for ncra/ncea; #2 is fine for ncwa).
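Option 1 could look roughly like the sketch below. It is only an illustration under stated assumptions: the function name var_two_pass is hypothetical, an in-memory array stands in for re-reading records from the input files, and the sum is normalized by N rather than N-1 (the thread does not say which is intended). Taking sqrt() from <math.h> of each result gives the standard deviation.

```c
#include <assert.h>
#include <stddef.h>

/* rec holds nrec records of sz values each, stored contiguously as
   rec[t * sz + i]; it stands in for reading each record from file.
   On return, mean[i] is the time mean and var[i] is the mean squared
   deviation from that mean at gridpoint i. */
void var_two_pass(const double *rec, size_t nrec, size_t sz,
                  double *mean, double *var)
{
  size_t t, i;
  assert(nrec > 0);

  /* Pass 1: accumulate the running total, then normalize to the mean */
  for (i = 0; i < sz; i++) mean[i] = 0.0;
  for (t = 0; t < nrec; t++)
    for (i = 0; i < sz; i++) mean[i] += rec[t * sz + i];
  for (i = 0; i < sz; i++) mean[i] /= (double)nrec;

  /* Pass 2: accumulate the squares of the differences from the mean */
  for (i = 0; i < sz; i++) var[i] = 0.0;
  for (t = 0; t < nrec; t++)
    for (i = 0; i < sz; i++) {
      double dff = rec[t * sz + i] - mean[i]; /* instantaneous value minus mean */
      var[i] += dff * dff;
    }
  for (i = 0; i < sz; i++) var[i] /= (double)nrec;
}
```

Only two single-timestep buffers (mean and var) are live at any time, which is why this approach scales to large multi-file inputs at the cost of reading the data twice.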
I recommend implementing min,max, and total first to get the hang of
the NCO API. Once those are working, you will no doubt have a better
idea of how to do the sdn. Note that the sdn memory usage is only an
issue for multi-file operators; putting sdn in a single-file averager
like ncwa (e.g., to do spatial standard deviations) does not increase
memory usage unacceptably.
Thus the approach to take for sdn should probably differ between
ncra and ncea (use #1 above, two passes through files) and ncwa
(use #2 above, make additional copy of entire hyperslab). It would
be nice to keep the two approaches sharing as much common code as
possible.
How's all that sound to you?
Charlie
Hi zender, I have completed min, max, and ttl for ncra.c and am going on to do it for ncwa. How are we going to deal with the new source? Use CVS (I need the login password)? I have only used SCCS before, so watch out!
Hello, I have made a patch for min, max, and total for ncra and ncea. The command line option is -y (min, max, total). I can change this if necessary. The code's a bit rough because I haven't programmed in C for a while! I just realised var_copy is redundant; I could have used memcpy instead. I could add a few more summation types, e.g., sum of squares and average squared, then use ncdiff to find the sd.
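For illustration only, mapping the -y argument to an operation code might be sketched as follows. The enum, the function name op_parse, and the choice to accept both "ttl" and "total" are assumptions, not the patch's actual code; a missing argument falls back to averaging, which the thread names as the default.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical operation codes for the -y switch */
typedef enum { Y_AVG, Y_MIN, Y_MAX, Y_TTL, Y_UNKNOWN } y_op;

/* Translate the -y argument into an operation code.
   NULL means the switch was not given, so averaging is used. */
y_op op_parse(const char *opt_sng)
{
  if (opt_sng == NULL) return Y_AVG;
  if (strcmp(opt_sng, "avg") == 0) return Y_AVG;
  if (strcmp(opt_sng, "min") == 0) return Y_MIN;
  if (strcmp(opt_sng, "max") == 0) return Y_MAX;
  /* Accepting both spellings is an assumption made for this sketch */
  if (strcmp(opt_sng, "ttl") == 0 || strcmp(opt_sng, "total") == 0) return Y_TTL;
  return Y_UNKNOWN; /* Caller should report an error and exit */
}
```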
I have added two more functions: avgsqr, which squares the average, and avgsumsqr, which is the sum of the squares divided by n.
After doing something like
ncra -yavgsumsqr in1.nc out1.nc
ncra -yavgsqr in1.nc out2.nc
ncdiff out1.nc out2.nc out3.nc
the squares of the sdn (i.e., the variance) will be in out3.
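The recipe above rests on the identity variance = mean(x^2) - (mean(x))^2. A small sketch of the same arithmetic in C follows; the function name var_from_moments is hypothetical, the data layout is an assumption, and the result is the N-normalized variance. Note that this difference can lose precision when the mean is large compared with the spread of the data.

```c
#include <assert.h>
#include <stddef.h>

/* rec holds nrec records of sz values each, stored as rec[t * sz + i].
   var[i] receives avgsumsqr - avgsqr at gridpoint i. */
void var_from_moments(const double *rec, size_t nrec, size_t sz, double *var)
{
  size_t t, i;
  assert(nrec > 0);
  for (i = 0; i < sz; i++) {
    double sum = 0.0;     /* running total of x */
    double sum_sqr = 0.0; /* running total of x squared */
    double avg, avg_sum_sqr, avg_sqr;
    for (t = 0; t < nrec; t++) {
      double x = rec[t * sz + i];
      sum += x;
      sum_sqr += x * x;
    }
    avg = sum / (double)nrec;
    avg_sum_sqr = sum_sqr / (double)nrec; /* what avgsumsqr is described as returning */
    avg_sqr = avg * avg;                  /* what avgsqr is described as returning */
    var[i] = avg_sum_sqr - avg_sqr;       /* the ncdiff step */
  }
}
```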
I've applied this patch and cleaned it up a bit.
I like the implementation of the operators as a case
statement in a function. This allows for future expandability.
It appears not to break existing averaging capabilities but
does not work as advertised for some cases. Please
submit a new patch against the current code
which does the following:
1. Fixes min and total, which do not appear to work, e.g.:
ncra -C -O -y total -v time in.nc foo.nc; ncks -H foo.nc
2. Adds simple test cases like the above to nco_tst.sh.
3. Adds documentation of the feature to the manual.
Thanks!
Charlie
min/max/ttl now appear to be working.
I added test cases to nco_tst.sh.
They still need to be documented, though.
Please stress test this new feature everybody.