Hi,
I was surprised by the amount of memory (i.e. RAM) that ncwa needs while computing statistics (e.g. average, rms, and so on).
I went into the code of ncwa.c and found, if I'm not mistaken, that it loads all the data for one variable into memory at once.
That is not strictly necessary, since the program "averages" along the degenerate dimension(s).
The variable could instead be loaded one slice of the degenerate dimension(s) at a time, processed, and stored in accumulation arrays (e.g. sum, weight, sum of squares, ...).
So it should only require adding a loop over the degenerate dimension(s).
Example:
"in.nc" contains 4D variables (e.g. time,z,y,x), and both the average and rms should be computed along time (i.e. time is the degenerate dimension); I may have on the order of 10 000 samples.
ncwa -d time in.nc out.nc
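To make this concrete, here is a minimal sketch (not taken from the NCO sources) of the idea: read one time-slice at a time with the netCDF C library and accumulate running sums, so that only one slice plus the accumulation arrays are resident in memory. For brevity the sketch uses a 3-D variable var(time,y,x); the file, variable, and dimension names are assumptions for illustration, and error handling is omitted.

#include <math.h>
#include <stdio.h>
#include <stdlib.h>
#include <netcdf.h>

int main(void)
{
  int ncid, varid, dimid;
  size_t nt, ny, nx;

  /* Open the file and query dimension sizes and the variable id */
  nc_open("in.nc", NC_NOWRITE, &ncid);
  nc_inq_dimid(ncid, "time", &dimid); nc_inq_dimlen(ncid, dimid, &nt);
  nc_inq_dimid(ncid, "y", &dimid);    nc_inq_dimlen(ncid, dimid, &ny);
  nc_inq_dimid(ncid, "x", &dimid);    nc_inq_dimlen(ncid, dimid, &nx);
  nc_inq_varid(ncid, "var", &varid);

  size_t npt    = ny * nx;
  float  *slice = malloc(npt * sizeof *slice);  /* one timestep only      */
  double *sum   = calloc(npt, sizeof *sum);     /* accumulated sum        */
  double *sum2  = calloc(npt, sizeof *sum2);    /* accumulated sum of x^2 */

  /* Loop over the degenerate (time) dimension, one hyperslab per read */
  for (size_t tm = 0; tm < nt; tm++) {
    size_t start[3] = { tm, 0, 0 };
    size_t count[3] = { 1, ny, nx };
    nc_get_vara_float(ncid, varid, start, count, slice);
    for (size_t idx = 0; idx < npt; idx++) {
      sum[idx]  += slice[idx];
      sum2[idx] += (double)slice[idx] * slice[idx];
    }
  }

  /* Average and rms along time, computed from the accumulators */
  for (size_t idx = 0; idx < npt; idx++) {
    double avg = sum[idx] / (double)nt;
    double rms = sqrt(sum2[idx] / (double)nt);
    if (idx == 0) printf("avg[0] = %g, rms[0] = %g\n", avg, rms);
  }

  nc_close(ncid);
  free(slice); free(sum); free(sum2);
  return 0;
}

The peak memory here is one time-slice plus two double-precision accumulator arrays, independent of the number of timesteps.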
First, am I right in my reading of the ncwa code?
If you think so, what do you think of this idea?
And do you see any problem making this change in the code?
Regards, Sebastien.
Hi,
Complementary to my previous message, here is a maths reference:
http://mathworld.wolfram.com/CentralMoment.html
and also a correction to the option call:
ncwa -a time in.nc out.nc
Regards, Sebastien.
Hi Sebastien,
Why did you point me to the CentralMoment page?
Thanks,
Charlie
Because, by using accumulation arrays for the central moments of the different orders, it may be possible to compute all of them (i.e. more statistics) at once, provided less memory is used for the input data. You could output the average, rms, ... at once. That's what I may do in the code.
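As a rough sketch of what such accumulation arrays could look like, the running power sums kept in the time loop can be converted into central moments with the standard relations from the CentralMoment page. The names below are illustrative, and plain power sums can lose precision over very long records; this is only to show the one-pass idea.

#include <math.h>

/* Illustrative per-grid-point accumulator: power sums of the values
 * seen so far along the degenerate dimension. */
typedef struct { double s1, s2, s3, s4; long n; } moment_acc;

static void acc_add(moment_acc *acc, double val)
{
  acc->s1 += val;
  acc->s2 += val * val;
  acc->s3 += val * val * val;
  acc->s4 += val * val * val * val;
  acc->n++;
}

/* Derive several statistics from the accumulated sums in one pass:
 * mean, rms, and the third and fourth central moments. */
static void acc_stats(const moment_acc *acc, double *avg, double *rms,
                      double *mu3, double *mu4)
{
  double n  = (double)acc->n;
  double m1 = acc->s1 / n, m2 = acc->s2 / n;
  double m3 = acc->s3 / n, m4 = acc->s4 / n;
  *avg = m1;
  *rms = sqrt(m2);
  *mu3 = m3 - 3.0*m1*m2 + 2.0*m1*m1*m1;
  *mu4 = m4 - 4.0*m1*m3 + 6.0*m1*m1*m2 - 3.0*m1*m1*m1*m1;
}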
Hello Sebastien,
> First, am I right in my reading of the ncwa code?
Your analysis of the ncwa memory usage is correct. For more information see
http://nco.sf.net/nco.html#mmr
> If you think so, what do you think of this idea?
It's impressive that you figured this out by reading the code rather than
the manual :)
Your method would indeed reduce peak memory usage for ncwa.
However, you should consider three more points before advocating this change:
speed, ncra, and complexity.
First, the more reads you require, the slower the application will be.
Reading all the data in a variable at once is fastest.
I'm not too worried that a single variable will be so large that it will
cause excessive swaps on a typical workstation today (> 1 GB RAM).
Second, ncra implements the algorithm you suggest in your example.
If you want to reduce a variable over only the record (time) dimension,
then use ncra. ncra reads in only one timestep at a time and thus has
much lower memory overhead.
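For reference, the simplest ncra invocation for the example file above would presumably be just
ncra in.nc out.nc
which averages every record variable over the record (time) dimension, one record at a time.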
> And do you see any problem making this change in the code?
I think it would be much more complex than the current algorithm without
much noticeable benefit to the user. It would require placing both read
and write routines within loops over arbitrary numbers of dimensions not
known until runtime. However, if you submitted a clean patch to do it,
we would certainly consider it.
Best,
Charlie
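An aside on the "loops over arbitrary numbers of dimensions not known until runtime" mentioned above: this kind of iteration is typically done with an odometer-style index walk. A small illustrative helper, not taken from the NCO sources, might look like the following; each step would supply the start[] vector for an nc_get_vara/nc_put_vara call.

#include <stddef.h>

/* Advance a multi-dimensional index idx[0..ndm-1] within the bounds dim[],
 * fastest-varying dimension last.  Returns 0 once every combination has
 * been visited, i.e. the "odometer" has wrapped around.
 * Typical use: do { read/accumulate hyperslab at idx } while (odometer_next(idx, dim, ndm)); */
static int odometer_next(size_t *idx, const size_t *dim, int ndm)
{
  for (int d = ndm - 1; d >= 0; d--) {
    if (++idx[d] < dim[d]) return 1;
    idx[d] = 0;  /* wrap this digit and carry into the next slower dimension */
  }
  return 0;
}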
Thanks Charlie,
1/ concerning memory (first, an explanation of my data)
I use 'ncwa' because I want to exclude a few outliers from my data when computing the average and rms, and I would also like the third and fourth central moments (and, eventually, cross moments).
I have a special variable that flags bad data (1 good, 0 bad), and I'm weighting my variable by it.
Maybe I could do it with the nco "missing_value", but I haven't read much on this yet. Whatever the case, I don't think it will solve my memory problem. Do you have any ideas?
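(The weighting described here presumably corresponds to ncwa's -w option, along the lines of
ncwa -a time -w flag in.nc out.nc
where "flag" stands for the 1/0 quality variable; the variable name is only an assumption.)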
2/ concerning memory
One variable is 850 MByte (9600*162*141 * 4 Byte for float).
I'm working on a 64-bit machine with 2 GByte of RAM (and almost no swap).
ncwa fails while allocating: it allocates more than 1.4 GByte and stops because it needs more: ERROR malloc.
One variable is 145 MByte (1600*162*141 * 4 Byte for float);
it allocates more than 1.5 GByte, runs, and computes everything.
It also works, in the end, with 161 MBytes.
3/ ncra vs ncwa
So, now you may understand why I use ncwa instead of ncra.
But you gave me an idea: I could split my data in space and run several ncwa commands.
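(Such a split could presumably use ncwa's -d hyperslab option along a spatial dimension, roughly
ncwa -a time -d y,0,80 in.nc out_1.nc
ncwa -a time -d y,81,161 in.nc out_2.nc
where the dimension name y and the index ranges are assumptions for illustration.)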
4/ changing the code
It would do several reads, but it would not write inside the loop, only one write at the end.
As you say, it may slow down the reading, but at least it would read the data! It cannot do that now.
Regards, Sebastien.
Hi Sebastien,
> Maybe I could do it with the nco "missing_value", but I haven't read
> much on this yet. Whatever the case, I don't think it will solve my
> memory problem. Do you have any ideas?
Try using the missing_value to denote bad data.
The weight requires an additional 0.85--1.7 GB (depending on its shape).
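(As a hypothetical illustration of that route, with the variable name my_var and the value -9999 assumed for the example: declare the attribute with ncatted, e.g.
ncatted -a missing_value,my_var,o,f,-9999 in.nc
after which NCO's arithmetic skips those points; the bad points in the data variable would also need to hold that value, since at present they are only flagged in the separate 1/0 variable.)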
> One variable is 850 MByte (9600*162*141 * 4 Byte for float).
> I'm working on a 64-bit machine with 2 GByte of RAM (and almost no swap).
Your variable is big. You deserve more RAM :)
> ncwa fails while allocating: it allocates more than 1.4 GByte and
> stops because it needs more: ERROR malloc.
Given your hardware, this is expected.
> One variable is 145 MByte (1600*162*141 * 4 Byte for float);
> it allocates more than 1.5 GByte, runs, and computes everything.
> It also works, in the end, with 161 MBytes.
Again, as expected.
> 3/ ncra vs ncwa
> So, now you may understand why I use ncwa instead of ncra.
> But you gave me an idea: I could split my data in space and run
> several ncwa commands.
Yes, that should work too.
> 4/ changing the code
> It would do several reads, but it would not write inside the loop,
> only one write at the end.
If you put the write in the loop (as ncra does) then you reduce
the size of the output buffer.
> As you say, it may slow down the reading, but at least it would read
> the data! It cannot do that now.
Right.
Good luck,
Charlie