I would appreciate review of the code that yields mean curve datapoints, namely: DescStats.createMeanObservationForRange()
The AAVSO web-based light curve generator (LCG) generates significantly different (1 sigma) error bars than VStar. Why? Take, for example, the last 200 days (from today) of Eps Aur in LCG and VStar.
David Benn
2010-02-02
Some questions:
o Does LCG use Standard Error of the Average calcs as per http://www.aavso.org/education/vsa/Chapter10.pdf?
o Does LCG use both V and Visual bands to calculate the mean curve datapoints and so std error of the average values?
David Benn
2010-02-02
The source file in question is here: http://vstar.svn.sourceforge.net/viewvc/vstar/trunk/src/org/aavso/tools/vstar/util/stats/DescStats.java?view=log
Aaron Price
2010-02-03
The LCG is supposed to show stdev based on the data in the bin. The mean curve can be either V or visual; the user selects which one in the interface. It's entirely possible my code is wrong too.
David Benn
2010-02-03
It would be great to look at this in depth when time permits. Thanks Aaron.
David Benn
2010-02-20
Apart from asking whether the algorithm/approach is correct, it would also be worth coming up with some simple test data that we can use to compare VStar against LCG.
David Benn
2010-03-24
Sara communicated with Matt, who provided this information via email:
Here is what I would do, written in natural language:
1) Choose a bin size (e.g. 7 days)
2) Place the center of the first bin at a time (binsize/2) from the first data point (i.e. if the first data point is 2455200.0, put the bin center at 2455203.5).
3) Sum all of the magnitudes in the bin to find the average magnitude, AND sum all of the jds to find the average time.
4) Compute the variance of the magnitudes in each bin using
VAR(i) = Sum (mag(j)-avemag(i))^2
5) Divide the variance of each bin by the number of data points in that bin, minus 1 (N-1) (i.e. Var(i)/(N(i)-1) )
6) Take the square root of that value, to get the std. deviation:
sigma(i) = sqrt (Var(i)/(N(i)-1))
7) Repeat, shifting the bin center one bin width up in time each time (e.g. 2455203.5, 2455210.5, etc.)
The averaged light curve should then be JD(ave),mag(ave), and sigma for each bin.
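For comparison purposes, the steps above can be sketched in Java roughly as follows. This is only a minimal illustration, not the actual VStar or LCG code: the class MeanCurveSketch, the Bin type and the method binnedMeans() are hypothetical names, the input is assumed to be sorted by JD, empty bins are simply skipped, and a bin with a single observation is given sigma = 0 (the description above does not say what to do when N-1 is zero).

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of the binning steps described above. */
final class MeanCurveSketch {

    /** One averaged point: mean JD, mean magnitude, std. deviation, count. */
    static final class Bin {
        final double meanJd;
        final double meanMag;
        final double sigma;
        final int n;

        Bin(double meanJd, double meanMag, double sigma, int n) {
            this.meanJd = meanJd;
            this.meanMag = meanMag;
            this.sigma = sigma;
            this.n = n;
        }
    }

    /** Index of the fixed-width bin (counted from the first JD) holding jd. */
    private static long binIndex(double jd, double start, double binSize) {
        return (long) Math.floor((jd - start) / binSize);
    }

    /**
     * Assumes jd is sorted ascending, jd.length == mag.length, binSize > 0.
     * Bins start at the first JD, so the first bin center is binSize/2 later.
     */
    static List<Bin> binnedMeans(double[] jd, double[] mag, double binSize) {
        List<Bin> bins = new ArrayList<>();
        if (jd.length == 0) {
            return bins;
        }
        double start = jd[0];
        int i = 0;
        while (i < jd.length) {
            int first = i;
            long k = binIndex(jd[i], start, binSize);
            double sumJd = 0.0;
            double sumMag = 0.0;
            // Step 3: sum the JDs and magnitudes of everything in bin k.
            while (i < jd.length && binIndex(jd[i], start, binSize) == k) {
                sumJd += jd[i];
                sumMag += mag[i];
                i++;
            }
            int n = i - first;
            double meanJd = sumJd / n;
            double meanMag = sumMag / n;
            // Step 4: sum of squared deviations from the bin's mean magnitude.
            double sumSq = 0.0;
            for (int j = first; j < i; j++) {
                double d = mag[j] - meanMag;
                sumSq += d * d;
            }
            // Steps 5-6: divide by N-1 and take the square root.
            // Assumption: a single-point bin gets sigma = 0.
            double sigma = n > 1 ? Math.sqrt(sumSq / (n - 1)) : 0.0;
            bins.add(new Bin(meanJd, meanMag, sigma, n));
        }
        return bins;
    }
}
```

Calling MeanCurveSketch.binnedMeans(jd, mag, 7.0) on a small hand-made data set gives one Bin per non-empty 7-day interval, which could serve as the kind of simple test data mentioned earlier in this thread.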
David Benn
2010-03-24
I need to fix the 2nd part of step 3. Currently, oddly, I take the first and last JD in the bin and take the average of these two.
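To make the difference concrete, here is a small sketch contrasting the two ways of assigning a time to a bin. The class and method names (JdAverageSketch, midpointOfEnds, meanJd) are made up for illustration and are not taken from DescStats.java; the input is assumed to be a non-empty array of JDs sorted ascending.

```java
/** Hypothetical illustration of two ways to assign a JD to a bin. */
final class JdAverageSketch {

    /** The current behaviour described above: average of first and last JD only. */
    static double midpointOfEnds(double[] jds) {
        return (jds[0] + jds[jds.length - 1]) / 2.0;
    }

    /** Step 3 of the description: average of all JDs in the bin. */
    static double meanJd(double[] jds) {
        double sum = 0.0;
        for (double jd : jds) {
            sum += jd;
        }
        return sum / jds.length;
    }
}
```

For JDs 2455200.0, 2455200.5 and 2455206.0, the first method gives 2455203.0, while the mean of all three is roughly 2455202.17, so the two only agree when the observations are spread evenly through the bin.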
An additional point of departure is that I go on to calculate the "standard error of the average" as described in HoA ch 10 and use *that* as my +/- error value for the bin.
See discussion of this here also:
https://sourceforge.net/apps/mediawiki/vstar/index.php?title=Talk:Computing_Means
Now, that would result in mean curves having:
a. non 1-sigma error bars
b. much smaller error bars
So... Are we saying that I should stop at SD and just use that as my +/- error bar value? If so, it appears that I have abused/misunderstood the role of the "standard error of the average" in this context.
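For reference, the two candidate error bar values differ only by a factor of sqrt(N). The sketch below assumes the usual definition of the standard error of the average as the sample standard deviation divided by sqrt(N); the class ErrorBarSketch and its method names are hypothetical, and at least two magnitudes per bin are assumed.

```java
/** Hypothetical sketch comparing SD and standard error for one bin. */
final class ErrorBarSketch {

    /** Sample standard deviation, i.e. sqrt(sum of squared deviations / (N-1)). */
    static double sampleStdDev(double[] mags) {
        double sum = 0.0;
        for (double m : mags) {
            sum += m;
        }
        double mean = sum / mags.length;
        double sumSq = 0.0;
        for (double m : mags) {
            sumSq += (m - mean) * (m - mean);
        }
        return Math.sqrt(sumSq / (mags.length - 1));
    }

    /** Standard error of the average, assumed here to be SD / sqrt(N). */
    static double standardError(double[] mags) {
        return sampleStdDev(mags) / Math.sqrt(mags.length);
    }
}
```

With four magnitudes in a bin the standard error is half the standard deviation, and with 100 it is a tenth, which is why the standard error bars come out much smaller for well-populated bins.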
David Benn
2010-03-24
David Benn
2010-03-26
The latest from Matt is that standard error of the average is suitable here. Further input (e.g. from Aaron, Mike, others) would be beneficial.
Aaron Price
2010-03-27
The debate between stdev and SE is philosophical with no right answer. My understanding is that when you have a large sample size stdev is preferred and when you have smaller samples, SE is preferred. I think the only important thing is that the display clearly describe whether it is stdev or SE.
I originally meant for the LCG to show stdev, but maybe I messed up and coded SE. Don't use it as a test. (I'll look at the code for it when we roll out the new web site.)
Regarding this question, I'd always go with Matt's suggestion over anyone else's. As he likes to say: "Math is hard!"