From: DU <des...@gm...> - 2004-12-12 21:14:39
I'm not subscribed to the list, so please cc me directly.

I have attached a file named "test.dat" containing sample data. One column is a date, the other column is a value. I want to plot these values and also a linear fit to the data. To do this, I created "test.plt", also attached.

When I ran this with a copy of test.dat where the date field was a simple number, I got the expected result (a negatively sloped line). When I use the date field, I get a roughly horizontal line, even though I have used the "set xdata time" and set the timefmt. I get the same outcome with both gnuplot 3.7 and 4.0.

All the options and commands I'm using are in the file, nothing else is on the command line.

I must be doing something wrong--but what?
From: Hans-Bernhard B. <br...@ph...> - 2004-12-13 13:58:35
DU wrote:
> I'm not subscribed to the list, so please cc me directly.
>
> I have attached a file named "test.dat" containing sample data. One
> column is a date, the other column is a value. I want to plot these
> values and also a linear fit to the data. To do this, I created
> "test.plt", also attached.
>
> When I ran this with a copy of test.dat where the date field was a
> simple number, I got the expected result (a negatively sloped line).
> When I use the date field, I get a roughly horizontal line, even
> though I have used the "set xdata time" and set the timefmt. I get
> the same outcome with both gnuplot 3.7 and 4.0.
>
> All the options and commands I'm using are in the file, nothing else
> is on the command line.
>
> I must be doing something wrong--but what?

The problem is that time values are counted in seconds since the millennium (1 January 2000). This makes the date entries you gave here huge (roughly 1.5e8, as you can see in the mouse feedback), and causes your two fit parameters m and b, in the actual solution, to be of vastly different orders of magnitude. You also have an enormous lever arm from the data region (10 days) to extrapolate back to 'time == zero' (well over 1000 days ago), which makes it excessively hard to get a correct value for 'b'. Combined, these make for a rather ill-conditioned fitting problem.

You have to re-scale m (or, effectively, x) to get fit to work. If you change your function to

f(x)=m*x/1e8 + b

the script starts to work:

degrees of freedom (ndf) : 8
rms of residuals      (stdfit) = sqrt(WSSR/ndf)      : 1.4812
variance of residuals (reduced chisquare) = WSSR/ndf : 2.19394

Final set of parameters            Asymptotic Standard Error
=======================            ==========================

m               = -953.984         +/- 188.7        (19.78%)
b               = 1489.85          +/- 293.6        (19.71%)

correlation matrix of the fit parameters:

               m      b
m               1.000
b              -1.000  1.000

Converted back into your original parameters, this gives b ~= 1.5e3, m ~= -9e-6. I.e. they're roughly 8 orders of magnitude apart.
That can't work, particularly not without startup parameters. The 'fit' documentation describes this, too; it even mentions the above method of fixing this (see 'help fit tips').

But this still isn't a good fit, for two reasons:

1) Chisquare/ndf is 2.2. That's too much for a typical good fit.

2) The enormous absolute errors on m and b. They're still smaller than the fitted absolute values (i.e. relative errors are < 100 %), but the error on 'b' is a factor of one hundred larger than the variation of your actual y values. That's a strong hint something's very wrong here. This 'b' is, at best, a wild guess.

A better result can be achieved by not just rescaling, but also *shifting* x, so 'b' is taken at a point much closer to the actual data points. E.g.:

f(x)=m*(x/1e8-1.55)+b

gives a much nicer result for 'b':

m               = -953.984         +/- 188.7        (19.78%)
b               = 11.1728          +/- 1.162        (10.4%)

Actually, even shifting the x axis alone

f(x)=m*(x-1.55e8)+b

is enough to get a working fit:

m               = -9.53984e-006    +/- 1.887e-006   (19.78%)
b               = 11.1728          +/- 1.162        (10.4%)
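
For readers who want to try the shifted fit, the advice above could be put into a script roughly like the following. This is only a sketch: test.dat and test.plt are not reproduced in the thread, so the timefmt string, the column layout and the start values are assumptions. The shift x0 = 1.55e8 is the one used in the reply and presumes the gnuplot 3.7/4.0 convention of counting seconds from 1 January 2000; gnuplot 5 and later count from 1970, so a different shift would be needed there.

```gnuplot
# Sketch only; adjust to the real test.dat.
set xdata time
set timefmt "%Y-%m-%d"        # ASSUMED date format, not taken from the thread

# Shift x so that 'b' is evaluated close to the data instead of at time zero.
x0 = 1.55e8                   # seconds; value used in the reply above
f(x) = m*(x - x0) + b

# Rough start values (ASSUMED) so that fit does not start from its defaults.
m = -1e-5
b = 10

fit f(x) "test.dat" using 1:2 via m, b

plot "test.dat" using 1:2 with points, f(x) with lines

# Intercept at time zero, for comparison with the unshifted form m*x + b.
print "m = ", m, "   b at x=0 = ", b - m*x0
```

With the numbers reported above (m = -9.53984e-006, b = 11.1728), the last line works out to an intercept of about 11.1728 + 9.53984e-6 * 1.55e8 = 1489.85, which agrees with the 'b' from the rescaled-only fit.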