Thread: [Gnuplot-info] problem with 'fit' when xdata set to date/time

From: DU <des...@gm...> - 2004-12-12 21:14:39

I'm not subscribed to the list, so please cc me directly.

I have attached a file named "test.dat" containing sample data.  One
column is a date, the other column is a value.  I want to plot these
values and also a linear fit to the data.  To do this, I created
"test.plt", also attached.

When I ran this with a copy of test.dat where the date field was a
simple number, I got the expected result (a negatively sloped line). 
When I use the date field, I get a roughly horizontal line, even
though I have used the "set xdata time" and set the timefmt.  I get
the same outcome with both gnuplot 3.7 and 4.0.

All the options and commands I'm using are in the file, nothing else
is on the command line.

I must be doing something wrong--but what?

Re: [Gnuplot-info] problem with 'fit' when xdata set to date/time

From: Hans-Bernhard B. <br...@ph...> - 2004-12-13 13:58:35

DU wrote:
> I'm not subscribed to the list, so please cc me directly.
> 
> I have attached a file named "test.dat" containing sample data.  One
> column is a date, the other column is a value.  I want to plot these
> values and also a linear fit to the data.  To do this, I created
> "test.plt", also attached.
> 
> When I ran this with a copy of test.dat where the date field was a
> simple number, I got the expected result (a negatively sloped line). 
> When I use the date field, I get a roughly horizontal line, even
> though I have used the "set xdata time" and set the timefmt.  I get
> the same outcome with both gnuplot 3.7 and 4.0.
> 
> All the options and commands I'm using are in the file, nothing else
> is on the command line.
> 
> I must be doing something wrong--but what?

The problem is that time values are counted in seconds since the start
of the year 2000.  This makes the date entries you gave here huge
(roughly 1.5e8, as you can see in the mouse feedback), and causes your
two fit parameters m and b, in the actual solution, to differ by many
orders of magnitude.  You also have an enormous lever arm from the data
region (10 days) to extrapolate back to 'time == zero' (roughly 1800
days earlier), which makes it excessively hard to get a correct value
for 'b'.  Combined, these make for a rather ill-conditioned fitting
problem.
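
To get a feel for the magnitudes involved, here is a small Python sketch
(not gnuplot itself) that counts the seconds from 1 January 2000 to a date
in December 2004. The sample date is an assumption, since the attached
test.dat is not reproduced in the thread.

```python
from datetime import datetime

# Epoch assumed from the explanation above: time counted from the start of 2000.
EPOCH = datetime(2000, 1, 1)

def seconds_since_2000(year, month, day):
    """Seconds elapsed between the assumed epoch and midnight of the given date."""
    return (datetime(year, month, day) - EPOCH).total_seconds()

# Hypothetical sample date; the real dates in test.dat are not shown here.
x = seconds_since_2000(2004, 12, 1)
print(f"x = {x:.3e} seconds, i.e. about {x / 86400:.0f} days after the epoch")
```

Any date in late 2004 lands near 1.5e8 on this scale, while ten days of
data span less than 1e6 seconds.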

You have to re-scale m (or, effectively, x) to get fit to work. If you 
change your function to

	f(x)=m*x/1e8 + b

the script starts to work:

degrees of freedom (ndf) : 8
rms of residuals      (stdfit) = sqrt(WSSR/ndf)      : 1.4812
variance of residuals (reduced chisquare) = WSSR/ndf : 2.19394

Final set of parameters            Asymptotic Standard Error
=======================            ==========================

m               = -953.984         +/- 188.7        (19.78%)
b               = 1489.85          +/- 293.6        (19.71%)

correlation matrix of the fit parameters:

                m      b
m               1.000
b              -1.000  1.000

Back in your original parameters, this gives b ~= 1.5e3, m ~= -9.5e-6.
I.e. they're about 8 orders of magnitude apart.  That can't work,
particularly not without starting values for the parameters.  The 'fit'
documentation describes this, too; it even mentions the above method of
fixing this (see 'help fit tips').
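
The conversion back to the original parameters is plain arithmetic,
sketched here in Python with the values from the fit output above. The
variable names are illustrative only.

```python
import math

# Values copied from the fit output for f(x) = m*x/1e8 + b.
m_scaled = -953.984
b = 1489.85
SCALE = 1e8

# In the original model f(x) = m*x + b, the slope is the scaled slope over 1e8.
m_original = m_scaled / SCALE

# How many orders of magnitude separate the two parameters.
orders_apart = math.log10(abs(b / m_original))

print(f"m = {m_original:.3e}, b = {b:.3e}, orders apart = {orders_apart:.1f}")
```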

But this still isn't a good fit, for two reasons:

1) Chisquare/ndf is 2.2.  That's too much for a typical good fit.

2) the enormous absolute errors on m and b.  They're still smaller than 
the fitted absolute values (i.e. relative errors are < 100 %), but the 
error on 'b' is a factor of one hundred larger than the variation of 
your actual y values.  That's a strong hint something's very wrong here.
This 'b' is, at best, a wild guess.

A better result can be achieved by not just rescaling, but also 
*shifting* x, so 'b' is taken at a point much closer to the actual data 
points.  E.g.:

	f(x)=m*(x/1e8-1.55)+b

gives a much nicer result for 'b':

m               = -953.984         +/- 188.7        (19.78%)
b               = 11.1728          +/- 1.162        (10.4%)
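
As a consistency check, the shifted 'b' can be predicted from the first
fit: it should be the value of the first fitted line at x/1e8 = 1.55. A
short Python sketch using only the numbers printed above:

```python
# Parameters from the first (rescaled only) fit: f(x) = m*x/1e8 + b.
m = -953.984
b_at_zero = 1489.85

# The shifted model f(x) = m*(x/1e8 - 1.55) + b keeps the same slope; its b
# is the value of the first line at the shift point x/1e8 = 1.55.
SHIFT = 1.55
b_predicted = b_at_zero + m * SHIFT

# Value reported by the second fit.
b_reported = 11.1728

print(f"predicted b = {b_predicted:.4f}, reported b = {b_reported}")
```

The two values agree to within the rounding of the printed parameters.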

Actually, even shifting the x axis alone

	f(x)=m*(x-1.55e8)+b

is enough to get a working fit:

m               = -9.53984e-006    +/- 1.887e-006   (19.78%)
b               = 11.1728          +/- 1.162        (10.4%)
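
One way to see why the shift helps is the correlation between m and b,
which the first fit reported as -1.000. For an unweighted straight-line
fit this correlation depends only on the x values: it equals
-sum(x) / sqrt(n * sum(x^2)). The Python sketch below evaluates it for ten
hypothetical daily samples (the real test.dat is not shown in the thread),
before and after shifting.

```python
import math

def slope_intercept_correlation(xs):
    """Correlation between slope and intercept of an unweighted line fit.

    Follows from inverting the normal matrix [[sum(x^2), sum(x)], [sum(x), n]].
    """
    n = len(xs)
    sum_x = sum(xs)
    sum_xx = sum(x * x for x in xs)
    return -sum_x / math.sqrt(n * sum_xx)

DAY = 86400.0
# Hypothetical data: ten daily samples starting at 1.55e8 seconds.
raw = [1.55e8 + i * DAY for i in range(10)]
shifted = [x - 1.55e8 for x in raw]        # the shift used above
mean_x = sum(raw) / len(raw)
centered = [x - mean_x for x in raw]       # shift to the middle of the data

for label, xs in (("raw", raw), ("shifted", shifted), ("centered", centered)):
    print(f"{label:9s} corr(m, b) = {slope_intercept_correlation(xs):+.7f}")
```

For this made-up data the raw time values give a correlation that rounds
to -1.000, shifting x toward the data pulls it away from -1, and centering
x on the data removes the correlation altogether.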