From: <plotter@pi...> 2012-10-17 20:46:52

Hi,

I was just trying to run a fit command on some data while excluding a certain range where the data is perturbed by something else.

I tried a method analogous to using NaN in plot, where points evaluating to NaN get ignored. However, it seems that with fit they are counted, which basically screws things up entirely.

    fit [:2001] ts_model(x) datafile using 1:2 via x1,a1,p1, x2,a2,p2,a3,p3,x3

    After 1 iterations the fit converged.
    final sum of squares of residuals : nan
    abs. change during last iteration : nan

    degrees of freedom    (FIT_NDF)                        : 169
    rms of residuals      (FIT_STDFIT) = sqrt(WSSR/ndf)    : nan
    variance of residuals (reduced chisquare) = WSSR/ndf   : nan

    Final set of parameters            Asymptotic Standard Error
    =======================            ==========================

    x1              = nan              +/- nan          (nan%)
    a1              = 0.01             +/- nan          (nan%)
    p1              = 3                +/- nan          (nan%)
    x2              = nan              +/- nan          (nan%)
    a2              = 0.01             +/- nan          (nan%)
    p2              = 20               +/- nan          (nan%)
    a3              = 0.01             +/- nan          (nan%)
    p3              = 60               +/- nan          (nan%)
    x3              = 2006.12          +/- nan          (nan%)

    correlation matrix of the fit parameters:

          x1   a1   p1   x2   a2   p2   a3   p3   x3
    x1    nan
    a1    nan  nan
    p1    nan  nan  nan
    x2    nan  nan  nan  nan
    a2    nan  nan  nan  nan  nan
    p2    nan  nan  nan  nan  nan  nan
    a3    nan  nan  nan  nan  nan  nan  nan
    p3    nan  nan  nan  nan  nan  nan  nan  nan
    x3    nan  nan  nan  nan  nan  nan  nan  nan  nan

    gnuplot> fit [:2001] ts_model(x) datafile using 1:((($1<1980)&&($1>1984))?$2:NaN) via x1,a1,p1, x2,a2,p2,a3,p3,x3

Now I can see how this would be happening, but I wonder whether the result has any use at all.

In practice this means that any dataset that contains even one NaN coordinate, or one point where 'using' evaluates to NaN, cannot be used with the fit command.

Since plot and fit generally try to work in a parallel fashion, I was expecting fit to _exclude_ from its calculations the points that I had made evaluate to NaN.

If this worked, it would be fundamentally useful. Is there any reason why the current behaviour is useful or essential?

Would it be preferable to simply skip data points with a NaN, in a similar way to what plot does?

best regards, Peter. 
From: Ethan A Merritt <sfeam@us...> 2012-10-17 21:08:09

On Wednesday, October 17, 2012 01:29:46 pm plotter@... wrote:
> Hi,
>
> I was just trying to do a fit command on some data but excluding a
> certain range where the data is perturbed by something else.
>
> I tried an analogous method to using NaN in plot where points evaluating
> to NaN get ignored. However it seems that with fit they are counted
> which basically screws things up entirely.

The plot commands and the fit command use exactly the same data input routine, df_readline(), and I can see in the source that fit.c skips any points that return DF_UNDEFINED from df_readline(). So there must be more to it than that.

On the other hand, there were indeed a few changes in the development version related to how NaN values are handled on input. Is it possible for you to check whether you get the same result in 4.6.0 and current CVS?

Ethan

> [remainder of the original message, including the fit output, quoted in full]

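As a rough illustration of what "skipping undefined points on input" means, here is a minimal Python sketch. It is not gnuplot's code: `read_points`, `rows`, and `using` are hypothetical names, the sample rows are made up, and the sketch simply assumes that a NaN coordinate is treated as undefined. Whether fit actually treats NaN that way is exactly the open question in this thread.

```python
import math

def read_points(rows, using):
    """Yield (x, y) pairs, dropping rows whose 'using' result is undefined.

    Assumption for this sketch only: a NaN coordinate counts as undefined,
    playing the role of a DF_UNDEFINED return from the input routine.
    """
    for row in rows:
        x, y = using(row)
        if math.isnan(x) or math.isnan(y):
            continue  # skip the point entirely; it never reaches the fit
        yield (x, y)

# Made-up rows, with a 'using' expression that blanks out part of the range.
rows = [(1978.0, 0.2), (1981.0, 9.9), (1983.0, 9.7), (1986.0, 0.4)]
using = lambda r: (r[0], r[1] if (r[0] < 1980 or r[0] > 1984) else math.nan)

points = list(read_points(rows, using))
print(points)
```

With this kind of filter, only the rows outside the blanked range would be handed to the fitting routine.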
From: <plotter@pi...> 2012-10-17 23:24:04

On 10/17/12 23:07, Ethan A Merritt wrote:
> The plot commands and the fit command use exactly the same data input
> routine df_readline(), and I can see in the source that fit.c skips any
> points that return DF_UNDEFINED from df_readline().
> So there must be more to it than that.
>
> On the other hand, there were indeed a few changes in the development
> version related to how NaN values are handled on input.
> Is it possible for you to check whether you get the same result in
> 4.6.0 and current CVS?
>
> Ethan

Hi Ethan,

this was found on the following CVS build:

    G N U P L O T
    Version 4.7 patchlevel 0    last modified 2012-03-02
    Build System: Linux i686

I've trimmed it down to a test case that throws the error. Without the conditional in the using clause it works as expected.

One thing that seems odd is:

> After 1 iterations the fit converged.
> final sum of squares of residuals : nan

Firstly, what is "nan"?! Second, why did it report that it converged, when the presence of NaN (or nan) would be expected to give a non-result rather than convergence? Is nan perhaps zero?

Does that help?

/Peter

    gnuplot> fit a*cos(x) "-" using 1:(($1>3)?$2:NaN) via a
    input data ('e' ends) > 1 1
    input data ('e' ends) > 2 2
    input data ('e' ends) > 3 1
    input data ('e' ends) > 4 4
    input data ('e' ends) > e

    Iteration 0
    WSSR        : nan               delta(WSSR)/WSSR   : 0
    delta(WSSR) : nan               limit for stopping : 1e-05
    lambda      : 0.684186

    initial set of free parameter values

    a               = 0.887069

    *********************

    After 1 iterations the fit converged.
    final sum of squares of residuals : nan
    abs. change during last iteration : nan

    degrees of freedom    (FIT_NDF)                        : 3
    rms of residuals      (FIT_STDFIT) = sqrt(WSSR/ndf)    : nan
    variance of residuals (reduced chisquare) = WSSR/ndf   : nan

    Final set of parameters            Asymptotic Standard Error
    =======================            ==========================

    a               = 0.887069         +/- nan          (nan%)

    correlation matrix of the fit parameters:

          a
    a     1.000
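On the "is nan perhaps zero" question: nan is the IEEE 754 "not a number" value, not zero, and any sum that includes it is itself nan. The Python sketch below replays the four-point test case under stated assumptions: `model`, `wssr`, and `using` are hypothetical helpers that mimic the posted command, not gnuplot internals, and `a0` is the initial value from the log. It also shows why FIT_NDF : 3 suggests that all four points were counted, since degrees of freedom are the number of data points minus the number of fitted parameters (4 - 1 = 3).

```python
import math

# The four (x, y) rows typed into the inline data block.
data = [(1, 1), (2, 2), (3, 1), (4, 4)]

def using(x, y):
    # Mimics: using 1:(($1>3)?$2:NaN)
    return x, (y if x > 3 else math.nan)

def model(x, a):
    # The fitted function a*cos(x)
    return a * math.cos(x)

def wssr(points, a):
    # Plain (unweighted) sum of squared residuals
    return sum((y - model(x, a)) ** 2 for x, y in points)

a0 = 0.887069  # initial parameter value reported in the log

all_points = [using(x, y) for x, y in data]
kept = [(x, y) for x, y in all_points if not math.isnan(y)]

wssr_all = wssr(all_points, a0)   # NaN points included: the sum becomes nan
wssr_kept = wssr(kept, a0)        # NaN points skipped: the sum is finite
ndf_all = len(all_points) - 1     # 4 points, 1 parameter

print(wssr_all, wssr_kept, ndf_all)
print(math.nan == 0, math.nan < 1e-05)  # every comparison with nan is False
```

Because every comparison involving nan evaluates to false, a stopping test written around such a comparison can behave unexpectedly once WSSR is nan. That may be related to the "converged" message, but it is only a guess and has not been checked against fit.c.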