From: <nir...@us...> - 2012-05-24 20:29:33
|
Revision: 10519 http://octave.svn.sourceforge.net/octave/?rev=10519&view=rev Author: nir-krakauer Date: 2012-05-24 20:29:27 +0000 (Thu, 24 May 2012) Log Message: ----------- modified cspas_sel to work correctly for unequally weighted points and use a more efficient method when n is large Modified Paths: -------------- trunk/octave-forge/main/splines/inst/csaps_sel.m Modified: trunk/octave-forge/main/splines/inst/csaps_sel.m =================================================================== --- trunk/octave-forge/main/splines/inst/csaps_sel.m 2012-05-24 20:28:58 UTC (rev 10518) +++ trunk/octave-forge/main/splines/inst/csaps_sel.m 2012-05-24 20:29:27 UTC (rev 10519) @@ -27,7 +27,7 @@ ## ## returns the selected @var{p}, the estimated data scatter (variance from the smooth trend) @var{sigma2}, and the estimated uncertainty (SD) of the smoothing spline fit at each @var{x} value, @var{unc_y}. ## -## Note: The current evaluation of the effective number of model parameters uses singular value decomposition of an @var{n} by @var{n} matrix and is not computation or storage efficient for large @var{n} (thousands or greater). See Hutchinson (1985) for a more efficient method. +## The optimization uses singular value decomposition of an @var{n} by @var{n} matrix for small @var{n} in order to quickly compute the residual size and model degrees of freedom for many @var{p} values for the optimization (Craven and Wahba 1979), while for large @var{n} (currently >300) an asymptotically more computation and storage efficient method that takes advantage of the sparsity of the problem's coefficient matrices is used (Hutchinson and de Hoog 1985). ## ## References: ## @@ -82,27 +82,38 @@ R = spdiags([h(1:end-1) 2*(h(1:end-1) + h(2:end)) h(2:end)], [-1 0 1], n-2, n-2); QT = spdiags([1 ./ h(1:end-1) -(1 ./ h(1:end-1) + 1 ./ h(2:end)) 1 ./ h(2:end)], [0 1 2], n-2, n); - -##determine influence matrix for different p without repeated inversion - [U D V] = svd(diag(1 ./ sqrt(w))*QT'*sqrtm(inv(R)), 0); D = diag(D).^2; +chol_method = (n > 300); #use a sparse Cholesky decomposition followed by solving for only the central bands of the inverse to solve for large n (faster), and singular value decomposition for small n (less prone to numerical error if data values are spaced close together) + ##choose p by minimizing the penalty function + +if chol_method + penalty_function = @(p) penalty_compute_chol(p, QT, R, y, w, n, crit); +else + ##determine influence matrix for different p without repeated inversion + [U D V] = svd(diag(1 ./ sqrt(w))*QT'*sqrtm(inv(R)), 0); D = diag(D).^2; penalty_function = @(p) penalty_compute(p, U, D, y, w, n, crit); +end p = fminbnd(penalty_function, 0, 1); +## estimate the trend uncertainty +if chol_method + [MSR, Ht] = penalty_terms_chol(p, QT, R, y, w, n); + Hd = influence_matrix_diag_chol(p, QT, R, y, w, n); +else + H = influence_matrix(p, U, D, n, w); + [MSR, Ht] = penalty_terms(H, y, w); + Hd = diag(H); +end - - H = influence_matrix(p, U, D, n); - [MSR, Ht] = penalty_terms(H, y, w); sigma2 = MSR * (n / (n-Ht)); #estimated data error variance (wahba83) - unc_y = sqrt(sigma2 * diag(H) ./ w); #uncertainty (SD) of fitted curve at each input x-value (hutchinson86) - + unc_y = sqrt(sigma2 * Hd ./ w); #uncertainty (SD) of fitted curve at each input x-value (hutchinson86) ## solve for the scaled second derivatives u and for the function values a at the knots (if p = 1, a = y) u = (6*(1-p)*QT*diag(1 ./ w)*QT' + p*R) \ (QT*y); - a = y - 6*(1-p)*diag(1 ./ w)*QT'*u; + a = y - 6*(1-p)*diag(1 ./ w)*QT'*u; #where y is calculated ## derivatives at all but the last knot for the piecewise cubic spline aa = a(1:(end-1), :); @@ -122,15 +133,38 @@ -function H = influence_matrix(p, U, D, n) #returns influence matrix for given p +function H = influence_matrix(p, U, D, n, w) #returns influence matrix for given p H = speye(n) - U * diag(D ./ (D + (p / (6*(1-p))))) * U'; + H = diag(1 ./ sqrt(w)) * H * diag(sqrt(w)); #rescale to original units endfunction function [MSR, Ht] = penalty_terms(H, y, w) - MSR = mean(w .* (y - H*y) .^ 2); #mean square residual + MSR = mean(w .* (y - (H*y)) .^ 2); #mean square residual Ht = trace(H); #effective number of fitted parameters endfunction +function Hd = influence_matrix_diag_chol(p, QT, R, y, w, n) + #LDL factorization of 6*(1-p)*QT*diag(1 ./ w)*QT' + p*R + U = chol(6*(1-p)*QT*diag(1 ./ w)*QT' + p*R, 'upper'); + d = 1 ./ diag(U); + U = diag(d)*U; + d = d .^ 2; + #5 central bands in the inverse of 6*(1-p)*QT*diag(1 ./ w)*QT' + p*R + Binv = banded_matrix_inverse(d, U, 2); + Hd = diag(speye(n) - (6*(1-p))*diag(1 ./ w)*QT'*Binv*QT); +endfunction + +function [MSR, Ht] = penalty_terms_chol(p, QT, R, y, w, n) + #LDL factorization of 6*(1-p)*QT*diag(1 ./ w)*QT' + p*R + U = chol(6*(1-p)*QT*diag(1 ./ w)*QT' + p*R, 'upper'); + d = 1 ./ diag(U); + U = diag(d)*U; + d = d .^ 2; + Binv = banded_matrix_inverse(d, U, 2); #5 central bands in the inverse of 6*(1-p)*QT*diag(1 ./ w)*QT' + p*R + Ht = 2 + trace(speye(n-2) - (6*(1-p))*QT*diag(1 ./ w)*QT'*Binv); + MSR = mean(w .* ((6*(1-p)*diag(1 ./ w)*QT'*((6*(1-p)*QT*diag(1 ./ w)*QT' + p*R) \ (QT*y)))) .^ 2); +endfunction + function J = aicc(MSR, Ht, n) J = mean(log(MSR)(:)) + 2 * (Ht + 1) / max(n - Ht - 2, 0); #hurvich98, taking the average if there are multiple data sets as in woltring86 endfunction @@ -144,7 +178,7 @@ endfunction function J = penalty_compute(p, U, D, y, w, n, crit) #evaluates a user-supplied penalty function at given p - H = influence_matrix(p, U, D, n); + H = influence_matrix(p, U, D, n, w); [MSR, Ht] = penalty_terms(H, y, w); J = feval(crit, MSR, Ht, n); if ~isfinite(J) @@ -152,7 +186,32 @@ endif endfunction +function J = penalty_compute_chol(p, QT, R, y, w, n, crit) #evaluates a user-supplied penalty function at given p + [MSR, Ht] = penalty_terms_chol(p, QT, R, y, w, n); + J = feval(crit, MSR, Ht, n); + if ~isfinite(J) + J = Inf; + endif +endfunction +function Binv = banded_matrix_inverse(d, U, m) #given a (2m+1)-banded, symmetric n x n matrix B = U'*inv(diag(d))*U, where U is unit upper triangular with bandwidth (m+1), returns Binv, a sparse symmetric matrix containing the central 2m+1 bands of the inverse of B +#Reference: Hutchinson and de Hoog 1985 + Binv = diag(d); + n = rows(U); + for i = n:(-1):1 + p = min(m, n - i); + for l = 1:p + for k = 1:p + Binv(i, i+l) -= U(i, i+k)*Binv(i + k, i + l); + end + Binv(i, i) -= U(i, i+l)*Binv(i, i+l); + end + Binv(i+(1:p), i) = Binv(i, i+(1:p))'; #add the lower triangular elements + end +endfunction + + + %!shared x,y,ret,p,sigma2,unc_y %! x = [0:0.01:1]'; y = sin(x); %! [ret,p,sigma2,unc_y]=csaps_sel(x,y,x); @@ -161,3 +220,21 @@ %!assert (ret-y, zeros(size(y)), 1E-4); %!assert (unc_y, zeros(size(unc_y)), 1E-5); +%{ +# experiments with unequal weighting of points +rand("seed", pi) +x = [0:0.01:1]'; +w = 1 ./ (0.1 + rand(size(x))); +rand("seed", pi+1) +y = x .^ 2 + 0.05*(rand(size(x))-0.5)./ sqrt(w); +[ret,p,sigma2,unc_y]=csaps_sel(x,y,x,w); + +rand("seed", pi) +x = [0:0.01:10]'; +w = 1 ./ (0.1 + rand(size(x))); +rand("seed", pi+1) +y = x .^ 2 + 0.5*(rand(size(x))-0.5)./ sqrt(w); +tic; [ret,p,sigma2,unc_y]=csaps_sel(x,y,x,w); toc + +%} + This was sent by the SourceForge.net collaborative development platform, the world's largest Open Source development site. |
From: <nir...@us...> - 2012-05-29 21:54:45
|
Revision: 10534 http://octave.svn.sourceforge.net/octave/?rev=10534&view=rev Author: nir-krakauer Date: 2012-05-29 21:54:39 +0000 (Tue, 29 May 2012) Log Message: ----------- fixed cspas_sel to calculate trend uncertainty when the input y is an array Modified Paths: -------------- trunk/octave-forge/main/splines/inst/csaps_sel.m Modified: trunk/octave-forge/main/splines/inst/csaps_sel.m =================================================================== --- trunk/octave-forge/main/splines/inst/csaps_sel.m 2012-05-29 19:26:00 UTC (rev 10533) +++ trunk/octave-forge/main/splines/inst/csaps_sel.m 2012-05-29 21:54:39 UTC (rev 10534) @@ -108,7 +108,7 @@ Hd = diag(H); end - sigma2 = MSR * (n / (n-Ht)); #estimated data error variance (wahba83) + sigma2 = mean(MSR(:)) * (n / (n-Ht)); #estimated data error variance (wahba83) unc_y = sqrt(sigma2 * Hd ./ w); #uncertainty (SD) of fitted curve at each input x-value (hutchinson86) ## solve for the scaled second derivatives u and for the function values a at the knots (if p = 1, a = y) This was sent by the SourceForge.net collaborative development platform, the world's largest Open Source development site. |
From: <nir...@us...> - 2012-07-23 16:36:17
|
Revision: 10763 http://octave.svn.sourceforge.net/octave/?rev=10763&view=rev Author: nir-krakauer Date: 2012-07-23 16:36:06 +0000 (Mon, 23 Jul 2012) Log Message: ----------- added option of smoothing to a target residual size to csaps_sel Modified Paths: -------------- trunk/octave-forge/main/splines/inst/csaps_sel.m Modified: trunk/octave-forge/main/splines/inst/csaps_sel.m =================================================================== --- trunk/octave-forge/main/splines/inst/csaps_sel.m 2012-07-22 21:06:01 UTC (rev 10762) +++ trunk/octave-forge/main/splines/inst/csaps_sel.m 2012-07-23 16:36:06 UTC (rev 10763) @@ -18,14 +18,14 @@ ## @deftypefnx{Function File}{[@var{pp} @var{p} @var{sigma2},@var{unc_y}] =} csaps_sel(@var{x}, @var{y}, [], @var{w}=[], @var{crit}=[]) ## ## Cubic spline approximation with smoothing parameter estimation @* -## approximate [@var{x},@var{y}], weighted by @var{w} (inverse variance; if not given, equal weighting is assumed), at @var{xi}. +## Approximates [@var{x},@var{y}], weighted by @var{w} (inverse variance; if not given, equal weighting is assumed), at @var{xi}. ## ## The chosen cubic spline with natural boundary conditions @var{pp}(@var{x}) minimizes @var{p} Sum_i @var{w}_i*(@var{y}_i - @var{pp}(@var{x}_i))^2 + (1-@var{p}) Int @var{pp}''(@var{x}) d@var{x}. -## A selection criterion @var{crit} is used to find a suitable value for @var{p} (between 0 and 1); possible values for @var{crit} are `aicc' (corrected Akaike information criterion, the default); `aic' (original Akaike information criterion); `gcv' (generalized cross validation) +## A selection criterion @var{crit} is used to find a suitable value for @var{p} (between 0 and 1); possible values for @var{crit} are `aicc' (corrected Akaike information criterion, the default); `aic' (original Akaike information criterion); `gcv' (generalized cross validation). If @var{crit} is a scalar instead of a string, then @{p} is chosen to so that the mean square scaled residual Mean_i (@var{w}_i*(@var{y}_i - @var{pp}(@var{x}_i))^2) is approximately equal to @var{crit}. ## ## @var{x} and @var{w} should be @var{n} by 1 in size; @var{y} should be @var{n} by @var{m}; @var{xi} should be @var{k} by 1; the values in @var{x} should be distinct; the values in @var{w} should be nonzero. ## -## returns the selected @var{p}, the estimated data scatter (variance from the smooth trend) @var{sigma2}, and the estimated uncertainty (SD) of the smoothing spline fit at each @var{x} value, @var{unc_y}. +## Returns the selected @var{p}, the estimated data scatter (variance from the smooth trend) @var{sigma2}, and the estimated uncertainty (SD) of the smoothing spline fit at each @var{x} value, @var{unc_y}. ## ## The optimization uses singular value decomposition of an @var{n} by @var{n} matrix for small @var{n} in order to quickly compute the residual size and model degrees of freedom for many @var{p} values for the optimization (Craven and Wahba 1979), while for large @var{n} (currently >300) an asymptotically more computation and storage efficient method that takes advantage of the sparsity of the problem's coefficient matrices is used (Hutchinson and de Hoog 1985). ## @@ -73,6 +73,16 @@ w = ones(n, 1); end + if isscalar(crit) + if crit <= 0 #return an exact cubic spline interpolation + [ret,p]=csaps(x,y,1,xi,w); + sigma2 = 0; unc_y = zeros(size(x)); + return + end + w = w ./ crit; #adjust the sample weights so that the target mean square scaled residual is 1 + crit = 'msr_bound'; + end + if isempty(crit) crit = 'aicc'; end @@ -177,7 +187,11 @@ J = mean(log(MSR)(:)) - 2 * log(1 - Ht / n); endfunction -function J = penalty_compute(p, U, D, y, w, n, crit) #evaluates a user-supplied penalty function at given p +function J = msr_bound(MSR, Ht, n) + J = mean(MSR(:) - 1) .^ 2; +endfunction + +function J = penalty_compute(p, U, D, y, w, n, crit) #evaluates a user-supplied penalty function crit at given p H = influence_matrix(p, U, D, n, w); [MSR, Ht] = penalty_terms(H, y, w); J = feval(crit, MSR, Ht, n); @@ -186,7 +200,7 @@ endif endfunction -function J = penalty_compute_chol(p, QT, R, y, w, n, crit) #evaluates a user-supplied penalty function at given p +function J = penalty_compute_chol(p, QT, R, y, w, n, crit) #evaluates a user-supplied penalty function crit at given p [MSR, Ht] = penalty_terms_chol(p, QT, R, y, w, n); J = feval(crit, MSR, Ht, n); if ~isfinite(J) This was sent by the SourceForge.net collaborative development platform, the world's largest Open Source development site. |
From: <nir...@us...> - 2012-07-24 14:39:47
|
Revision: 10767 http://octave.svn.sourceforge.net/octave/?rev=10767&view=rev Author: nir-krakauer Date: 2012-07-24 14:39:41 +0000 (Tue, 24 Jul 2012) Log Message: ----------- fixed texinfo formatting of help text Modified Paths: -------------- trunk/octave-forge/main/splines/inst/csaps_sel.m Modified: trunk/octave-forge/main/splines/inst/csaps_sel.m =================================================================== --- trunk/octave-forge/main/splines/inst/csaps_sel.m 2012-07-24 14:08:03 UTC (rev 10766) +++ trunk/octave-forge/main/splines/inst/csaps_sel.m 2012-07-24 14:39:41 UTC (rev 10767) @@ -21,7 +21,7 @@ ## Approximates [@var{x},@var{y}], weighted by @var{w} (inverse variance; if not given, equal weighting is assumed), at @var{xi}. ## ## The chosen cubic spline with natural boundary conditions @var{pp}(@var{x}) minimizes @var{p} Sum_i @var{w}_i*(@var{y}_i - @var{pp}(@var{x}_i))^2 + (1-@var{p}) Int @var{pp}''(@var{x}) d@var{x}. -## A selection criterion @var{crit} is used to find a suitable value for @var{p} (between 0 and 1); possible values for @var{crit} are `aicc' (corrected Akaike information criterion, the default); `aic' (original Akaike information criterion); `gcv' (generalized cross validation). If @var{crit} is a scalar instead of a string, then @{p} is chosen to so that the mean square scaled residual Mean_i (@var{w}_i*(@var{y}_i - @var{pp}(@var{x}_i))^2) is approximately equal to @var{crit}. +## A selection criterion @var{crit} is used to find a suitable value for @var{p} (between 0 and 1); possible values for @var{crit} are `aicc' (corrected Akaike information criterion, the default); `aic' (original Akaike information criterion); `gcv' (generalized cross validation). If @var{crit} is a scalar instead of a string, then @var{p} is chosen to so that the mean square scaled residual Mean_i (@var{w}_i*(@var{y}_i - @var{pp}(@var{x}_i))^2) is approximately equal to @var{crit}. ## ## @var{x} and @var{w} should be @var{n} by 1 in size; @var{y} should be @var{n} by @var{m}; @var{xi} should be @var{k} by 1; the values in @var{x} should be distinct; the values in @var{w} should be nonzero. ## This was sent by the SourceForge.net collaborative development platform, the world's largest Open Source development site. |