Date: 2008-06-30 00:21:22 -0700 (Mon, 30 Jun 2008)
fix problem computing nan_sig; improve docu; fixed problem with confidence intervals
--- trunk/octave-forge/extra/NaN/inst/corrcoef.m 2008-06-28 17:14:22 UTC (rev 5150)
+++ trunk/octave-forge/extra/NaN/inst/corrcoef.m 2008-06-30 07:21:22 UTC (rev 5151)
@@ -27,12 +27,13 @@
% R(i,j) is the correlation coefficient r between X(:,i) and Y(:,j)
% p gives the significance of R
% It tests the null hypothesis that the product moment correlation coefficient is zero
-% using Student's t-test on the statistic t = r sqrt(N-2)/sqrt(1-r^2)
+% using Student's t-test on the statistic t = r*sqrt(N-2)/sqrt(1-r^2)
% where N is the number of samples (Statistics, M. Spiegel, Schaum series).
% p > alpha: do not reject the Null hypothesis: "R is zero".
-% p < alpha: The alternative hypothesis "R2 is larger than zero" is true with probability (1-alpha).
-% ci1 lower 0.95 confidence interval
-% ci2 upper 0.95 confidence interval
+% p < alpha: The alternative hypothesis "R is larger than zero" is true with probability (1-alpha).
+% ci1 lower (1-alpha) confidence interval
+% ci2 upper (1-alpha) confidence interval
+% The default alpha is 0.01, and can be changed with function flag_implicit_significance.
% nan_sig p-value whether H0: "NaN's are not correlated" could be correct
% if nan_sig < alpha, H1 ("NaNs are correlated") is very likely.
@@ -40,21 +41,22 @@
% order to avoid this pitfall, the correlation of NaN's should be checked
% or case-wise deletion should be applied.
% Case-Wise deletion can be implemented
-% ix = ~any(isnan([X,Y]))
-% [...] = CORRCOEF(X(ix,:),Y(ix,:),...)
+% ix = ~any(isnan([X,Y]),2);
+% [...] = CORRCOEF(X(ix,:),Y(ix,:),...);
% Correlation (non-random distribution) of NaN's can be checked with
% or [nan_R,nan_sig]=corrcoef([X,Y],isnan([X,Y]))
% or [R,p,ci1,ci2] = CORRCOEF(...);
-% Further recommandation related to the correlation coefficient
-% + LOOK AT THE SCATTERPLOTS!
-% + Correlation is not causation. The observed correlation between two variables
-% might be due to the action of other, unobserved variables.
+% Further recommandation related to the correlation coefficient:
+% + LOOK AT THE SCATTERPLOTS to make sure that the relationship is linear
+% + Correlation is not causation because
+% it is not clear which parameter is "cause" and which is "effect" and
+% the observed correlation between two variables might be due to the action of other, unobserved variables.
% see also: SUMSKIPNAN, COVM, COV, COR, SPEARMAN, RANKCORR, RANKS,
+% PARTCORRCOEF, flag_implicit_significance
% on the correlation coefficient
@@ -69,19 +71,18 @@
%  http://www.janda.org/c10/Lectures/topic06/L24-significanceR.htm
%  http://faculty.vassar.edu/lowry/ch4apx.html
%  http://davidmlane.com/hyperstat/B134689.html
-%  http://www.statsoft.com/textbook/stbasic.html#Correlationsk
+%  http://www.statsoft.com/textbook/stbasic.html#Correlations
%  http://www.tufts.edu/~gdallal/corr.htm
-% Copyright (C) 2000-2004 by Alois Schloegl <a.schloegl@...>
+% Copyright (C) 2000-2004,2008 by Alois Schloegl <a.schloegl@...>
% This function is part of the NaN-toolbox
-% This program is free software; you can redistribute it and/or modify
+% This program is free software: you can redistribute it and/or modify
% it under the terms of the GNU General Public License as published by
-% the Free Software Foundation; either version 2 of the License, or
+% the Free Software Foundation, either version 3 of the License, or
% (at your option) any later version.
% This program is distributed in the hope that it will be useful,
@@ -90,7 +91,7 @@
% GNU General Public License for more details.
% You should have received a copy of the GNU General Public License
-% along with this program; If not, see <http://www.gnu.org/licenses/>.
+% along with this program. If not, see <http://www.gnu.org/licenses/>.
% + handles missing values (encoded as NaN's)
@@ -104,7 +105,7 @@
% + significance test for null-hypthesis: r=0
% + confidence interval included
% - rank correlation works for cell arrays, too (no check for missing values).
-% + compatible with this software and Matlab
+% + compatible with Octave and Matlab
NARG = nargout; % needed because nargout is not reentrant in Octave
@@ -115,7 +116,7 @@
fprintf(2,'Error CORRCOEF: Missing argument(s)\n');
- if ~isnumeric(Y)
+ if ischar(Y)
@@ -291,7 +292,7 @@
%tmp(ix1 | ix2) = nan; % avoid division-by-zero warning
z = log((1+tmp)./(1-tmp))/2; % Fisher's z-transform;
%sz = 1./sqrt(NN-3); % standard error of z
-sz = sqrt(2)*erfinv(1-2*alpha)./sqrt(NN-3); % confidence interval for alpha of z
+sz = sqrt(2)*erfinv(1-alpha)./sqrt(NN-3); % confidence interval for alpha of z
ci1 = tanh(z-sz);
ci2 = tanh(z+sz);
@@ -299,16 +300,15 @@
%ci1(isnan(ci1))=R(isnan(ci1)); % in case of isnan(ci), the interval limits are exactly the R value
-if (NARG<5) | ~YESNAN,
- sig_nan = ;
+if (NARG<5) || ~YESNAN,
+ nan_sig = repmat(NaN,size(R));
warning(FLAG_WARNING); % restore warning status
%%%%% ----- check independence of NaNs (missing values) -----
-%[nan_R,nan_sig,nan_ci1,nan_ci2] = corrcoef(X,isnan(X))
-[nan_R, nan_sig] = corrcoef(X,isnan(X));
+[nan_R, nan_sig] = corrcoef(X,(isnan(X)));
% remove diagonal elements, because these have not any meaning %
nan_sig(isnan(nan_R)) = nan;
@@ -317,8 +317,8 @@
tmp = nan_sig(:); % Hack to skip NaN's in MIN(X)
min_sig = min(tmp(~isnan(tmp))); % Necessary, because Octave returns NaN rather than min(X) for min(NaN,X)
fprintf(1,'CORRCOFF Warning: Missing Values (i.e. NaNs) are not independent of data (p-value=%f)\n', min_sig);
- fprintf(1,' Its recommended to remove all samples with any missing value (NaN).\n');
- fprintf(1,' In the following combinations the null-hypotheses (NaNs are uncorrelated) must be rejected.\n');
+ fprintf(1,' Its recommended to remove all samples (i.e. rows) with any missing value (NaN).\n');
+ fprintf(1,' The null-hypotheses (NaNs are uncorrelated) is rejected for the following parameter pair(s).\n');
[ix,iy] = find(nan_sig < alpha);
This was sent by the SourceForge.net collaborative development platform, the world's largest Open Source development site.