From: D G. <de...@gn...> - 2005-12-21 17:27:48
My spearman.m in Octave 2.1's forge calculates Spearman's correlation as

  rho = cor (ranks (x), ranks (y));

as seen in /usr/share/octave/2.1.69/m/statistics/base/spearman.m. Since then, I have read that the implementation has been changed. http://octave.sourceforge.net/index/f/spearman.html says:

  "For some (unknown) reason, in previous versions Spearman's rank correlation r = corrcoef(ranks(x)). But according to [1], Spearman's correlation is defined as r = 1-6*sum((ranks(x)-ranks(y)).^2)/(N*(N*N-1)) The results are different. Here, the later version is implemented."

The two implementations should differ only when there are tied ranks. However, it seems to me that the older implementation was more logical. The logical definition of Spearman's correlation is corr(ranks(x),ranks(y)), and the formula just happens to be what that reduces to when there are no tied ranks. Schaum's Probability and Statistics derives the Spearman formula from corr(ranks(x),ranks(y)) (but, curiously, still uses the formula in an exercise even though there are tied ranks).

The following websites address the issue of tied ranks and suggest that the formula is not valid when there are ties.

----
http://faculty.vassar.edu/lowry/ch3b.html -- says that the formula is valid only for rankings without ties.

http://en.wikipedia.org/wiki/Spearman%27s_rank_correlation_coefficient -- the Wikipedia article says the same: "The formula becomes more complicated in the presence of tied ranks, but unless the tie bands are large, the effect of ignoring them is small."

http://www.netnam.vn/unescocourse/statistics/13_6.htm -- says: "When there are no ties, the formula for rs, reduces to .... where d is the difference between the values of x and y corresponding to a pair of observations. This simple formula will provide a good approximation to rs when the number of ties in the ranks is small."

http://web.uccs.edu/lbecker/spss80/ctabs2.htm -- says: "CAUTION: Please note that the formula given above is inappropriate when there are tied ranks. Our example data has many ties. The rank order correlation computed by that formula is .513, whereas the correct value (given by SPSS) is .370. So how should the Spearman rank-order correlation be computed if there are ties? The Spearman correlation is a special case of the Pearson product-moment correlation. If you compute a Pearson product-moment correlation on the ranked data the result will be the correct value of the Spearman rank order correlation."

http://www.nyu.edu/its/socsci/Docs/correlate.html -- even provides a correction.
----

It seems, then, that the previous version was perhaps more logically accurate. Some (most) sources say that corr(ranks(x), ranks(y)) is the correct definition, whereas others (like the last one) imply that the formula is the correct definition, but then advise us not to rely on the formula when there are ties and to use corr(ranks) instead.

In either case, perhaps it doesn't make sense for us to try to decide which is the true definition. Maybe we should keep the spearman implementation as is (i.e. use the formula), but add a note like this to the end of the docstring of spearman.m:

  "Note that this function simply implements the Spearman formula. A more logical indicator, often regarded as the true definition [1,2,3,4,5] of Spearman rank correlation, is corr(ranks(x),ranks(y)), so you might prefer to use that instead. The latter coincides with the Spearman formula when there are no ties among ranks, but differs when there are ties among ranks. For a small number of ties, the difference can be ignored.

  [1] http://faculty.vassar.edu/lowry/ch3b.html
  [2] http://en.wikipedia.org/wiki/Spearman%27s_rank_correlation_coefficient
  [3] http://www.netnam.vn/unescocourse/statistics/13_6.htm
  [4] http://web.uccs.edu/lbecker/spss80/ctabs2.htm
  [5] http://www.nyu.edu/its/socsci/Docs/correlate.html"

I guess the point is that we are distinguishing Spearman's correlation from the formula that Spearman came up with.

PS: Please cc me; I am not subscribed yet.

Sincerely,
DG
http://gnufans.net/
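To make the difference between the two definitions discussed in this message concrete, here is a small sketch. It is written in Python rather than Octave, and the function names (average_ranks, pearson, spearman_corr_of_ranks, spearman_formula) are hypothetical illustrations, not part of Octave or the forge. It assumes tied values receive the average of the ranks they span (the usual convention), and it applies the 1-6*sum(d^2)/(N*(N*N-1)) formula to those averaged ranks; how any particular spearman.m handles ties is not assumed here.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks; tied values get the mean of the ranks they span."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # Extend j to the end of the run of values equal to values[order[i]].
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        # Sorted positions i..j (0-based) correspond to ranks i+1..j+1; use their mean.
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(a, b):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    s_ab = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))
    s_aa = sum((p - mean_a) ** 2 for p in a)
    s_bb = sum((q - mean_b) ** 2 for q in b)
    return s_ab / sqrt(s_aa * s_bb)


def spearman_corr_of_ranks(x, y):
    """Older definition: correlation of the ranks, like cor(ranks(x), ranks(y))."""
    return pearson(average_ranks(x), average_ranks(y))


def spearman_formula(x, y):
    """Newer version: 1 - 6*sum(d^2) / (N*(N^2 - 1)), d = rank differences."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n * n - 1))


if __name__ == "__main__":
    # No ties: both definitions agree.
    x, y = [1, 2, 3, 4, 5], [2, 1, 4, 3, 5]
    print(spearman_formula(x, y), spearman_corr_of_ranks(x, y))  # prints 0.8 0.8
    # A four-way tie in x: the two definitions differ.
    x, y = [1, 1, 1, 1, 2], [1, 2, 3, 4, 5]
    print(round(spearman_formula(x, y), 4),
          round(spearman_corr_of_ranks(x, y), 4))  # prints 0.75 0.7071
```

In the tied example the formula gives 0.75 while the correlation of the ranks gives about 0.7071, which matches the point made by the sources above: the formula is exact only without ties, and the gap grows with the size of the tie groups.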