Re: [Geotools-devel] Quantile classification oddities


Hey all,

        Wherein we discover that stats are hard, even for the simple
        questions...

On Tue, 2008-05-20 at 10:18 +0200, Andrea Aime wrote:
> Jody Garnett ha scritto:
> > What a difficult question; is there a strict definition of the quantile 
> > function we could grab from statistics or something?

I'm not sure the use of "Quantile" for this function is correct
terminology, but I don't have time to explore it rigorously. So far all
I've learned is that I've now forgotten how to use R.

As ever, Wikipedia is our friend these days:
        By a quantile, we mean the fraction (or percent) of points below
        the given value. That is, the 0.3 (or 30%) quantile is the point
        at which 30% of the data fall below and 70% fall above
        that value.
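With ties, that definition does not pin down a single answer. Here is a small Python sketch (the helper names are made up for illustration) using the first example vector from this thread. Because of the run of zeros, 20% of the points lie strictly below 0 and 60% lie at or below it. So 0 sits at the 20% mark under one reading of "below" and at the 60% mark under the other.

```python
def fraction_below(data, value):
    """Share of points strictly below value (hypothetical helper)."""
    return sum(1 for d in data if d < value) / len(data)

def fraction_at_or_below(data, value):
    """Share of points at or below value (hypothetical helper)."""
    return sum(1 for d in data if d <= value) / len(data)

x = [-1, -2, 0, 0, 0, 0, 3, 5, 7, 9]
print(fraction_below(x, 0), fraction_at_or_below(x, 0))  # 0.2 0.6
```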
Since the key footnote points us to R, we can start to trust this as an
authoritative source.

http://stat.ethz.ch/R-manual/R-devel/library/stats/html/quantile.html

In R, it seems you want the type=3 quantile method:
  " Type 3 SAS definition: nearest even order statistic"
but, again, I don't have the time to answer this rigorously today.
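For anyone who wants to try the type 3 rule outside R, here is a rough Python sketch. It follows the formula on the R help page above. With m = -1/2, let j = floor(n*p + m). The result is the order statistic x[j] when n*p + m is exactly the even integer j; otherwise it is x[j+1], with indices clamped to the ends of the sorted data. The function names are hypothetical, and this is not GeoTools code. The sketch uses exact fractions rather than floats so that probabilities like 1/3 do not pick up rounding error. Whether this matches R in every edge case is an assumption I haven't checked.

```python
from fractions import Fraction
from math import floor

def quantile_type3(data, p):
    """Sample quantile per the 'type 3' (SAS, nearest even order
    statistic) rule described in R's quantile() documentation."""
    x = sorted(data)
    n = len(x)
    nppm = n * Fraction(p) - Fraction(1, 2)  # n*p + m, with m = -1/2
    j = floor(nppm)                          # 1-based order-statistic index
    # gamma = 0 only when n*p + m is exactly j and j is even
    gamma = 0 if (nppm == j and j % 2 == 0) else 1

    def order_stat(k):
        # 1-based order statistic, clamped to [1, n] at both ends
        return x[min(max(k, 1), n) - 1]

    return order_stat(j) if gamma == 0 else order_stat(j + 1)

def breaks_type3(data, k):
    """Class breaks for k classes: quantiles at 0, 1/k, ..., 1."""
    return [quantile_type3(data, Fraction(i, k)) for i in range(k + 1)]

x2 = [-10, -9, -2, 0, 0, 0, 1, 2, 4, 9, 9, 9]
print(breaks_type3(x2, 3))  # [-10, 0, 2, 9]
```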

> Quantile(  {-1 -2 0 0 0 0 3 5 7 9}, 2) ==> ?
> Quantile(  {-1 -2 0 0 0 0 3 5 7 9}, 3) ==> ?

eratosthenes:~> R
...

> x <- c(-1,-2,0,0,0,0,3,5,7,9)
> n <- 2
> quantile(x,probs=seq(0,1,1/n))
  0%  50% 100% 
  -2    0    9 
> n <-3
> quantile(x,probs=seq(0,1,1/n))
       0% 33.33333% 66.66667%      100% 
       -2         0         3         9 
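The default R numbers above can be reproduced in Python (3.8+) with `statistics.quantiles(..., method="inclusive")`. It interpolates linearly between order statistics, which I believe is the same rule as R's default (type 7). It only returns the k-1 interior cut points, so this sketch adds the minimum and maximum to get the full break list. `breaks_type7` is a hypothetical name.

```python
import statistics

def breaks_type7(data, k):
    """Class breaks for k classes: min, the k-1 interior cut points, max.
    method="inclusive" interpolates linearly between order statistics."""
    cuts = statistics.quantiles(data, n=k, method="inclusive")
    return [min(data)] + cuts + [max(data)]

x = [-1, -2, 0, 0, 0, 0, 3, 5, 7, 9]
print(breaks_type7(x, 2))  # [-2, 0.0, 9]
print(breaks_type7(x, 3))  # [-2, 0.0, 3.0, 9]
```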

with each value shown being the rightmost (upper) bound of a class;
together these values define the breaks which can be applied to the
vector to yield the resulting classes. (You don't care about the
leftmost, 0%, value.)
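To apply the breaks, you can treat each break as the inclusive upper end of its class. That gives right-closed intervals (b[i-1], b[i]], with the lowest break folded into class 1, much as R's `cut(..., include.lowest = TRUE)` does. This is my reading of "rightmost", not anything GeoTools is known to do, and `classify` is a hypothetical helper. With the n=3 breaks from above, the ties at 0 put six of the ten values into the first class.

```python
from bisect import bisect_left
from collections import Counter

def classify(values, breaks):
    """1-based class number per value, using right-closed intervals
    (b[i-1], b[i]]; a value equal to the lowest break goes to class 1.
    Values above the top break would get class len(breaks), out of range."""
    return [max(bisect_left(breaks, v), 1) for v in values]

x = [-1, -2, 0, 0, 0, 0, 3, 5, 7, 9]
breaks = [-2, 0, 3, 9]  # from quantile(x, probs=seq(0, 1, 1/3)) above
classes = classify(x, breaks)
print(classes)                           # [1, 1, 1, 1, 1, 1, 2, 3, 3, 3]
print(sorted(Counter(classes).items()))  # [(1, 6), (2, 1), (3, 3)]
```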

> Quantile(  {-10 -9 -2 0 0 0 1 2 4 9 9 9}, 3) ==> what now?

> x2 <- c(-10,-9,-2,0,0,0,1,2,4,9,9,9)
> n <- 3
> quantile(x2,probs=seq(0,1,1/n))
        0%  33.33333%  66.66667%       100% 
-10.000000   0.000000   2.666667   9.000000 
> quantile(x2,probs=seq(0,1,1/n),type=3)
       0% 33.33333% 66.66667%      100% 
      -10         0         2         9 
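Here is a quick check of what the two R answers mean for class membership. It uses the same right-closed convention as above, which is an assumption, and `class_sizes` is a hypothetical helper. Neither break set gives three classes of four, because the ties at 0 put six of the twelve values in the first class. For this particular vector the two break sets give identical memberships, since no value lies between 2 and 2.666667.

```python
from bisect import bisect_left
from collections import Counter

x2 = [-10, -9, -2, 0, 0, 0, 1, 2, 4, 9, 9, 9]
breaks_r_default = [-10, 0, 8 / 3, 9]  # R default output above
breaks_r_type3 = [-10, 0, 2, 9]        # R type=3 output above

def class_sizes(values, breaks):
    """Members per class, using right-closed intervals (b[i-1], b[i]]
    with the lowest break folded into class 1."""
    counts = Counter(max(bisect_left(breaks, v), 1) for v in values)
    return [counts[c] for c in range(1, len(breaks))]

print(class_sizes(x2, breaks_r_default))  # [6, 2, 4]
print(class_sizes(x2, breaks_r_type3))    # [6, 2, 4]
```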

You might also look at the definitions of the spreadsheet functions,
since they might explain the terminology needed.

--adrian
