Work at SourceForge, help us to make it a better place! We have an immediate need for a Support Technician in our San Francisco or Denver office.

Close

#1118 incorrect kernel density calculation

5.0
closed-fixed
2D plot (258)
7
2013-09-24
2012-04-19
Thomas Gaillard
No

Hello,

I have found a bug in gnuplot kernel density calculation. In the
eval_kdensity function of interpol.c, there is a definition of "n =
num_points - 1" and then the data average is calculated as sum/n instead
of sum/num_points. The data average, standard deviation, and default
bandwidth are thus incorrectly calculated. The standard deviation can
even become undefined causing the plot to fail. This applies to the
4.6.0 and earlier versions. A patch for 4.6.0 is attached, as well as
the datasets used for testing.

I have tested this with two datasets (see below), one causing a wrong
plot, the other failing to plot.

The command used is:

plot "dataset" u 1:(1./100) smooth kdensity w l

Without my patch:

DATASET1:
num_points: 100
n: 99
min: 23.23000
max: 48.23000
avg: 35.88505 INCORRECT!
sigma: 3.95375 INCORRECT!
default_bandwidth: 1.66723 INCORRECT!

For dataset1, average, standard deviation, and default bandwidth are
incorrectly calculated but a (wrong) plot is obtained.

DATASET2:
num_points: 100
n: 99
min: 21.03000
max: 25.67000
avg: 23.33657 INCORRECT!
sigma: -nan
default_bandwidth: -nan
Warning: empty y range [0.01:0.01], adjusting to [0.0099:0.0101]

For dataset2 it is worse: as the incorrectly calculated standard
deviation is undefined (sqrt of a negative number), the default
bandwidth is also undefined and there is no visible plot obtained.

With my patch, the correct behavior is obtained:

DATASET1:
num_points: 100
n: 99
min: 23.23000
max: 48.23000
avg: 35.52620
sigma: 5.31266
default_bandwidth: 2.24027

DATASET2:
num_points: 100
n: 99
min: 21.03000
max: 25.67000
avg: 23.10320
sigma: 0.99933
default_bandwidth: 0.42140

Thomas Gaillard

Related

Patches: #656

Discussion

  • interpol.c patch for 4.6.0

     
  • dataset1

     
    Attachments
  • dataset2

     
    Attachments
    • priority: 5 --> 7
     
  • Ethan Merritt
    Ethan Merritt
    2012-04-19

    • assigned_to: nobody --> janert
     
    • status: open --> open-fixed
     
  • Ethan Merritt
    Ethan Merritt
    2012-08-23

    I have applied your patch to both 4.6 and 4.7

    The calculation of the mean was clearly wrong.

    However, I'd like someone to confirm whether the kernel density function wants sigma or estimated standard deviation. Given that gnuplot is usually given experimental samples, the original division by (N-1) may be correct.

    Philipp?

     
  • Ethan Merritt
    Ethan Merritt
    2013-09-24

    • status: open-fixed --> closed-fixed
    • Group: --> 5.0