I have found a bug in gnuplot's kernel density calculation. In the
eval_kdensity function of interpol.c, there is a definition of "n =
num_points - 1", and the data average is then calculated as sum/n instead
of sum/num_points. As a result, the data average, standard deviation, and
default bandwidth are all calculated incorrectly. The standard deviation
can even become undefined, causing the plot to fail. This affects
version 4.6.0 and earlier. A patch for 4.6.0 is attached, as well as
the datasets used for testing.
I have tested this with two datasets (see below): one produces a wrong
plot, the other fails to plot.
The command used is:
plot "dataset" u 1:(1./100) smooth kdensity w l
Without my patch:
avg: 35.88505 INCORRECT!
sigma: 3.95375 INCORRECT!
default_bandwidth: 1.66723 INCORRECT!
For dataset1, the average, standard deviation, and default bandwidth are
all calculated incorrectly, but a (wrong) plot is still produced.
avg: 23.33657 INCORRECT!
Warning: empty y range [0.01:0.01], adjusting to [0.0099:0.0101]
For dataset2 it is worse: the incorrectly calculated variance is
negative, so the standard deviation (its square root) is undefined. The
default bandwidth is therefore undefined as well, and no visible plot is
produced.
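Why the square root can receive a negative argument can be seen with a short derivation. It assumes, as a hypothesis about the code, that the variance is computed in one pass as sum(x^2)/n - avg^2 with the same wrong divisor n = N - 1, where N = num_points, S is the sum of the data, Q the sum of squares, mu the true mean and sigma^2 the true (population) variance:

```latex
\tilde\mu = \frac{S}{N-1}, \qquad
\tilde\sigma^2 = \frac{Q}{N-1} - \tilde\mu^2 = \frac{(N-1)\,Q - S^2}{(N-1)^2}

S = N\mu, \qquad Q = N(\sigma^2 + \mu^2)
\;\Longrightarrow\;
(N-1)\,Q - S^2 = N\left[(N-1)\,\sigma^2 - \mu^2\right]

\tilde\sigma^2 = \frac{N\left[(N-1)\,\sigma^2 - \mu^2\right]}{(N-1)^2}
< 0 \iff (N-1)\,\sigma^2 < \mu^2
```

So under this assumption the failure is expected whenever the data mean is large compared with its spread, independent of any rounding error.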
With my patch, the correct behavior is obtained: