The smooth frequency option can be used to generate histograms from data, but the binning function that the user has to provide is rather complex. The attached patch introduces a smooth histogram function, which is much easier to use. The user specifies the desired binning via the "set xrange [xmin:xmax]" and "set boxwidth xb" commands. This generates bins of width xb, from the interval [xmin:xmin+xb] to [xmax-xb:xmax], and sums the y-values of the data into these intervals (the x-value is set to the bin center). This approach also has the advantage that bins with no entries are plotted with y=0.
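For readers who want to see the binning rule spelled out, here is a minimal Python sketch of the behaviour described above. It is not the patch's C code; the function name `bin_sums` and its arguments are made up for illustration, and dropping points outside [xmin, xmax) is an assumption, since the description does not say what happens to them.

```python
def bin_sums(points, xmin, xmax, xb):
    """Sum y-values into fixed-width bins covering [xmin, xmax).

    Returns one (bin_center, y_sum) pair per bin, including empty bins,
    which keep a sum of 0.0.
    """
    nbins = int(round((xmax - xmin) / xb))
    sums = [0.0] * nbins
    for x, y in points:
        i = int((x - xmin) // xb)   # index of the bin containing x
        if 0 <= i < nbins:          # assumption: out-of-range points are dropped
            sums[i] += y
    # Report each bin at its center, as the patch description states.
    return [(xmin + (i + 0.5) * xb, s) for i, s in enumerate(sums)]


data = [(0.2, 1), (0.7, 2), (2.5, 4), (5.0, 9)]
result = bin_sums(data, xmin=0.0, xmax=3.0, xb=1.0)
```

With this input the middle bin receives no points and is still returned with a sum of 0.0, which is the "empty bins are plotted with y=0" property mentioned above.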
Comments:
It would be nicer if this worked with autoscaling.
Having said that, it would also be nice to have some explicit option[s] for dealing with out-of-range data points. Put them in the last bin? Ignore them? Make an extra bin for [xmax : Inf] ? Plot them as impulses separately from the histogram boxes (analogous to outliers in boxplot mode)?
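To make the choices above concrete, here is a rough Python sketch of three of them as a selectable policy. The policy names ("ignore", "clamp", "overflow") and the function `assign_bin` are hypothetical and do not correspond to any existing gnuplot option. The "overflow" case only adds a bin above xmax, as in the [xmax : Inf] suggestion, and drops points below xmin, which is an assumption.

```python
def assign_bin(x, xmin, xb, nbins, policy="ignore"):
    """Return the bin index for x, or None if the point is discarded.

    Bins 0..nbins-1 cover [xmin, xmin + nbins*xb). Index nbins is used
    only by the "overflow" policy for points at or beyond the upper edge.
    """
    i = int((x - xmin) // xb)
    if 0 <= i < nbins:
        return i                       # in range: ordinary bin
    if policy == "clamp":
        return 0 if i < 0 else nbins - 1   # fold into the nearest end bin
    if policy == "overflow" and i >= nbins:
        return nbins                   # extra bin for [xmax : Inf]
    return None                        # "ignore", or below range under "overflow"
```

The fourth suggestion, drawing the out-of-range points separately as impulses, would amount to collecting the points for which this returns None and plotting them on their own.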
By analogy with the "smooth kdensity" option, I am inclined to think that rather than taking the bin width from "set boxwidth" it would make sense to put the bin control in the plot command itself.
That would allow you to have several plots with different bin widths.
I don't like the code organization of making "smooth histogram" a special code section at the top of the main data input loop. Wouldn't it be more natural to use the same 2-pass approach of other smoothing options? I.e., read in the data straight while tracking min/max for autoscaling, then call a smooth_histogram routine to reorganize the data internally before handing it off to plot_bars() or plot_impulses() or whatever.
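As an illustration of the 2-pass idea, here is a small Python sketch: the first pass only stores the points and tracks min/max, and the second pass reorganizes them into bins. It is a stand-in for the structure being proposed, not gnuplot's actual C code. The names `read_points` and the `nbins` parameter are assumptions, and deriving the bin range from the data min/max is just one way autoscaling could feed into it.

```python
import math


def read_points(rows):
    """Pass 1: store the data unchanged while tracking the x range."""
    points, lo, hi = [], math.inf, -math.inf
    for x, y in rows:
        points.append((x, y))
        lo, hi = min(lo, x), max(hi, x)
    return points, lo, hi


def smooth_histogram(points, lo, hi, nbins):
    """Pass 2: replace the stored points by (bin_center, y_sum) pairs."""
    if not points or hi <= lo:
        raise ValueError("need data spanning a non-empty x range")
    width = (hi - lo) / nbins
    sums = [0.0] * nbins
    for x, y in points:
        # x == hi would index one past the end, so it joins the last bin.
        i = min(int((x - lo) / width), nbins - 1)
        sums[i] += y
    return [(lo + (i + 0.5) * width, s) for i, s in enumerate(sums)]


points, lo, hi = read_points([(0.0, 1), (1.0, 1), (3.9, 2), (4.0, 5)])
binned = smooth_histogram(points, lo, hi, nbins=4)
```

Because the range is known only after the first pass completes, this arrangement works with autoscaled data, and the binned result can then be handed to whatever plotting style is requested.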
The code should handle polar data also. That means the code cannot assume that the bin range lies on x and can therefore be taken from xrange [xmin:xmax]; it might be theta instead, and we don't even have an axis range for the polar angle theta.
Last edit: Ethan Merritt 2014-07-03
Implemented as option "bins" == "smooth bins" in 5.1