
#695 Add smooth histogram option to plot command

Version 6
closed-accepted
nobody
None
5
2025-07-21
2014-06-20
No

The "smooth frequency" option can be used to generate histograms from data, but the binning function that the user has to provide is rather complex. The attached patch introduces a "smooth histogram" option, which is much easier to use. The user specifies the desired binning via the "set xrange [xmin:xmax]" and "set boxwidth xb" commands. This generates bins of width xb, from the interval [xmin:xmin+xb] to [xmax-xb:xmax], and sums the y-values of the data into these intervals (the x-value of each bin is set to the bin center). This approach also has the advantage that bins with no entries are plotted with y=0.
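
The binning rule described above can be illustrated with a minimal Python sketch. This is not the patch's C code: the function name `smooth_histogram`, the choice to ignore points outside [xmin, xmax), and the assumption that the range is a whole multiple of xb are all assumptions made for illustration only.

```python
def smooth_histogram(points, xmin, xmax, xb):
    """Sum y-values into bins of width xb covering [xmin, xmax].

    Returns a list of (bin_center, y_sum) pairs. Empty bins are kept
    with y_sum == 0, matching the behaviour described in the ticket.
    Assumption: points outside [xmin, xmax) are silently ignored; the
    ticket does not say how the patch treats them.
    """
    if xb <= 0 or xmax <= xmin:
        raise ValueError("need xb > 0 and xmax > xmin")

    # Assumption: (xmax - xmin) is (close to) a whole multiple of xb.
    nbins = max(1, int(round((xmax - xmin) / xb)))
    sums = [0.0] * nbins

    for x, y in points:
        if not (xmin <= x < xmax):
            continue  # out-of-range point, ignored in this sketch
        i = int((x - xmin) // xb)
        i = min(i, nbins - 1)  # guard against rounding at the top edge
        sums[i] += y

    # The x-value of each bin is its center.
    return [(xmin + (i + 0.5) * xb, sums[i]) for i in range(nbins)]


# Example: three bins of width 1 on [0, 3]; the point at x=5 is out of range.
data = [(0.2, 1), (0.7, 1), (0.9, 2), (2.5, 1), (5.0, 1)]
result = smooth_histogram(data, xmin=0.0, xmax=3.0, xb=1.0)
print(result)  # [(0.5, 4.0), (1.5, 0.0), (2.5, 1.0)]
```

Note that the middle bin has no entries but still appears with y=0, which is the advantage claimed over a user-supplied binning function with "smooth frequency".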

1 Attachments

Discussion

  • Ethan Merritt

    Ethan Merritt - 2014-07-03

    Comments:

    • It would be nicer if this worked with autoscaling.

    • Having said that, it would also be nice to have some explicit option[s] for dealing with out-of-range data points. Put them in the last bin? Ignore them? Make an extra bin for [xmax : Inf] ? Plot them as impulses separately from the histogram boxes (analogous to outliers in boxplot mode)?

    • By analogy with the "smooth kdensity" option, I am inclined to think that rather than taking the bin width from "set boxwidth" it would make sense to put the bin control in the plot command itself:

          plot ... smooth histogram {binwidth <width>  |  nbins <N>}
    

    That would allow you to have several plots with different bin widths.

    • I don't like the code organization of making "smooth histogram" a special code section at the top of the main data input loop. Wouldn't it be more natural to use the same 2-pass approach of other smoothing options? I.e., read in the data straight while tracking min/max for autoscaling, then call a smooth_histogram routine to reorganize the data internally before handing it off to plot_bars() or plot_impulses() or whatever.

    • The code should also handle polar data. That means it cannot assume that the binning range lies along x and is given by xrange [xmin:xmax]; the binned coordinate might be theta instead, and we don't even have an axis range for the polar angle theta.

     

    Last edit: Ethan Merritt 2014-07-03
  • Ethan Merritt

    Ethan Merritt - 2017-03-24
    • status: open --> closed-accepted
     
  • Ethan Merritt

    Ethan Merritt - 2017-03-24

    Implemented as option "bins" == "smooth bins" in 5.1

     
