From: Ethan A M. <me...@uw...> - 2020-09-11 20:36:17
> I'm using it as a way to average over a large number of samples that are
> sampled equidistantly (kernel density is way slower obviously). The
> documentation doesn't really tell how it is implemented or supposed to
> work, so I've experimented a bit to find out.

I take the blame for any failures in either the implementation or the
documentation of the "bins" option, since I wrote both. I would be happy
to amend the code or the description to cover uses that I didn't
anticipate.

Since I envisioned it as a histogramming tool, it did not occur to me
that users would want to plot "with lines" rather than using boxes or
impulses. Therefore I did not think about or test the clipping behaviour.

As it happens, the routines that draw boxes or impulses do the clipping
as they draw each one. The routine that draws "with lines" assumes that
the inrange/outrange/undefined status of each point has already been
flagged, and only considers clipping if a particular line segment changes
state. This is so that successive in range points are drawn as a single
poly-line with smooth joins rather than being split into individual
segments.

Normally the inrange/outrange flag is set on data entry, but those flags
apply to the original data points rather than to the binned totals.
I have now added a separate pass to re-check the binned data against
yrange so that clipping works as you expect it to. (New code added for
both 5.4 and 5.5).

I remain a little uneasy about the use of negative values in the
weighting column, although I don't have a specific example in mind that
will fail.

Thanks for the explanation of what you are using it for. I agree that in
that mode "bins" is similar to a kernel density model that uses a delta
function rather than a Gaussian kernel.

cheers,

Ethan

On Friday, 11 September 2020 11:59:28 PDT ASSI wrote:
> Ethan A Merritt writes:
>
> > If there is a second column of data this is interpreted as a weight.
>
> Let's assume the sample spacing is 1 and no samples are missing, then
> you'll get the sum over the bin width. Different sampling density just
> scales the result. Dividing by the number of samples gives you the
> average.

Correct.

[snip]

>
> > Your test script provides no "using" specifier, however, so the plot
> > command draws values from column 1 (essentially the numbers 0 to 100)
> > and weights each one by the value in the second column (sin(x)).
>
> It doesn't do that either or the result would be a linearly rising
> function with a sin riding on top of it. It reproduces the function if
> I arrange the binwidth to contain just a single sample, so clearly the x
> column isn't used directly.

I may not have phrased that well. It weights the _contribution_ of each
sample by the value in the second column. If no second column is
provided, the weight is 1.
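
For illustration only, the weighting discussed in this thread can be
sketched in a few lines of Python. This is not gnuplot's source code. The
function names, the equal-width bin edges, and the treatment of the upper
boundary are all assumptions made for the example. The x value only
selects a bin, the weight (1 if no second column is given) is what gets
summed, and a separate pass then flags each binned total against a y
range.

```python
def bin_weighted(xs, weights=None, xmin=0.0, xmax=1.0, nbins=10):
    """Sum per-sample weights into equal-width bins over [xmin, xmax].

    Hypothetical helper: the bin edges used here are an assumption and
    are not taken from gnuplot.
    """
    if weights is None:
        weights = [1.0] * len(xs)      # no weight column: each sample counts 1
    width = (xmax - xmin) / nbins
    totals = [0.0] * nbins
    counts = [0] * nbins
    for x, w in zip(xs, weights):
        if x < xmin or x > xmax:
            continue                   # sample lies outside the binned range
        # x only selects the bin; x == xmax is folded into the last bin
        i = min(int((x - xmin) / width), nbins - 1)
        totals[i] += w                 # the weight is what is accumulated
        counts[i] += 1
    return totals, counts


def bin_averages(totals, counts):
    """Divide each bin total by its sample count (empty bins give 0.0)."""
    return [t / c if c else 0.0 for t, c in zip(totals, counts)]


def flag_in_range(totals, ymin, ymax):
    """Separate pass: mark each binned total as inside [ymin, ymax] or not."""
    return [ymin <= t <= ymax for t in totals]


if __name__ == "__main__":
    xs = [0, 1, 2, 3]
    ws = [1.0, 2.0, 3.0, 4.0]
    totals, counts = bin_weighted(xs, ws, xmin=0, xmax=4, nbins=2)
    print(totals, bin_averages(totals, counts))
    print(flag_in_range(totals, 0.0, 5.0))
```

With one sample per bin (nbins=4 in this example) the totals are simply
the weights themselves, which matches the observation that a bin width
holding a single sample reproduces the function.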