From: ASSI <Str...@ne...> - 2020-09-11 18:59:59
Ethan A Merritt writes:
> I am confused as to what exactly you are trying to plot.
> The intent of the "bins" option is similar to the "smooth frequency" option.
> It derives an approximation to the distribution of values read from
> a single column of data.

I'm using it as a way to average over a large number of samples that are taken at equidistant points (kernel density is obviously way slower). The documentation doesn't really say how it is implemented or how it is supposed to work, so I've experimented a bit to find out.

> If there is a second column of data this is interpreted as a weight.

Assuming the sample spacing is 1 and no samples are missing, you get the sum over the bin width. A different sampling density just scales the result. Dividing by the number of samples gives you the average.

> To show the distribution of values for f(x) over a given range of x,
> the plotting command could be
>
>   plot sample [xmin:xmax:increment] '+' using (f(x)) bins {binwidth=foo}
>
> If I rewrite your test case in this form, shown as a histogram, it becomes
>
>   plot sample [0:100:1] '+' using (sin(x)) bins binwidth=0.1 with boxes
>
> The autoscaling works correctly on both x and y, and remains correct if
> I plot instead "with lines". Note that this is a single column of data, so
> the samples have uniform weights.

We've already established that "with boxes" and "with impulses" clip the plotted data correctly, but neither "with points" nor "with lines" does. That's the only thing I think is a bug and needs fixing.

> Your test case creates this same set of samples but stores it in a
> data block with two columns x and f(x). To reproduce the plot from
> the stored samples the command would be
>
>   plot $data using 2 bins binwidth=0.1

That's different from what I'm trying to get, though, and it doesn't reproduce the data either. It does produce a distribution of the sample values, resolved into 0.1-wide buckets.
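To make the two readings of "bins" concrete, here is a minimal Python sketch (not gnuplot's code) of the arithmetic under Ethan's description: column 1 selects the bin and the optional column 2 is summed as a weight. The helper `bin_samples` is hypothetical, and its bin convention [k*w, (k+1)*w) is an assumption that need not match where gnuplot actually places its bin edges. It uses the same 101 samples as `sample [0:100:1]` and computes an unweighted histogram of sin(x) in 0.1-wide buckets, plus sin(x) weights binned by x with one sample per bin (which reproduces the function) or ten samples per bin (the sum, which divided by the per-bin count gives the bin average).

```python
import math
from collections import defaultdict


def bin_samples(values, binwidth, weights=None):
    """Hypothetical helper: sum the weights of all samples per bin.

    Assumption: bin k covers [k*binwidth, (k+1)*binwidth); gnuplot's own
    bin placement may differ. Without weights every sample counts as 1,
    which yields a plain histogram (a distribution of the values).
    Returns {bin_index: (weight_sum, sample_count)} sorted by bin index.
    """
    bins = defaultdict(lambda: [0.0, 0])
    if weights is None:
        weights = [1.0] * len(values)
    for v, w in zip(values, weights):
        k = math.floor(v / binwidth)   # which bin this sample falls into
        bins[k][0] += w                # accumulate the weight (sum over the bin)
        bins[k][1] += 1                # count samples so we can average later
    return {k: (s, n) for k, (s, n) in sorted(bins.items())}


xs = list(range(0, 101))               # like sample [0:100:1]: 101 samples
ys = [math.sin(x) for x in xs]

# 1) Unweighted: distribution of the sin(x) values in 0.1-wide buckets.
hist = bin_samples(ys, 0.1)

# 2) Weighted: bin the x column, weight each sample by sin(x).
one_per_bin = bin_samples(xs, 1, ys)   # one sample per bin -> reproduces sin(x)
ten_per_bin = bin_samples(xs, 10, ys)  # ten samples per bin -> sum over the bin

# Dividing each bin's sum by its own sample count gives the bin average.
averages = {k: s / n for k, (s, n) in ten_per_bin.items()}
for k, avg in averages.items():
    n = ten_per_bin[k][1]
    print(f"bin [{k * 10}, {k * 10 + 10}): n={n:2d}, avg={avg:+.4f}")
```

With this sampling the last 10-wide bin contains only x=100, so dividing every bin by a fixed count of 10 instead of its actual sample count would bias that bin.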
> Your test script provides no "using" specifier, however, so the plot
> command draws values from column 1 (essentially the numbers 0 to 100)
> and weights each one by the value in the second column (sin(x)).

It doesn't do that either, or the result would be a linearly rising function with a sine riding on top of it. It reproduces the function if I choose the bin width so that each bin contains just a single sample, so clearly the x column isn't used directly. It integrates over the bin if I choose the bin width so that each bin contains a larger number of samples (10 in my example). So if I divide the result by the number of samples, I get an average over the bin.

> The presence of negative weights in particular confuses the program.
> I don't rule out the possibility that there is a legitimate use for such
> but the code assumes positive weights, typically either 1 or a set of
> fractional values summing to 1 so that the distribution is normalized.
> Also since the range of values is [0:100], using a bin width of 0.1 means
> that the vast majority of bins are empty.
> This produces the strange-looking plot you saw.

The plot results from most of the bins having an integral either well above 0.1 or well below -0.1, so it should have been clipped at the axis borders, but it wasn't.

> If your test case is intended as a simple stand-in for some real world
> data that does in fact use negative weights, please clarify the
> expected properties of the resulting distribution and I will try to
> modify the program's expectations to allow it. As it stands, the
> program's expectations about y values lead it to totally mis-assign
> all the points as "inrange" and therefore not clipped.

Again, I'm not using it this way, and the actual code doesn't seem to work the way you described it either. Now that you've described how to get a binned distribution/histogram, I do have uses for that too, but that's not what I was having problems with.

Regards,
Achim.
-- 
+<[Q+ Matrix-12 WAVE#46+305 Neuron microQkb Andromeda XTk Blofeld]>+

Wavetables for the Waldorf Blofeld:
http://Synth.Stromeko.net/Downloads.html#BlofeldUserWavetables