From: ASSI <Str...@ne...> - 2020-09-11 18:59:59
Ethan A Merritt writes:
> I am confused as to what exactly you are trying to plot.
> The intent of the "bins" option is similar to the "smooth frequency" option.
> It derives an approximation to the distribution of values read from
> a single column of data.
I'm using it as a way to average over a large number of equidistant
samples (kernel density is obviously way slower). The documentation
doesn't really say how it is implemented or supposed to work, so I've
experimented a bit to find out.
> If there is a second column of data this is interpreted as a weight.
Let's assume the sample spacing is 1 and no samples are missing, then
you'll get the sum over the bin width. Different sampling density just
scales the result. Dividing by the number of samples gives you the
average.
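
A minimal sketch of that sum-versus-average relation, not the original
test script: the data block $data, the variable binw and the choice of
sin(x) sampled at x = 0..100 are assumptions made for illustration. The
second plot divides each weight by the nominal number of samples per
bin; bins at the edges of the range may hold a different number of
samples, so the scaling there is only approximate.

```gnuplot
# Hypothetical stand-in data: x = 0..100 with spacing 1, second column sin(x)
set print $data
do for [i=0:100] { print i, sin(i) }
set print            # restore print output to stderr

binw = 10            # assumed bin width: nominally 10 samples per bin

# Sum of column 2 within each bin, and the same sum divided by the
# nominal sample count per bin to approximate a per-bin average.
plot $data using 1:2 bins binwidth=binw with boxes title "sum per bin", \
     $data using 1:($2/binw) bins binwidth=binw with boxes title "sum / samples"
```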
> To show the distribution of values for f(x) over a given range of x,
> the plotting command could be
>
> plot sample [xmin:xmax:increment] '+' using (f(x)) bins {binwidth=foo}
>
> If I rewrite your test case in this form, shown as a histogram, it becomes
>
> plot sample [0:100:1] '+' using (sin(x)) bins binwidth=0.1 with boxes
>
> The autoscaling works correctly on both x and y, and remains correct if
> I plot instead "with lines". Note that this is a single column of data, so
> the samples have uniform weights.
We've already established that "with boxes" and "with impulses" clip
the plotted data correctly, but both "with points" and "with lines"
don't. That's the only thing I think is a bug and needs fixing.
> Your test case creates this same set of samples but stores it in a
> data block with two columns x and f(x). To reproduce the plot from
> the stored samples the command would be
>
> plot $data using 2 bins binwidth=0.1
That's different from what I'm trying to get, though; and it doesn't
reproduce the data either. It does produce a distribution of the
samples, resolved into 0.1 wide buckets.
> Your test script provides no "using" specifier, however, so the plot
> command draws values from column 1 (essentially the numbers 0 to 100)
> and weights each one by the value in the second column (sin(x)).
It doesn't do that either, or the result would be a linearly rising
function with a sine riding on top of it. It reproduces the function if
I arrange the binwidth to contain just a single sample, so clearly the x
column isn't used directly. It integrates over the bin if I arrange the
bin to contain a larger number of samples (10 in my example). So if I
divide the result by the number of samples I get an average over the
bin.
> The presence of negative weights in particular confuses the program.
> I don't rule out the possibility that there is a legitimate use for such
> but the code assumes positive weights, typically either 1 or a set of
> fractional values summing to 1 so that the distribution is normalized.
> Also since the range of values is [0:100], using a bin width of 0.1 means
> that the vast majority of bins are empty.
> This produces the strange-looking plot you saw.
The plot is the result of most of the bins having an integral either
far above 0.1 or below -0.1, so it should have been clipped at the axis
borders, but it wasn't.
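
A sketch of how the clipping difference might be compared side by side.
This is not the original test script: the data block $data, the bin
width of 10 and the y range of [-0.1:0.1] are assumptions, chosen so
that many per-bin sums fall outside the axis limits.

```gnuplot
# Hypothetical stand-in data (not the original test script)
set print $data
do for [i=0:100] { print i, sin(i) }
set print            # restore print output to stderr

# Assumed axis limits, narrower than most of the per-bin sums
set yrange [-0.1:0.1]

# Same binned data drawn with two plot styles for comparison
set multiplot layout 2,1
plot $data using 1:2 bins binwidth=10 with boxes title "with boxes"
plot $data using 1:2 bins binwidth=10 with lines title "with lines"
unset multiplot
```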
> If your test case is intended as a simple stand-in for some real world
> data that does in fact use negative weights, please clarify the
> expected properties of the resulting distribution and I will try to
> modify the program's expectations to allow it. As it stands, the
> program's expectations about y values lead it to totally mis-assign
> all the points as "inrange" and therefore not clipped.
Again, I'm not using it this way and the actual code doesn't seem to
work the way you described it either. Now that you've described how to
get a binned distribution/histogram I do have uses for that too, but that's
not what I was having problems with.
Regards,
Achim.
--
+<[Q+ Matrix-12 WAVE#46+305 Neuron microQkb Andromeda XTk Blofeld]>+
Wavetables for the Waldorf Blofeld:
http://Synth.Stromeko.net/Downloads.html#BlofeldUserWavetables