Thread: [bug] plot bins does not clip output to plot area

gnuplot-beta

[bug] plot bins does not clip output to plot area

From: Achim G. <Str...@ne...> - 2020-09-09 18:05:39

To reproduce:

gnuplot> set table $data
gnuplot> plot [0:100] sin(x)
gnuplot> unset table
gnuplot> plot [*:*][-0.1:0.1] $data w lines bins binwidth=0.1


Regards,
Achim.
-- 
+<[Q+ Matrix-12 WAVE#46+305 Neuron microQkb Andromeda XTk Blofeld]>+

Factory and User Sound Singles for Waldorf rackAttack:
http://Synth.Stromeko.net/Downloads.html#WaldorfSounds

Re: [bug] plot bins does not clip output to plot area

From: Ethan A M. <me...@uw...> - 2020-09-09 22:04:22

On Wednesday, 9 September 2020 11:05:16 PDT Achim Gratz wrote:
> 
> To reproduce:
> 
> gnuplot> set table $data
> gnuplot> plot [0:100] sin(x)
> gnuplot> unset table
> gnuplot> plot [*:*][-0.1:0.1] $data w lines bins binwidth=0.1
> 
> 
> Regards,
> Achim.

The program extends auto-scaled ranges to the next axis tic unless
you tell it not to.  If you do not want the range to be extended, tell it

    set xrange noextend       #version 5.4
or
    set xrange [*:*] noextend #version 5.2

	cheers,

		Ethan

Re: [bug] plot bins does not clip output to plot area

From: ASSI <Str...@ne...> - 2020-09-10 06:24:55

Ethan A Merritt writes:
> The program extends auto-scaled ranges to the next axis tic unless
> you tell it not to.  If you do not want the range to be extended, tell it

The y axis is not autoscaled in my example and there is no clipping at
all, which becomes obvious if you do multiplots.


Regards,
Achim.
-- 
+<[Q+ Matrix-12 WAVE#46+305 Neuron microQkb Andromeda XTk Blofeld]>+

SD adaptations for Waldorf Q V3.00R3 and Q+ V3.54R2:
http://Synth.Stromeko.net/Downloads.html#WaldorfSDada

Re: [bug] plot bins does not clip output to plot area

From: Ethan A M. <me...@uw...> - 2020-09-10 16:36:19

On Wednesday, 9 September 2020 22:36:56 PDT ASSI wrote:
> Ethan A Merritt writes:
> > The program extends auto-scaled ranges to the next axis tic unless
> > you tell it not to.  If you do not want the range to be extended, tell it
> 
> The y axis is not autoscaled in my example and there is no clipping at
> all, which gets obvious if you do multiplots.

Ah, sorry.  I misunderstood.

The y clipping seems to fail specifically when plotting "with lines".
It clips properly for "with impulses" or "with boxes", right?

	Ethan

> 
> 
> Regards,
> Achim.
>

Re: [bug] plot bins does not clip output to plot area

From: Achim G. <Str...@ne...> - 2020-09-10 17:13:23

Ethan A Merritt writes:
> The y clipping seems to fail specifically when plotting "with lines".
> It clips properly for "with impulses" or "with boxes", right?

I haven't tried these two specifically, but "with points" also plots
outside the boundary.  OK, impulses and boxes are correctly clipped.
Things are more clearly visible if you additionally

 set tmargin at screen 0.8
 set bmargin at screen 0.2

in my reproducer before the plot command.

Regards,
Achim.
-- 
+<[Q+ Matrix-12 WAVE#46+305 Neuron microQkb Andromeda XTk Blofeld]>+

SD adaptations for KORG EX-800 and Poly-800MkII V0.9:
http://Synth.Stromeko.net/Downloads.html#KorgSDada

Re: [bug] plot bins does not clip output to plot area

From: Ethan A M. <me...@uw...> - 2020-09-10 20:20:26

I am confused as to what exactly you are trying to plot.
The intent of the "bins" option is similar to the "smooth frequency" option.
It derives an approximation to the distribution of values read from
a single column of data.
If there is a second column of data this is interpreted as a weight.

To show the distribution of values for f(x) over a given range of x,
the plotting command could be

      plot sample [xmin:xmax:increment] '+' using (f(x)) bins {binwidth=foo}

If I rewrite your test case in this form, shown as a histogram, it becomes

      plot sample [0:100:1] '+' using (sin(x)) bins binwidth=0.1 with boxes

The autoscaling works correctly on both x and y, and remains correct if
I plot instead "with lines". Note that this is a single column of data, so
the samples have uniform weights.

Your test case creates this same set of samples but stores it in a
data block with two columns   x and f(x).   To reproduce the plot from
the stored samples the command would be

     plot $data using 2 bins binwidth=0.1

Your test script provides no "using" specifier, however, so the plot
command draws values from column 1 (essentially the numbers 0 to 100)
and weights each one by the value in the second column (sin(x)).
The presence of negative weights in particular confuses the program.
I don't rule out the possibility that there is a legitimate use for such
weights, but the code assumes positive weights, typically either 1 or a set of
fractional values summing to 1 so that the distribution is normalized.
Also since the range of values is [0:100], using a bin width of 0.1 means
that the vast majority of bins are empty.
This produces the strange-looking plot you saw.
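
As a rough illustration of the weighting described above, here is a minimal
Python sketch. It is not gnuplot's code: the function name weighted_bins and
the bin-edge convention (bin k covers [k*binwidth, (k+1)*binwidth)) are
assumptions made for this example, and gnuplot's own bin placement may differ.

```python
import math
from collections import defaultdict

def weighted_bins(values, weights=None, binwidth=1.0):
    """Sum the weights of all samples that fall into each bin.

    Bin k covers [k*binwidth, (k+1)*binwidth). This edge convention is an
    assumption for the sketch, not gnuplot's actual bin placement.
    """
    if weights is None:
        # No second column: every sample contributes a weight of 1.
        weights = [1.0] * len(values)
    totals = defaultdict(float)
    for v, w in zip(values, weights):
        totals[math.floor(v / binwidth)] += w
    return dict(totals)

xs = [float(i) for i in range(0, 101)]   # column 1: 0, 1, ..., 100
ys = [math.sin(x) for x in xs]           # column 2: sin(x)

# One column only: a distribution of the sin(x) values, uniform weights.
counts = weighted_bins(ys, binwidth=0.1)

# Two columns: x is binned and sin(x) is the (possibly negative) weight.
sums = weighted_bins(xs, ys, binwidth=0.1)
```

With integer x values and a bin width of 0.1, each sample lands in its own
bin, so only 101 of roughly 1000 bins across [0:100] receive any weight, and
some of the totals are negative.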

If your test case is intended as a simple stand-in for some real world
data that does in fact use negative weights, please clarify the
expected properties of the resulting distribution and I will try to
modify the program's expectations to allow it.  As it stands, the
program's expectations about y values lead it to totally mis-assign
all the points as "inrange", so they are not clipped.

       Ethan

Re: [bug] plot bins does not clip output to plot area

From: ASSI <Str...@ne...> - 2020-09-11 18:59:59

Ethan A Merritt writes:
> I am confused as to what exactly you are trying to plot.
> The intent of the "bins" option is similar to the "smooth frequency" option.
> It derives an approximation to the distribution of values read from
> a single column of data.

I'm using it as a way to average over a large number of samples that are
sampled equidistantly (kernel density is way slower obviously).  The
documentation doesn't really say how it is implemented or supposed to
work, so I've experimented a bit to find out.

> If there is a second column of data this is interpreted as a weight.

Let's assume the sample spacing is 1 and no samples are missing, then
you'll get the sum over the bin width.  Different sampling density just
scales the result.  Dividing by the number of samples gives you the
average.
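
A small Python sketch of that arithmetic, assuming equidistant samples with
none missing, so that each bin holds the same number of consecutive samples.
The helper names block_sums and block_means are made up for this example and
have nothing to do with gnuplot's internals.

```python
import math

def block_sums(samples, per_bin):
    """Sum consecutive groups of per_bin samples (equidistant sampling assumed)."""
    return [sum(samples[i:i + per_bin])
            for i in range(0, len(samples), per_bin)]

def block_means(samples, per_bin):
    """Divide each binned sum by the number of samples in that bin."""
    chunks = (samples[i:i + per_bin] for i in range(0, len(samples), per_bin))
    return [sum(chunk) / len(chunk) for chunk in chunks]

# 1000 equidistant samples, 10 samples per bin.
samples = [math.sin(0.01 * i) for i in range(1000)]
sums = block_sums(samples, 10)     # what the binned plot would show
means = block_means(samples, 10)   # the same values scaled by 1/10
```

A different sampling density changes only the number of samples per bin, and
therefore only the scale factor between the sums and the averages.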

> To show the distribution of values for f(x) over a given range of x, 
> the plotting command could be
>
>       plot sample [xmin:xmax:increment] '+' using (f(x)) bins {binwidth=foo}
>
> If I rewrite your test case in this form, shown as a histogram, it becomes
>
>       plot sample [0:100:1] '+' using (sin(x)) bins binwidth=0.1 with boxes
>
> The autoscaling works correctly on both x and y, and remains correct if 
> I plot instead "with lines". Note that this is a single column of data, so
> the samples have uniform weights.

We've already established that "with boxes" and "with impulses" clip
the plotted data correctly, but "with points" and "with lines" do not.
That's the only thing I think is a bug and needs fixing.

> Your test case creates this same set of samples but stores it in a 
> data block with two columns   x and f(x).   To reproduce the plot from
> the stored samples the command would be
>
>      plot $data using 2 bins binwidth=0.1

That's different from what I'm trying to get, though; and it doesn't
reproduce the data either.  It does produce a distribution of the
samples, resolved into 0.1 wide buckets.

> Your test script provides no "using" specifier, however, so the plot
> command draws values from column 1 (essentially the numbers 0 to 100)
> and weights each one by the value in the second column (sin(x)).

It doesn't do that either or the result would be a linearly rising
function with a sin riding on top of it.  It reproduces the function if
I arrange the binwidth to contain just a single sample, so clearly the x
column isn't used directly.  It integrates over the bin if I arrange the
bin to contain a larger number of samples (10 in my example).  So if I
divide the result by the number of samples I get an average over the
bin.

> The presence of negative weights in particular confuses the program.
> I don't rule out the possibility that there is a legitimate use for such
> but the code assumes positive weights, typically either 1 or a set of
> fractional values summing to 1 so that the distribution is normalized.
> Also since the range of values is [0:100], using a bin width of 0.1 means
> that the vast majority of bins are empty.
> This produces the strange-looking plot you saw.

The plot is the result of most of the bins having an integral either
far above 0.1 or below -0.1, so it should have been clipped at the axis
borders, but it wasn't.

> If your test case is intended as a simple stand-in for some real world
> data that does in fact use negative weights, please clarify the
> expected properties of the resulting distribution and I will try to
> modify the program's expectations to allow it.  As it stands, the 
> programs expectations about y values lead it to totally mis-assign
> all the points as "inrange" and therefore not clipped.

Again, I'm not using it this way and the actual code doesn't seem to
work the way you described it either.  Now that you've described how to
get a binned distribution/histogram I do have uses for that too, but that's
not what I was having problems with.

Regards,
Achim.
-- 
+<[Q+ Matrix-12 WAVE#46+305 Neuron microQkb Andromeda XTk Blofeld]>+

Wavetables for the Waldorf Blofeld:
http://Synth.Stromeko.net/Downloads.html#BlofeldUserWavetables

Re: [bug] plot bins does not clip output to plot area

From: Ethan A M. <me...@uw...> - 2020-09-11 20:36:17

> I'm using it as a way to average over a large number of samples that are
> sampled equidistantly (kernel density is way slower obviously). The
> documentation doesn't really tell how it is implemented or supposed to
> work, so I've experimented a bit to find out.

I take the blame for any failures in either the implementation or
the documentation of the "bins" option, since I wrote both.
I would be happy to amend the code or the description to cover
uses that I didn't anticipate.

Since I envisioned it as a histogramming tool, it did not occur to
me that users would want to plot "with lines" rather than using
boxes or impulses.   Therefore I did not think about or test the
clipping behaviour.

As it happens, the routines that draw boxes or impulses do the clipping as
they draw each one.  The routine that draws "with lines" assumes that the
inrange/outrange/undefined status of each point has already been flagged,
and only considers clipping if a particular line segment changes state.
This is so that successive in range points are drawn as a single poly-line
with smooth joins rather than being split into individual segments.
Normally the inrange/outrange flag is set on data entry, but those flags
apply to the original data points rather than to the binned totals.
I have now added a separate pass to re-check the binned data against
yrange so that clipping works as you expect it to.
(New code added for both 5.4 and 5.5).
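
The effect of stale flags can be sketched in a few lines of Python. This is
only an illustration of the logic described above, not the gnuplot source;
the names flag_points and segments_to_clip and the sample numbers are
hypothetical.

```python
INRANGE, OUTRANGE = "inrange", "outrange"

def flag_points(ys, ymin, ymax):
    """Flag each y value against the y range."""
    return [INRANGE if ymin <= y <= ymax else OUTRANGE for y in ys]

def segments_to_clip(flags):
    """Indices i where the segment from point i to point i+1 changes state.

    A line drawer that trusts the flags only considers clipping here.
    """
    return [i for i in range(len(flags) - 1) if flags[i] != flags[i + 1]]

ymin, ymax = -0.1, 0.1
raw = [0.05, 0.06, 0.07, 0.08, -0.02, 0.01, 0.03, -0.01]

# Every original sample is inside the y range.
raw_flags = flag_points(raw, ymin, ymax)

# Binned totals, 4 samples per bin: roughly 0.26 and 0.01.
binned = [sum(raw[0:4]), sum(raw[4:8])]

# Flags carried over from data entry: both binned points look in range.
stale_flags = [INRANGE] * len(binned)

# Separate pass that re-checks the binned totals against the y range.
fresh_flags = flag_points(binned, ymin, ymax)

stale_clip = segments_to_clip(stale_flags)   # nothing is considered for clipping
fresh_clip = segments_to_clip(fresh_flags)   # the first segment is now clipped
```

With the stale flags no segment changes state, so the line is drawn
unclipped even though the first binned total lies outside the y range.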

I remain a little uneasy about the use of negative values in the weighting
column, although I don't have a specific example in mind that will fail.

Thanks for the explanation of what you are using it for.
I agree that in that mode "bins" is similar to a kernel density model
that uses a delta function rather than a Gaussian kernel.

	cheers,

		Ethan

On Friday, 11 September 2020 11:59:28 PDT ASSI wrote:
> Ethan A Merritt writes:
> 
> > If there is a second column of data this is interpreted as a weight.
> 
> Let's assume the sample spacing is 1 and no samples are missing, then
> you'll get the sum over the bin width.  Different sampling density just
> scales the result.  Dividing by the number of samples gives you the
> average.

Correct.

[snip]

> 
> > Your test script provides no "using" specifier, however, so the plot
> > command draws values from column 1 (essentially the numbers 0 to 100)
> > and weights each one by the value in the second column (sin(x)).
> 
> It doesn't do that either or the result would be a linearly rising
> function with a sin riding on top of it.  It reproduces the function if
> I arrange the binwidth to contain just a single sample, so clearly the x
> column isn't used directly.

I may not have phrased that well.  It weights the _contribution_ of each
sample by the value in the second column.  If no second column is
provided, the weight is 1.

Re: [bug] plot bins does not clip output to plot area

From: ASSI <Str...@ne...> - 2020-12-17 06:51:13

Ethan A Merritt writes:
> Since I envisioned it as a histogramming tool, it did not occur to
> me that users would want to plot "with lines" rather than using
> boxes or impulses.   Therefore I did not think about or test the
> clipping behaviour.

That fix has rippled through to openSUSE now, the y axis clipping now
works, but the x axis clipping has stopped getting applied…


Regards,
Achim.
-- 
+<[Q+ Matrix-12 WAVE#46+305 Neuron microQkb Andromeda XTk Blofeld]>+

Factory and User Sound Singles for Waldorf rackAttack:
http://Synth.Stromeko.net/Downloads.html#WaldorfSounds

Re: [bug] plot bins does not clip output to plot area

From: Ethan A M. <me...@uw...> - 2020-12-17 19:00:33

On Wednesday, 16 December 2020 22:35:17 PST ASSI wrote:
> Ethan A Merritt writes:
> > Since I envisioned it as a histogramming tool, it did not occur to
> > me that users would want to plot "with lines" rather than using
> > boxes or impulses.   Therefore I did not think about or test the
> > clipping behaviour.
> 
> That fix has rippled through to openSUSE now, the y axis clipping now
> works, but the x axis clipping has stopped getting applied…

  oops

	Ethan

Re: [bug] plot bins does not clip output to plot area

From: Achim G. <Str...@ne...> - 2020-09-11 20:48:18

Ethan A Merritt writes:
>> I'm using it as a way to average over a large number of samples that are
>> sampled equidistantly (kernel density is way slower obviously). The
>> documentation doesn't really tell how it is implemented or supposed to
>> work, so I've experimented a bit to find out.
>
> I take the blame for any failures in either the implementation or
> the documentation of the "bins" option, since I wrote both.
> I would be happy to amend the code or the description to cover
> uses that I didn't anticipate.

No blame to go around… any feature is prone to be (mis)used in ways
never intended, once it's there.  :-)

> Since I envisioned it as a histogramming tool, it did not occur to
> me that users would want to plot "with lines" rather than using
> boxes or impulses.   Therefore I did not think about or test the
> clipping behaviour.

No good deed goes unpunished, as they say.

> As it happens, the routines that draw boxes or impulses do the clipping as
> they draw each one.  The routine that draws "with lines" assumes that the
> inrange/outrange/undefined status of each point has already been flagged,
> and only considers clipping if a particular line segment changes state.
> This is so that successive in range points are drawn as a single poly-line
> with smooth joins rather than being split into individual segments.
> Normally the inrange/outrange flag is set on data entry, but those flags
> apply to the original data points rather than to the binned totals.
> I have now added a separate pass to re-check the binned data against
> yrange so that clipping works as you expect it to.
> (New code added for both 5.4 and 5.5).

Great, I'll check it out later.

> I remain a little uneasy about the use of negative values in the weighting
> column, although I don't have a specific example in mind that will fail.

So far I have not yet encountered any problems with that and I have
likely given this part quite a bit of exercise as most of the data I'm
plotting is roughly centered about zero.

> Thanks for the explanation of what you are using it for.
> I agree that in that mode "bins" is similar to a kernel density model
> that uses a delta function rather than a Gaussian kernel.

I would suggest that it would be welcome if one could specify the kernel
function and/or provide a way to implement general FIR filtering on the
original data.  While the moving average trick is cute, it is also quite
slow and cumbersome once you need to average over a large number of
samples.

> I may not have phrased that well.  It weights the _contribution_ of each
> sample by the value in the seconds column.  If no second column is
> provided, the weight is 1.

That seems a more accessible description to me, thanks.


Regards,
Achim.
-- 
+<[Q+ Matrix-12 WAVE#46+305 Neuron microQkb Andromeda XTk Blofeld]>+

Factory and User Sound Singles for Waldorf Blofeld:
http://Synth.Stromeko.net/Downloads.html#WaldorfSounds