Re: jitter documentation


On Thursday, 15 October 2020 12:26:22 PDT Allin Cottrell wrote:
> On Thu, 15 Oct 2020, Ethan A Merritt wrote:
> 
> > On Thursday, 15 October 2020 08:02:27 PDT Allin Cottrell wrote:
> >> Maybe I'm missing something, but isn't the documentation for
> >> "jitter" backwards with respect to the swarm/square choice?
> >>
> >> The text reads thus: "The default jittering operation displaces
> >> points only along x. This produces a distinctive pattern sometimes
> >> called a "bee swarm plot". The optional keyword square adjusts the y
> >> coordinate of displaced points in addition to their x coordinate so
> >> that the points lie in distinct layers..."
> >
> > The text is correct as written.  Perhaps the attached figure,
> > combining two plots from jitter.dem, will clarify.
> >
> > Left panel:
> > 	original data, randomly distributed on y, all x values the same
> > Center panel:
> > 	"beeswarm" result from displacing points along x
> > 	if they would otherwise overlap.
> > 	Points are still randomly distributed on y.
> > Right panel:
> > 	"square" plot uses the x displacements that would
> > 	generate a beeswarm plot, and adds a y displacement
> > 	that is effectively a floor(y) operation where the unit of
> > 	the floor operation is the "overlap" parameter to jitter.
> >
> >>
> >> In the demo and in my own usage it seems the default is in fact to
> >> displace in both the x and y dimensions while "square" limits the
> >> scatter to the x dimension. (Except that in some cases I'm not
> >> getting any y displacement with either choice, but that's another
> >> issue.)
> 
> Thanks, Ethan. I think I get it now. But this is potentially quite 
> confusing -- to understand exactly what jitter is doing one really 
> has to look closely at the y data in numerical form. So 'swarm' 
> preserves the original y values and just shoves points sideways to 
> get them off each other, while 'square' will also regularize the y 
> values to get the points into straight rows (more or less).

Bee swarm plots were new to me.  I came across one in a paper I was
reading and made a note to myself to look into it.
The resulting gnuplot implementation was guided by what R does.

A salient feature of bee swarm plots is that the jitter operation
is reversible. If you consider any single point you can reconstruct
its original [x,y] coordinates by projecting back onto the corresponding
discrete x value.
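
A minimal sketch of that back-projection, in Python rather than gnuplot.
The function name and sample values are hypothetical, and it assumes each
point was displaced by less than half the spacing between neighbouring
discrete x values, so that the nearest discrete x is the original one:

```python
# Hypothetical illustration, not gnuplot code: undo a swarm displacement.
def recover_original(point, discrete_x):
    """Snap a jittered point back onto the nearest discrete x value.

    The y coordinate is returned unchanged because a swarm plot
    only displaces points along x.
    """
    x, y = point
    nearest = min(discrete_x, key=lambda cx: abs(cx - x))
    return (nearest, y)

categories = [1.0, 2.0, 3.0]          # assumed discrete x positions
jittered = [(0.9, 4.2), (1.1, 4.25), (2.0, 7.0), (3.15, 1.3)]
recovered = [recover_original(p, categories) for p in jittered]
print(recovered)
```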

The "square" option loses this.  It is essentially a representation
of binned data with a small number of points in each bin.  There is
nothing to distinguish two points in the same bin from each other,
as their original coordinates have been lost.
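
To illustrate the loss, here is a rough Python sketch of the floor-like
y adjustment described earlier in the thread.  It is not the gnuplot
implementation; the function name is made up, and it assumes "overlap"
is expressed in the same units as y:

```python
import math

# Hypothetical illustration of a floor operation whose unit is "overlap".
def square_y(y, overlap):
    """Round y down to the nearest multiple of overlap."""
    return math.floor(y / overlap) * overlap

overlap = 0.5                 # assumed layer height, in y units
a = square_y(4.20, overlap)   # falls in the layer starting at 4.0
b = square_y(4.45, overlap)   # falls in the same layer
print(a, b, a == b)
```

Two distinct y values land on the same layer, and nothing in the
plotted positions says which was which.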

As the number of points becomes large, both options are inferior
to a violin plot.  The violinplot demo compares them and also
shows a Gaussian jitter.

> And then, given a pile-up of data points at some discrete {x,y}
> value, there's no option to nudge them apart in both dimensions to 
> form a cloud? 

This sounds nice, but if x and y are both continuous I don't think
it is a well-defined operation.  For a small-ish number of points you
could define an energy function that has a steep penalty gradient
for overlap and then minimize the total energy by Monte Carlo.
That rapidly becomes compute-intensive as the number of points
increases.
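
A rough sketch of what such a minimization could look like, in Python.
Everything here is an assumption made for illustration: the quadratic
pair penalty, the greedy acceptance rule (a move is kept only if it
lowers the energy, with no temperature schedule), and the parameter
values.  Nothing like this exists in gnuplot as far as this thread says.

```python
import random

def energy(points, min_dist):
    """Sum of quadratic penalties over all pairs closer than min_dist."""
    total = 0.0
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            dx = points[i][0] - points[j][0]
            dy = points[i][1] - points[j][1]
            d = (dx * dx + dy * dy) ** 0.5
            if d < min_dist:
                total += (min_dist - d) ** 2
    return total

def relax(points, min_dist, steps=2000, step_size=0.05, seed=0):
    """Greedy Monte Carlo: try random single-point moves, keep improvements."""
    rng = random.Random(seed)
    pts = [list(p) for p in points]
    e = energy(pts, min_dist)
    for _ in range(steps):
        k = rng.randrange(len(pts))
        old = pts[k][:]
        pts[k][0] += rng.uniform(-step_size, step_size)
        pts[k][1] += rng.uniform(-step_size, step_size)
        e_new = energy(pts, min_dist)   # O(n^2) work for every trial move
        if e_new < e:
            e = e_new                   # accept the move
        else:
            pts[k] = old                # reject and restore
    return [tuple(p) for p in pts], e

pile = [(1.0, 1.0)] * 5                 # five points piled on one spot
initial_e = energy(pile, 0.1)
cloud, final_e = relax(pile, 0.1)
print(initial_e, final_e)
```

Recomputing the full pair energy for every trial move is what makes the
cost grow so quickly with the number of points.  A usable version would
presumably also need a term that keeps each point near its original
position.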

For x and y discrete there's a better option, sometimes called a
bubble plot.  At each discrete [x,y], draw a circle with size
proportional to the number of points that piled up there.
The "size" is a sore point, however.  Should it scale the radius or the area?
I have not given much thought to whether that can be done with
existing options in gnuplot or whether it would require new code
or an external data processing stage.
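
As a sketch of the external data processing route, the counting step
could look like this in Python.  The sample data, the `unit` scale
factor and the output layout are all made up for illustration; both
radius conventions are computed so they can be compared:

```python
from collections import Counter
import math

# Hypothetical discrete data with pile-ups at some [x, y] values.
points = [(1, 2), (1, 2), (1, 2), (1, 2), (3, 1), (2, 2), (2, 2)]
counts = Counter(points)

unit = 0.1   # assumed radius for a single point
bubbles = [
    # x, y, count, radius proportional to count, radius giving area proportional to count
    (x, y, n, unit * n, unit * math.sqrt(n))
    for (x, y), n in sorted(counts.items())
]
for row in bubbles:
    print(*row)
```

With four points at one location, the radius-proportional circle is
twice as wide as the area-proportional one, which is the crux of the
radius-or-area question.  Rows of x, y, radius like these could then be
plotted, for example with gnuplot's circles style.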

	cheers,
		Ethan
