Re: jitter documentation


On Thu, 15 Oct 2020, Ethan A Merritt wrote:

> On Thursday, 15 October 2020 12:26:22 PDT Allin Cottrell wrote:
>> On Thu, 15 Oct 2020, Ethan A Merritt wrote:
>>
>>> On Thursday, 15 October 2020 08:02:27 PDT Allin Cottrell wrote:
>>>> Maybe I'm missing something, but isn't the documentation for
>>>> "jitter" backwards with respect to the swarm/square choice?
>>>>
>>>> The text reads thus: "The default jittering operation displaces
>>>> points only along x. This produces a distinctive pattern sometimes
>>>> called a "bee swarm plot". The optional keyword square adjusts the y
>>>> coordinate of displaced points in addition to their x coordinate so
>>>> that the points lie in distinct layers..."
>>>
>>> The text is correct as written.  Perhaps the attached figure,
>>> combining two plots from jitter.dem, will clarify.
>>>
>>> Left panel:
>>> 	original data, randomly distributed on y, all x values the same
>>> Center panel:
>>> 	"beeswarm" result from displacing points along x
>>> 	if they would otherwise overlap.
>>> 	Points are still randomly distributed on y.
>>> Right panel:
>>> 	"square" plot uses the x displacements that would
>>> 	generate a beeswarm plot, and adds a y displacement
>>> 	that is effectively a floor(y) operation where the unit of
>>> 	the floor operation is the "overlap" parameter to jitter.
>>>
>>>>
>>>> In the demo and in my own usage it seems the default is in fact to
>>>> displace in both the x and y dimensions while "square" limits the
>>>> scatter to the x dimension. (Except that in some cases I'm not
>>>> getting any y displacement with either choice, but that's another
>>>> issue.)
>>
>> Thanks, Ethan. I think I get it now. But this is potentially quite
>> confusing -- to understand exactly what jitter is doing one really
>> has to look closely at the y data in numerical form. So 'swarm'
>> preserves the original y values and just shoves points sideways to
>> get them off each other, while 'square' will also regularize the y
>> values to get the points into straight rows (more or less).
>
> Bee swarm plots were new to me.  I came across one in a paper I was
> reading and made a note to myself to look into it.
> The resulting gnuplot implementation was guided by what R does.
>
> A salient feature of bee swarm plots is that the jitter operation
> is reversible. If you consider any single point you can reconstruct
> its original [x,y] coordinates by projecting back onto the corresponding
> discrete x value.
>
> The "square" option loses this.  It is essentially a representation
> of binned data with a small number of points in each bin.  There is
> nothing to distinguish two points in the same bin from each other,
> as their original coordinates have been lost.
>
> As the number of points becomes large, both options are inferior
> to a violin plot.  The violinplot demo compares them and also
> shows a Gaussian jitter.
>
>> And then, given a pile-up of data points at some discrete {x,y}
>> value, there's no option to nudge them apart in both dimensions to
>> form a cloud?
>
> This sounds nice, but if x and y are both continuous I don't think
> it is a well-defined operation.  For a small-ish number of points you
> could define an energy function that has a steep penalty gradient
> for overlap and then minimize the total energy by Monte Carlo.
> That rapidly becomes compute-intensive as the number of points
> increases.
>
> For x and y discrete there's a better option, sometimes called a
> bubble plot.  At each discrete [x,y], draw a circle with size
> proportional to the number of points that piled up there.
> The "size" is a sore point, however.  Radius or area?
> I have not given much thought to whether that can be done with
> existing options in gnuplot or whether it would require new code
> or an external data processing stage.

Thanks again. All very clear.

It seems that what I was after in the doubly discrete case is really 
a bubble plot (and I'd vote for area!).
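
For the record, here is a minimal sketch of the external data processing
stage for the doubly discrete case, assuming the points are available as
(x, y) pairs. The function name bubble_sizes and the unit_radius parameter
are made up for illustration; nothing here is part of gnuplot. It counts
the points piled up at each location and, following the "area" vote, makes
the radius proportional to the square root of the count.

```python
from collections import Counter
import math

def bubble_sizes(points, unit_radius=1.0):
    """Collapse coincident (x, y) points into one bubble per location.

    Returns a list of (x, y, count, radius) tuples sorted by (x, y).
    Area is proportional to count, so radius is proportional to sqrt(count).
    unit_radius (hypothetical parameter) is the radius for a single point.
    """
    counts = Counter(points)
    return [
        (x, y, n, unit_radius * math.sqrt(n))
        for (x, y), n in sorted(counts.items())
    ]

if __name__ == "__main__":
    # Illustrative data only: four points pile up at (1, 2), two at (2, 2).
    sample = [(1, 2), (1, 2), (1, 2), (1, 2), (2, 2), (2, 2), (2, 3)]
    for x, y, n, r in bubble_sizes(sample):
        print(x, y, n, f"{r:.3f}")
```

The printed columns could presumably be plotted with gnuplot's "with
circles" style, taking x, y and the radius column, with unit_radius chosen
to suit the axis scale. Switching from area to radius scaling would only
mean replacing math.sqrt(n) with n.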

--
Allin Cottrell
Department of Economics
Wake Forest University
