From: Allin C. <cot...@wf...> - 2020-10-15 21:43:24
On Thu, 15 Oct 2020, Ethan A Merritt wrote:
> On Thursday, 15 October 2020 12:26:22 PDT Allin Cottrell wrote:
>> On Thu, 15 Oct 2020, Ethan A Merritt wrote:
>>
>>> On Thursday, 15 October 2020 08:02:27 PDT Allin Cottrell wrote:
>>>> Maybe I'm missing something, but isn't the documentation for
>>>> "jitter" backwards with respect to the swarm/square choice?
>>>>
>>>> The text reads thus: "The default jittering operation displaces
>>>> points only along x. This produces a distinctive pattern sometimes
>>>> called a "bee swarm plot". The optional keyword square adjusts the y
>>>> coordinate of displaced points in addition to their x coordinate so
>>>> that the points lie in distinct layers..."
>>>
>>> The text is correct as written. Perhaps the attached figure,
>>> combining two plots from jitter.dem, will clarify.
>>>
>>> Left panel:
>>> original data, randomly distributed on y, all x values the same
>>> Center panel:
>>> "beeswarm" result from displacing points along x
>>> if they would otherwise overlap.
>>> Points are still randomly distributed on y.
>>> Right panel:
>>> "square" plot uses the x displacements that would
>>> generate a beeswarm plot, and adds a y displacement
>>> that is effectively a floor(y) operation where the unit of
>>> the floor operation is the "overlap" parameter to jitter.
>>>
>>>>
>>>> In the demo and in my own usage it seems the default is in fact to
>>>> displace in both the x and y dimensions while "square" limits the
>>>> scatter to the x dimension. (Except that in some cases I'm not
>>>> getting any y displacement with either choice, but that's another
>>>> issue.)
>>
>> Thanks, Ethan. I think I get it now. But this is potentially quite
>> confusing -- to understand exactly what jitter is doing one really
>> has to look closely at the y data in numerical form. So 'swarm'
>> preserves the original y values and just shoves points sideways to
>> get them off each other, while 'square' will also regularize the y
>> values to get the points into straight rows (more or less).
>
> Bee swarm plots were new to me. I came across one in a paper I was
> reading and made a note to myself to look into it.
> The resulting gnuplot implementation was guided by what R does.
>
> A salient feature of bee swarm plots is that the jitter operation
> is reversible. If you consider any single point you can reconstruct
> its original [x,y] coordinates by projecting back onto the corresponding
> discrete x value.
>
> The "square" option loses this. It is essentially a representation
> of binned data with a small number of points in each bin. There is
> nothing to distinguish two points in the same bin from each other,
> as their original coordinates have been lost.
>
> As the number of points becomes large, both options are inferior
> to a violin plot. The violinplot demo compares them and also
> shows a Gaussian jitter.
>
>> And then, given a pile-up of data points at some discrete {x,y}
>> value, there's no option to nudge them apart in both dimensions to
>> form a cloud?
>
> This sounds nice, but if x and y are both continuous I don't think
> it is a well-defined operation. For a small-ish number of points you
> could define an energy function that has a steep penalty gradient
> for overlap and then minimize the total energy by Monte Carlo.
> That rapidly becomes compute-intensive as the number of points
> increases.
>
> For x and y discrete there's a better option, sometimes called a
> bubble plot. At each discrete [x,y], draw a circle with size
> proportional to the number of points that piled up there.
> The "size" is a sore point, however. Radius or area?
> I have not given much thought to whether that can be done with
> existing options in gnuplot or whether it would require new code
> or an external data processing stage.
Thanks again. All very clear.
It seems that what I was after in the doubly discrete case is really
a bubble plot (and I'd vote for area!).
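
For the doubly discrete case, the counting step could be handled by an
external processing stage. The following is only a minimal sketch in
Python, not anything gnuplot does itself: the function name
bubble_sizes, the unit_radius parameter, and the sample data are all
made up for illustration. It counts the points piled up at each
discrete [x,y] and assigns a radius proportional to sqrt(count), so
that the circle's area (rather than its radius) is proportional to the
number of points.

```python
import math
from collections import Counter

def bubble_sizes(points, unit_radius=0.1):
    """Collapse repeated (x, y) pairs into one circle per location.

    Returns rows (x, y, count, radius) with radius = unit_radius * sqrt(count),
    so that circle area grows linearly with the number of points.
    unit_radius is a hypothetical scale: the radius given to a single point.
    """
    counts = Counter(points)
    return [(x, y, n, unit_radius * math.sqrt(n))
            for (x, y), n in sorted(counts.items())]

# Made-up sample data: four points pile up at (1, 2), one sits at (2, 3).
sample = [(1, 2), (1, 2), (1, 2), (1, 2), (2, 3)]
rows = bubble_sizes(sample)
for x, y, n, r in rows:
    print(x, y, n, round(r, 3))
```

With four times as many points, the radius doubles and the area
quadruples. Written out to a file, rows like these could presumably be
drawn with gnuplot's "with circles" style, taking x, y and radius from
columns 1, 2 and 4.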
--
Allin Cottrell
Department of Economics
Wake Forest University
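
P.S. A rough illustration of the y adjustment that "square" makes, as
Ethan describes it above: a floor operation whose unit is the
"overlap" parameter. This is a sketch under that description only, not
gnuplot's actual code. The function name snap_to_layer and the sample
values are invented, and the x displacement is not reproduced here.

```python
import math

def snap_to_layer(y, overlap):
    """Round y down to the nearest multiple of overlap (a floor in units of overlap)."""
    return math.floor(y / overlap) * overlap

ys = [0.12, 0.31, 0.38, 0.95]   # made-up y values
overlap = 0.25                  # hypothetical layer height
layers = [snap_to_layer(y, overlap) for y in ys]
print(layers)
```

The second and third points land in the same layer, so their original
y values can no longer be told apart. This is the loss of
reversibility that distinguishes "square" from the default swarm.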