From: Allin C. <cot...@wf...> - 2020-10-15 21:43:24
On Thu, 15 Oct 2020, Ethan A Merritt wrote:

> On Thursday, 15 October 2020 12:26:22 PDT Allin Cottrell wrote:
>> On Thu, 15 Oct 2020, Ethan A Merritt wrote:
>>
>>> On Thursday, 15 October 2020 08:02:27 PDT Allin Cottrell wrote:
>>>> Maybe I'm missing something, but isn't the documentation for
>>>> "jitter" backwards with respect to the swarm/square choice?
>>>>
>>>> The text reads thus: "The default jittering operation displaces
>>>> points only along x. This produces a distinctive pattern sometimes
>>>> called a "bee swarm plot". The optional keyword square adjusts the y
>>>> coordinate of displaced points in addition to their x coordinate so
>>>> that the points lie in distinct layers..."
>>>
>>> The text is correct as written. Perhaps the attached figure,
>>> combining two plots from jitter.dem, will clarify.
>>>
>>> Left panel:
>>>   original data, randomly distributed on y, all x values the same
>>> Center panel:
>>>   "beeswarm" result from displacing points along x
>>>   if they would otherwise overlap.
>>>   Points are still randomly distributed on y.
>>> Right panel:
>>>   "square" plot uses the x displacements that would
>>>   generate a beeswarm plot, and adds a y displacement
>>>   that is effectively a floor(y) operation where the unit of
>>>   the floor operation is the "overlap" parameter to jitter.
>>>
>>>>
>>>> In the demo and in my own usage it seems the default is in fact to
>>>> displace in both the x and y dimensions while "square" limits the
>>>> scatter to the x dimension. (Except that in some cases I'm not
>>>> getting any y displacement with either choice, but that's another
>>>> issue.)
>>
>> Thanks, Ethan. I think I get it now. But this is potentially quite
>> confusing -- to understand exactly what jitter is doing one really
>> has to look closely at the y data in numerical form. So 'swarm'
>> preserves the original y values and just shoves points sideways to
>> get them off each other, while 'square' will also regularize the y
>> values to get the points into straight rows (more or less).
>
> Bee swarm plots were new to me. I came across one in a paper I was
> reading and made a note to myself to look into it.
> The resulting gnuplot implementation was guided by what R does.
>
> A salient feature of bee swarm plots is that the jitter operation
> is reversible. If you consider any single point you can reconstruct
> its original [x,y] coordinates by projecting back onto the corresponding
> discrete x value.
>
> The "square" option loses this. It is essentially a representation
> of binned data with a small number of points in each bin. There is
> nothing to distinguish two points in the same bin from each other,
> as their original coordinates have been lost.
>
> As the number of points becomes large, both options are inferior
> to a violin plot. The violinplot demo compares them and also
> shows a Gaussian jitter.
>
>> And then, given a pile-up of data points at some discrete {x,y}
>> value, there's no option to nudge them apart in both dimensions to
>> form a cloud?
>
> This sounds nice, but if x and y are both continuous I don't think
> it is a well-defined operation. For a small-ish number of points you
> could define an energy function that has a steep penalty gradient
> for overlap and then minimize the total energy by Monte Carlo.
> That rapidly becomes compute-intensive as the number of points
> increases.
>
> For x and y discrete there's a better option, sometimes called a
> bubble plot. At each discrete [x,y], draw a circle with size
> proportional to the number of points that piled up there.
> The "size" is a sore point, however. Radius or area?
> I have not given much thought to whether that can be done with
> existing options in gnuplot or whether it would require new code
> or an external data processing stage.

Thanks again. All very clear. It seems that what I was after in the
doubly discrete case is really a bubble plot (and I'd vote for area!).

-- 
Allin Cottrell
Department of Economics
Wake Forest University
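
For the doubly discrete case, the "external data processing stage" route could look roughly like the Python sketch below. It is only an illustration, not anything gnuplot provides: the function name `bubble_table` and the `scale` constant are made up for this example. It collapses repeated (x, y) pairs into one row each and sets the radius to scale * sqrt(count), so that the circle's area, rather than its radius, is proportional to the number of points piled up there.

```python
import math
from collections import Counter


def bubble_table(points, scale=0.1):
    """Collapse repeated (x, y) pairs into (x, y, count, radius) rows.

    radius = scale * sqrt(count), so the circle AREA is proportional
    to the count. `scale` is an arbitrary, hypothetical tuning constant.
    """
    counts = Counter(points)  # number of occurrences of each (x, y)
    return [(x, y, n, scale * math.sqrt(n))
            for (x, y), n in sorted(counts.items())]


if __name__ == "__main__":
    # Toy data: four points at (1, 2), two at (2, 3), one at (3, 1).
    data = [(1, 2)] * 4 + [(2, 3)] * 2 + [(3, 1)]
    # Print whitespace-separated columns: x y count radius
    for x, y, n, r in bubble_table(data):
        print(f"{x} {y} {n} {r:.3f}")
```

If the output is saved to a file, the x, y and radius columns should be usable with gnuplot's existing "with circles" plot style, which takes the radius as a third input column; whether that gives a satisfactory bubble plot in practice is untested here.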