On Thursday 19 July 2007 18:40, James R. Van Zandt wrote:
> I've just uploaded a patch that implements probability axes:
> 1757226 Provide "probability" scaling and axes 2007-07-19 21:31 5 nobody vanzandt
But I'm heading out of town for a week, and won't be able to look at it
until I get back.
> Probability axes simplify the presentation of certain kinds of data.
> Recall that if Y varies as the exponential of X, then data values are
> best shown in a log plot - i.e. where you plot the logarithm of Y
> against X. More generally, if there are very small and very large Y
> values, but all are positive, then it is convenient to use a log Y
> axis. E.g. if you plot Y values of .002, .02, and 200 along a linear
> axis, it would be hard to tell the first two apart. However, it would
> be easy with a log axis.
> Suppose you have a collection of n data values that you think are
> normally distributed. Find the "empirical distribution function": a
> cumulative probability distribution function that concentrates
> probability 1/n at each of the n numbers in the sample. I.e. sort the
> values and label them X(1) through X(n). Then let Y(k)=k/n. If you
> plot Y vs. X, you get something like a sigmoid curve, which starts
> near the line Y=0, rises quickly near the mean of the X values, then
> gradually approaches the line Y=1. Probability axes are a way of
> transforming Y values such that, if the X values were really normally
> distributed, then the transformed data points fall approximately along
> a straight line.
> More generally, if there are both very small Y values and values very
> near 1, but none outside the range 0 to 1 (e.g. if Ys represent
> probabilities), then probability axes can make the data easier to see.
> Summary of changes: Most of the added code is in the new files
> "transform.c" and the corresponding header file "transform.h".
> The "probability" transformation (inverse of normal CDF) is the only
> one implemented here. However, the logic for labeling the axis is
> quite general. The goal is to select a set of tics and minitics such that:
> - tic labels are "simple"
> - tic labels do not overlap
> - the distances between tics are roughly equal
> - the numeric value corresponding to any minitic is unambiguous
> - the distances between minitics are roughly equal, and small enough
> that the numeric value of any point can be estimated by eye.
> The current code for log axes uses the functional forms for both the
> data transform log(x) and its inverse 10^x. This patch for
> probability axes does the same. However the part that calculates tic
> and minitic locations uses the functional form only for the forward
> transformation -- inverse values are found numerically. My hope is
> that this more general implementation can be easily applied to other
> transform functions, including user-supplied ones without a convenient inverse.
> axis_array[axis].log is now an enum, with values SCALE_LINEAR,
> SCALE_LOG, and SCALE_PROBABILITY, and AXIS_LOG_VALUE and
> AXIS_DE_LOG_VALUE are functions instead of macros.
> - Rename axis_array[axis].log to axis_array[axis].scale
> - Convert names AXIS_LOG_VALUE and AXIS_DE_LOG_VALUE to lower case.
> - Implement several other kinds of axes, including "weibull plots".
> - Implement a user-supplied scaling function.
> - Implement scaling only of axis labels, without scaling the data.
> - New command syntax [axis] "set scale linear|log|probability|f(x) x|y|..."
> Ethan Merritt wrote:
> > > >
> > > > My request was simply that you leave the existing log scale code
> > > > in place until a general mechanism was ready to replace it.
> > >
> > > What do you mean by general? Able to apply mappings other than
> > > log scale?
> > Exactly. I gave a bunch of examples earlier. The most common
> > request (and hardest to work around) is marking off the X-axis as
> > 1/x. For instance, we crystallographers have to deal with
> > inconsistent usage all the time; some quantities are given in terms
> > of energy while others are given in terms of wavelength. There is
> > an inverse relationship:
> > Energy = k / Wavelength
> > where k is an appropriate constant. It is a pain to try to persuade
> > gnuplot to mark off both Energy and Wavelength on axes of the same plot.
> > I would like to be able to do something like:
> > set axis x1 x # will use for Energy
> > set axis x2 k/x # will use for Wavelength
> > set axis y1 log(y) # log scale value to be plotted
> > Even better if there is a way to automatically lock x1 to x2, so that
> > they are forced to span the same range.
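
To make the request concrete, the arithmetic behind a locked Energy/Wavelength axis pair could be sketched as below. This only illustrates the mapping Energy = k / Wavelength; it is not gnuplot code, the function names are hypothetical, and k is left as a parameter because its value depends on the units in use.

```c
/* Energy corresponding to a given wavelength, with Energy = k / Wavelength. */
double wavelength_to_energy(double k, double wavelength)
{
    return k / wavelength;
}

/* Fractional position (0..1) along a linear Energy axis spanning
 * [e_min, e_max] at which a tic for the given wavelength belongs. */
double wavelength_tic_pos(double k, double wavelength,
                          double e_min, double e_max)
{
    return (wavelength_to_energy(k, wavelength) - e_min) / (e_max - e_min);
}

/* Wavelengths at the two ends of the Energy axis, so that x2 is locked to
 * x1. The wavelength axis runs in the opposite direction: the larger
 * wavelength sits at the low-energy end. Requires e_min > 0. */
void locked_wavelength_range(double k, double e_min, double e_max,
                             double *w_at_e_min, double *w_at_e_max)
{
    *w_at_e_min = k / e_min;
    *w_at_e_max = k / e_max;
}
```

With k = 12 and an Energy axis from 6 to 12, a wavelength of 1.5 corresponds to an energy of 8 and so sits one third of the way along the axis, while the wavelength axis itself runs from 2 down to 1.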
> > --
> > Ethan A Merritt
> Ethan wants to rescale one or both axes without changing the data. I
> would like to think that this patch is a first step toward that
> capability. As it stands, the patch rescales the data as well as the
> axes, but its logic could at least handle the tic/minitic placement.
> (By the way, I do have CVS write access, but I figured this would be
> too big a change to apply to the head. I am not comfortable enough
> with my CVS skills to start a new branch, let alone to try and merge
> later on.)
> - Jim Van Zandt
Ethan A Merritt