#14 new 'smooth freq' option of plot command

closed-accepted
None
5
2001-08-09
2001-07-03
No

The attached patch against 3.7.1 adds a new option, freq, to
the smooth option of the plot command. Like
'smooth unique', 'smooth freq' makes the data
monotonic in x. Where 'unique' plots the arithmetic mean of
the y values for each occurring x value, 'freq' plots
their sums.

This allows you to plot the frequency of occurrences of
each x value, for example:

plot '/tmp/data' using 1:(1) smooth freq

If you have weighted frequencies, with the weights
given in a second column, you could say:

plot '/tmp/data' using 1:2 smooth freq
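
To make the difference between the two options concrete, here is a minimal Python sketch of the behaviour described above. It is not the patch's C code; the function names and the sample data are made up for illustration, and it groups x values by exact equality, which is a simplification.

```python
from collections import defaultdict

def smooth_unique(points):
    """Mean of the y values for each distinct x, sorted by x."""
    groups = defaultdict(list)
    for x, y in points:
        groups[x].append(y)
    return [(x, sum(ys) / len(ys)) for x, ys in sorted(groups.items())]

def smooth_freq(points):
    """Sum of the y values for each distinct x, sorted by x."""
    totals = defaultdict(float)
    for x, y in points:
        totals[x] += y
    return sorted(totals.items())

# Hypothetical data: (x, weight) pairs in arbitrary order.
data = [(3, 2.0), (1, 0.5), (3, 4.0), (2, 1.0), (1, 1.5)]

# Like "using 1:(1)": every point contributes 1, so the sums are counts.
counts = smooth_freq([(x, 1) for x, _ in data])
# Like "using 1:2": the second column acts as the weight.
weighted = smooth_freq(data)
# What 'smooth unique' would produce for the same input.
means = smooth_unique(data)
```

With a constant y of 1 the sum per x value is simply the number of occurrences, which is why the first plot command above yields a frequency plot.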

PROBLEM/Please Help: One problem with this patch is
that, because the y values are changing -- perhaps
significantly -- from what was read in, they no longer
fall within an automatically detected y range. If the y
range is set explicitly, they plot just fine. However,
with automatic y range the points are not rendered
correctly, and I'm not sure about the best way to
adjust the automatic y range(s) once the new y values
are calculated.
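
The effect can be seen in a small Python sketch. It only illustrates why a range taken from the raw input no longer fits the summed values; it says nothing about how gnuplot's interpol.c handles autoscaling, and all names and data here are hypothetical.

```python
from collections import defaultdict

def y_range(points):
    """Smallest and largest y value in a list of (x, y) points."""
    ys = [y for _, y in points]
    return min(ys), max(ys)

def sum_by_x(points):
    """Sum the y values for each distinct x, sorted by x."""
    totals = defaultdict(float)
    for x, y in points:
        totals[x] += y
    return sorted(totals.items())

# Hypothetical input: three points at x=1 and one at x=2, all with y=1.
raw = [(1, 1.0), (1, 1.0), (1, 1.0), (2, 1.0)]
summed = sum_by_x(raw)

range_before = y_range(raw)     # range seen while reading the data
range_after = y_range(summed)   # range of the values actually drawn
```

Here the raw y values all equal 1, but the summed value at x=1 is 3, so a y range derived from the input alone would cut it off.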

If someone understands that part and would please take
a look at the patched pc_implode() routine and suggest
an approach that's in keeping with the gnuplot
philosophy, I would very much appreciate it.
--
Todd Lewis
utoddl@email.unc.edu

Discussion

  • Todd M. Lewis

    Todd M. Lewis - 2001-07-06

    Logged In: YES
    user_id=29839

    Upon further study of the code in interpol.c, I have fixed
    the problem with automatic ranges. "smooth freq" is now
    working, with the gnuplot-freq1.patch applied to 3.7.1. The
    patch includes updates to the documentation also.

     
  • Todd M. Lewis

    Todd M. Lewis - 2001-07-06

    Logged In: YES
    user_id=29839

    Hmmm. I tried to delete the broken 'gnuplot-freq.patch' from
    2001-07-03 19:50, but it wouldn't let me. Please use
    'gnuplot-freq1.patch' from 2001-07-06 instead. --
    Todd_Lewis@unc.edu

     
  • Hans-Bernhard Broeker

    Logged In: YES
    user_id=27517

    (I've deleted the earlier version of the patch for you ---
    file deletion requires admin privilege).

    Looking at the docs, there are some points that need
    clarification:

    1) "the same x value": you should describe what this means.
    For floating point data, exact equality is not a useful
    concept, and gnuplot doesn't use it, in most cases.

    2) "connected by straight lines" --- this mixes up the plot
    style with the smoothing option. There's nothing to stop
    users from plotting "smooth freq with points". A
    combination like "smooth freq with histeps" would be
    even more natural --- it is probably the most obvious
    use of "smooth freq" at all.
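
Point 1 can be illustrated with a short Python sketch. The tolerance-based grouping below is an assumption chosen for illustration only; the thread does not say which comparison rule gnuplot applies, and `group_close` is a hypothetical helper.

```python
import math

a = 0.1 + 0.2
b = 0.3

exact = (a == b)                          # exact comparison of two floats
close = math.isclose(a, b, rel_tol=1e-9)  # comparison within a tolerance

def group_close(xs, rel_tol=1e-9):
    """Group sorted x values; a value joins the current group when it is
    within rel_tol of that group's first value."""
    groups = []
    for x in sorted(xs):
        if groups and math.isclose(x, groups[-1][0], rel_tol=rel_tol):
            groups[-1].append(x)
        else:
            groups.append([x])
    return groups

groups = group_close([0.3, 0.1 + 0.2, 1.0])
```

Two values that a user would consider "the same x" can differ in the last bit, so the documentation has to say which notion of sameness is meant.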

     
  • Hans-Bernhard Broeker

    • assigned_to: nobody --> broeker
     
  • Hans-Bernhard Broeker

    Logged In: YES
    user_id=27517

    One further idea: I think a more appropriate name for this
    option would be "histogram" --- it's for creating histograms
    out of measurement series, after all.

     
  • Todd M. Lewis

    Todd M. Lewis - 2001-07-09

    Logged In: YES
    user_id=29839

    Thanks for deleting the earlier patch.

    As for the docs, the wording (which I agree is not all that
    great) is a direct copy of the comments for "smooth unique".
    The problems you mentioned need to be corrected there as
    well -- it's the same wording.

    Re: "freq" vs. "histogram", it definitely is not a
    histogram, it's a frequency distribution. To quote my
    dictionary:

        Histogram n. Statistics. a graph of a
        frequency distribution in which equal intervals of values
        are marked on a horizontal axis and the frequency
        corresponding to each interval is indicated by the height of
        a rectangle having the interval as its base.

    In other words, calling it "histogram" would imply "smooth
    freq with boxes" (or something equally flawed). The
    statistic being plotted really is called "frequency". A
    histogram is one of several ways gnuplot can display the
    frequency statistic. Perhaps using "frequency" instead of
    just "freq" would be better, but IMO "histogram" is not the
    correct term.
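
The distinction can be sketched in Python. This is an illustration of the two concepts only, not of anything in the patch; the function names, the bin width, and the data are hypothetical.

```python
import math
from collections import Counter

def frequency(xs):
    """Count occurrences of each distinct x value."""
    return sorted(Counter(xs).items())

def histogram(xs, width):
    """Count values per equal-width interval, keyed by the interval's left edge."""
    bins = Counter(math.floor(x / width) * width for x in xs)
    return sorted(bins.items())

xs = [1, 2, 2, 3, 7, 8, 8, 8]   # hypothetical measurements

per_value = frequency(xs)       # one entry per distinct x
per_interval = histogram(xs, 5) # one entry per interval of width 5
```

The frequency has one entry per distinct x value, while the histogram first assigns each value to an interval and only then counts.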

    I'm tempted to add a "cumulative frequency" as well -- but
    we're quickly drifting from plotting data points to
    statistical analysis, and that's a slippery slope. I got
    seduced into adding it as an option to "smooth" because
    "smooth unique" was already calculating the numbers and then
    throwing them away.

    Maybe it would be better to separate the statistical
    analyses -- frequency, mean frequency (which is the correct
    name for what "smooth unique" really does), and potential
    other statistics -- from the smoothing options altogether.
    After all, what if I want to plot the frequency or mean
    frequency with a bezier -- I can't, because we're making
    "smooth" do both the statistical analysis and the curve
    fitting, and we only take one "smooth" option.

     
  • Hans-Bernhard Broeker

    Logged In: YES
    user_id=27517

    OK, so let's stick with the name being "frequency", but
    allow "freq" as an abbreviation. Easily done by using
    "freq$uency" as the matching string.

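The "$" convention mentioned above (everything before the "$" is required, the rest is optional) can be sketched in Python as follows. This is an illustration of the idea only, not gnuplot's C implementation, and `matches_abbrev` is a hypothetical name.

```python
def matches_abbrev(token, pattern):
    """Return True if token is an accepted abbreviation of pattern.

    The part of pattern before "$" must be present in full; any further
    characters in token must match the part after "$" in order.
    """
    required, _, optional = pattern.partition("$")
    full = required + optional
    return len(token) >= len(required) and full.startswith(token)

accepted = [t for t in ("fre", "freq", "frequ", "frequency", "freqx")
            if matches_abbrev(t, "freq$uency")]
```

Under this rule "freq", "frequ" and "frequency" are accepted, while "fre" is too short and "freqx" does not match the optional tail.
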
    As to splitting statistical things from smoothing options:
    I don't think that's really needed. "smooth" is just
    slightly wrongly named. "transform" would be more to the
    point. And it sure would be nice to allow for more than
    one occurrence of the "transform" option per dataset. We
    already came up with that idea once, in discussion on
    info-gnuplot-beta.

    At the end of the day, though, the real stumbling block is a
    design issue concerning Unix tools in general: the Unix
    philosophy essentially says: one task -- one tool. In
    light of this, we probably shouldn't turn gnuplot into a
    full-fledged number-crunching or data-manipulation program.
    There's awk and Matlab for that.

     
  • Hans-Bernhard Broeker

    Logged In: YES
    user_id=27517

    Oh--- one further thing: you really should base your patch
    on the CVS version of gnuplot. There's no way we'll be
    adding any new features to the 3.7.1 line any more. 3.7
    will see at most a bug-fix release.

     
  • Todd M. Lewis

    Todd M. Lewis - 2001-07-19

    Logged In: YES
    user_id=29839

    Serious problems were lurking in the previous patch. It
    seemed to work fine until the Real World data had over 100
    data points. Then it became obvious that the way "smooth
    freq" works is fundamentally different from the other
    smooth options.

    The attached patch does the Right Thing. It's simpler, even
    though there is less code sharing with the other smooth
    options. And it works. :-)

     
  • Todd M. Lewis

    Todd M. Lewis - 2001-07-19

    corrected(*2) 'smooth freq' patch to 3.7.1

     
  • Todd M. Lewis

    Todd M. Lewis - 2001-08-07

    Logged In: YES
    user_id=29839

    The attached patch adds a "smooth frequency" option to
    "plot" (with "frequency" spelled out this time instead of
    just "freq") to gnuplot 3.8g. Otherwise it behaves the same
    as the "smooth freq" patch relative to 3.7.1.
    Cheers.

     
  • Todd M. Lewis

    Todd M. Lewis - 2001-08-07

    Logged In: YES
    user_id=29839

    Hmm. Ya gotta click that check box or the file doesn't
    upload. :-( Here is the "smooth frequency" patch to 3.8g.
    (I checked the box this time.)

     
  • Todd M. Lewis

    Todd M. Lewis - 2001-08-07

    "smooth frequency" patch to 3.8g

     
  • Hans-Bernhard Broeker

    Logged In: YES
    user_id=27517

    I committed a (slightly modified) version of the 3.8g patch
    into the main CVS sources. I'll close the patch, with status
    'accepted', even though the 3.7.1 patches won't ever make it
    into CVS.

     
  • Hans-Bernhard Broeker

    • status: open --> closed-accepted
     
