I was out all day, so I am joining this thread a little late.
>
> I had occasion to use your patch in the real world yesterday.
> I was making figures from data stored in a csv file, with 16 columns
> of data each corresponding to one experiment. I wanted to normalize
> the curves so that each one filled the range [0:1] when superimposed.
Thanks for giving it a "real-world whirl".
>
> set datafile separator ','
> data = "My Data File"
>
> stats data using 5 variable="col5"
> stats data using 6 variable="col6"
> ...
> stats data using 20 variable="col20"
>
> plot ....
>
> All great.
> But having to type 16 separate commands and then hunt for the
> results to construct a single complicated plotting command was tedious.
> So I have several specific suggestions for improvement.
>
> 1) The syntax
> stats data using 20 variable="col20"
> did not do at all what I expected. I expected to get variables
> col20_max_x and so on,
> but what I actually got was variables with embedded quote marks:
> "col20"max_x
> Trying to embed this in a plot command was a pain.
The quotes are your addition. We don't expect them,
but we don't actively remove them. Maybe we should -
I admit that there is a user expectation that "there should
be quotes". (I know, because I found myself adding them.)
On the other hand, I don't want to have a command that is
too smart (removing quotes silently, but leaving everything
else alone...)
>
> I suggest that the syntax should be
> stats data using <foo> name <string>
> and should produce variables named
> GPSTAT_string_max_x
> GPSTAT_string_npoints (NB: not ...nrecords)
> Except that I really don't like this mechanism very much.
Zoltan and I discussed this. Our idea was to keep the
variable names short and user-friendly. The GPSTAT_
prefix really does not help very much. If users want to
avoid polluting their own namespace, we give them the
ability to choose their own prefixes.
>
> For one thing, we don't currently have any easy way to undefine all
> these variables. "show var GPSTAT" would show all of them, but
> "undefine GPSTAT" doesn't get rid of them. That's fixable, I suppose,
> but I don't like the variables for another reason...
>
> 2) Here's the command I want to issue in the end:
>
> plot for [col=5:20] data using (column(4)) : \
>      (column(col) / statmax(col)) title label(col)
>
> It doesn't work, because I can't figure out how to define a function
> statmax(col) that retrieves the desired value. We don't have a user-level
> command in gnuplot that will retrieve the value of a gnuplot variable by
> its string name. Yesterday I had to forgo the iterator and type in a
> 16-line plot command instead.
If I see this correctly, mostly you would like to add
support for iteration into the stats command? This
is certainly something we can think about.
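
For what it's worth, here is a rough sketch of how the loop in 2) might
look with plain prefixed variables and no new data structure. It assumes
several things that are NOT in the patch as posted: a `name <prefix>`
keyword as suggested above, a `do for` block for iteration, and a
built-in value("varname") that looks up a variable by its string name.
The variable suffix `_max` and the helper statmax() are likewise only
illustrative.

```gnuplot
# Sketch only. Assumes: stats ... name <prefix>, do for { ... },
# and value("name"); none of these are part of the patch discussed here.
set datafile separator ','
data = "My Data File"

# One stats pass per column; each pass is assumed to define
# Run<col>_max, Run<col>_min, and so on.
do for [col=5:20] {
    stats data using col name sprintf("Run%d", col) nooutput
}

# Hypothetical helper: fetch the stored maximum for a given column number.
statmax(col) = value(sprintf("Run%d_max", col))

plot for [col=5:20] data using (column(4)):(column(col)/statmax(col)) \
     title sprintf("Run%d", col)
```

The point of the sketch is only that a lookup-by-name function plus a
user-chosen prefix would give the same grouping as a dedicated structure.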
>
> So I want to request a different mechanism for storing and retrieving the
> stats values. You've seen this before, but here it comes again:
> I don't want dozens of variables to be created by every stats command,
> because they are too hard to retrieve inside a script. Instead I want each
> stats command to load a structure, and I want a set of functions that
> retrieve the previously calculated stats values, indexed by name. If you
> want to load a named variable from one of the stats values, fine. Just say
> Run5_xmin = statmin("Run5")
> That will persist across a save/load sequence, for instance, even though
> the internal stats structures will not.
Why is that goodness? I don't understand the
motivation here. Why do you want to go the
roundabout way (and force the user through
this detour) of accessing variables through
functions, rather than as variables?
The "prefix" that we offer for variable names
serves exactly the same purpose as the data
structure that you refer to: a logical grouping.
>
> 3) For convenience, the stats command should accept an iterator.
> My plots yesterday could then have been created in two commands:
>
> stats for [col=5:20] data using col name "Run".col
> plot for [col=5:20] data using (column(4)) : \
> (column(col) / statmax("Run".col)) \
> title sprintf("Run%d",col)
>
> Note that "Run".col is the same as sprintf("Run%d",col).
>
> 4) I suggest adding a mechanism for explicitly clearing out the set of
> stats calculations. One obvious syntax is
> reset stats
Yes, and in fact Zoltan and I have implemented a
function to do that, but not hooked it up.