From: Philipp K. J. <ja...@ie...> - 2009-11-08 20:18:03
|
Here is a recurrent problem I often have: frequently, I would like to plot some data after subtracting off the mean. Or I would like to normalize the variation by dividing by the standard deviation. Or I'd like to form a normalized histogram, by passing the number of records in the data set to "smooth frequency". Currently, I always have to invoke an external program to find these quantities, which is a little inconvenient in interactive use, and pretty painful when scripting gnuplot. When not on Unix, it may be quite difficult!

For the last few weeks, Zoltán and I have been working on a command that calculates the most important such quantities from a data file, displays them, and (optionally) assigns them to variables in the current gnuplot session.

You can see some examples of what you can do with this command here:
http://www.phyast.pitt.edu/~zov1/gnuplot/patch/stats.html
and you can read the full documentation here:
http://www.phyast.pitt.edu/~zov1/gnuplot/patch/stats_help.html

I uploaded a patch with our changes to sourceforge.

We'd like to hear feedback and suggestions. Is this useful? Are we missing anything? We'd also like to encourage everyone to build the patch and play with it - each additional user finds a new class of bugs!

Best,

Ph. |
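[Editor's note: for readers unfamiliar with the proposed command, the three motivating cases in this announcement might look roughly like the following in a session. This is a sketch only; the result-variable names such as mean_y, stddev_y, and records are assumptions based on names mentioned later in this thread, not a confirmed interface.]

```gnuplot
stats 'data.dat' using 1:2

# plot the data after subtracting off the mean
plot 'data.dat' using 1:($2 - mean_y)

# normalize the variation by dividing by the standard deviation
plot 'data.dat' using 1:(($2 - mean_y)/stddev_y)

# normalized histogram: weight each record by 1/N before "smooth frequency"
plot 'data.dat' using 2:(1.0/records) smooth frequency
```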
From: Ethan M. <merritt@u.washington.edu> - 2009-11-09 05:12:35
|
On Sunday 08 November 2009, Philipp K. Janert wrote:
> We'd like to hear feedback and suggestions. Is this
> useful? Are we missing anything?

- I don't know how to interpret this behaviour:

gnuplot> stats '-' using (1)
input data ('e' ends) > 1
input data ('e' ends) > 2
input data ('e' ends) > NaN
input data ('e' ends) > 4
input data ('e' ends) > 5

* FILE:
    Records:      4
    Invalid:      0
    Blank:        0
    Data Blocks:  1

Why is the NaN not listed as either "invalid" or "blank"? Same thing happens for '?' or Inf or junk. I don't think this is being tracked correctly.

NB: We exchanged Email off-list about making df_readline() more consistent about returning DF_UNDEFINED, DF_MISSING, and so on. Totally aside from keeping statistics, does anyone object to making

    plot 'foo'
    plot 'foo' using 1:2
    plot 'foo' using ($1):($2)

all consistently return DF_MISSING and DF_UNDEFINED? Right now all three behave differently. |
From: Philipp K. J. <ja...@ie...> - 2009-11-09 06:07:04
|
On Sunday 08 November 2009 09:11:58 pm Ethan Merritt wrote:
> On Sunday 08 November 2009, Philipp K. Janert wrote:
> > We'd like to hear feedback and suggestions. Is this
> > useful? Are we missing anything?

I think this is due to the "using (1)" - it should be ($1). When I do this with "using ($1)", the stats command reports on the invalid entry as it should.

> - I don't know how to interpret this behaviour:
>
> gnuplot> stats '-' using (1)
> input data ('e' ends) > 1
> input data ('e' ends) > 2
> input data ('e' ends) > NaN
> input data ('e' ends) > 4
> input data ('e' ends) > 5
>
> * FILE:
>     Records:      4
>     Invalid:      0
>     Blank:        0
>     Data Blocks:  1
>
> Why is the NaN not listed as either "invalid" or "blank"?
> Same thing happens for '?' or Inf or junk.
> I don't think this is being tracked correctly.
>
> NB: We exchanged Email off-list about making df_readline() more consistent
> about returning DF_UNDEFINED, DF_MISSING, and so on. Totally aside from
> keeping statistics, does anyone object to making
>     plot 'foo'
>     plot 'foo' using 1:2
>     plot 'foo' using ($1):($2)
> all consistently return DF_MISSING and DF_UNDEFINED?
> Right now all three behave differently.

I completely agree. It would be great if readline behaved consistently. But we did not feel confident making changes to a routine that is so central. |
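[Editor's note: one way to read the distinction Philipp draws here (an interpretation, not a statement about the patch internals): a constant using-expression never evaluates the data column, so the bad value is never seen.]

```gnuplot
# The expression (1) yields the constant 1 for every input line;
# column 1 is never read, so the NaN cannot be flagged as invalid.
stats '-' using (1)

# The expression ($1) actually evaluates column 1; the NaN is seen
# and can be reported as an invalid entry.
stats '-' using ($1)
```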
From: Ethan M. <merritt@u.washington.edu> - 2009-11-09 05:14:34
|
On Sunday 08 November 2009, Philipp K. Janert wrote:
> We'd like to hear feedback and suggestions. Is this
> useful? Are we missing anything?

Some first thoughts:

The behaviour for functions is not as obvious as for files of data points. For example:

gnuplot> set xrange [0:10]
gnuplot> stats '+' using 1:(sin($1))

* FILE:
    Records:     100

* COLUMNS:
    Mean:       5.0000          0.1792
    Minimum:    0.0000 [  1]   -0.9994 [ 48]
    Quartile:   2.5253 [ 26]   -0.3837 [ 36]
    Median:     5.0505 [ 51]    0.3082 [ 29]
    Quartile:   7.5758 [ 76]    0.8075 [ 85]
    Maximum:   10.0000 [100]    0.9997 [ 79]

gnuplot> set samples 1000
gnuplot> stats '+' using 1:(sin($1))

* FILE:
    Records:    1000

* COLUMNS:
    Mean:       5.0000           0.1835
    Minimum:    0.0000 [   1]   -1.0000 [ 472]
    Quartile:   2.5025 [ 251]   -0.3941 [ 983]
    Median:     5.0050 [ 501]    0.3149 [  33]
    Quartile:   7.5075 [ 751]    0.8113 [ 848]
    Maximum:   10.0000 [1000]    1.0000 [ 158]

I find several things disconcerting about this output, although I know what the underlying causes are.

- The min/max are artifacts of the sampling. They're not even symmetric, even though sin(x) is a symmetric function. You can reduce the problem by increasing the number of samples, but I think more drastic alternatives should be considered:

  1) The 'stats' command could refuse to operate on functions.
  2) The 'stats' command could temporarily bump up the sampling rate by 100x.
  3) The 'stats' command could do a systematic search in the area of the nominal extrema to determine more accurate values. Even so, if the sampling is too coarse it may miss a true extremum that lies elsewhere.

- The "mean" of a periodic function would normally be calculated over one period of the function rather than an arbitrary range. Yeah, I know, I gave an explicit xrange. But still...

- The quantities in [] are documented as "the" point at which the min/max/whatever occurs. But there is no expectation, for either data or functions, that the minimum, for example, is achieved only at a single point. I don't think it makes any sense to give these values unless the data or function is monotonic. And given sampling artifacts, it probably makes no sense to give them for function data at all. |
From: Philipp K. J. <ja...@ie...> - 2009-11-09 05:26:43
|
On Sunday 08 November 2009 09:14:16 pm Ethan Merritt wrote:
> On Sunday 08 November 2009, Philipp K. Janert wrote:
> > We'd like to hear feedback and suggestions. Is this
> > useful? Are we missing anything?
>
> Some first thoughts:

Thanks for checking it out!

> The behaviour for functions is not as obvious as for files of
> data points. For example:

I am not sure. I find this a little unfair. There is no claim that the stats command does function minimization. It finds the extrema in the data sets passed to it. And that it does correctly, I think. (Even in the example given below.)

Let me state it again: the stats command works on data sets. Not functions (in the analytic sense). I don't think it would be reasonable to expect anything else.

Regarding "the" min/max: you are right, the documentation could be clearer. If there are multiple points in a data set, all of which are of the same (minimal) value, then the stats command currently makes no guarantee for which of those points it will report the position in the file. It will just report the position of one of them.

> gnuplot> set xrange [0:10]
> gnuplot> stats '+' using 1:(sin($1))
>
> * FILE:
>     Records:     100
>
> * COLUMNS:
>     Mean:       5.0000          0.1792
>     Minimum:    0.0000 [  1]   -0.9994 [ 48]
>     Quartile:   2.5253 [ 26]   -0.3837 [ 36]
>     Median:     5.0505 [ 51]    0.3082 [ 29]
>     Quartile:   7.5758 [ 76]    0.8075 [ 85]
>     Maximum:   10.0000 [100]    0.9997 [ 79]

[snip] |
From: Tait <gnu...@t4...> - 2009-11-09 11:11:51
|
> http://www.phyast.pitt.edu/~zov1/gnuplot/patch/stats.html

Forgive me for not actually looking at the code (yet), but a couple of questions sprang immediately to mind.

Mean as used here seems to be the arithmetic mean. What about the geometric mean? (Or harmonic mean, or any of the other types of averages?) Is the standard deviation calculated as for a population, or a sample? Is there a way for a user to request the other?

I have always addressed these sorts of issues by using a tool that's designed for manipulating arbitrary data sets (Perl in my case, but there are others). This has the advantage of providing infinite flexibility: subroutines, complex logical conditions, modules or libraries, abstraction and re-use, and all those things that gnuplot doesn't have. The external tool then generates output that's fed to gnuplot.

I wonder, rather than providing a restricted set of pre-defined functions, is there a way to allow the user to provide a formula or expression that will be applied across multiple rows? Then the user could calculate the mean (whatever that means to their application) or standard deviation or some other arbitrary metric on their own.

Tait |
From: Philipp K. J. <ja...@ie...> - 2009-11-09 15:27:41
|
[snip]

> I have always addressed these sorts of issues by using a tool that's
> designed for manipulating arbitrary data sets (Perl in my case, but there
> are others). This has the advantage of providing infinite flexibility,
> subroutines, complex logical conditions, modules or libraries, abstraction
> and re-use, and all those things that gnuplot doesn't have. The external
> tool then generates output that's fed to gnuplot.

So do I, and I have found that there is a small set of properties I calculate much more often than others. Those are pretty much the ones included in our stats command. The idea here is to provide a convenience for the 80% case. If I want to do something "fancy", or just anything that is special or unique to one particular data set, I think it is much more appropriate to hack that up as an external Perl script.

> I wonder, rather than providing a restricted set of pre-defined functions,
> is there a way to allow the user to provide a formula or expression that
> will be applied across multiple rows? Then the user could calculate the
> mean (whatever that means to their application) or standard deviation or
> some other arbitrary metric on their own.

That is an interesting suggestion, but it obviously goes way beyond the scope of what we were attempting here. And I am not sure that I would want to see that in gnuplot. Gnuplot's strength is that it is JUST a plotting tool. Because of that, it is simple and straightforward (no need to deal with a command language). If you want statistical functions with general programming capabilities AND graphics, use R. ;-)

> Tait

> _______________________________________________
> gnuplot-beta mailing list
> gnu...@li...
> https://lists.sourceforge.net/lists/listinfo/gnuplot-beta |
From: Philipp K. J. <ja...@ie...> - 2009-11-09 15:22:24
|
Based on some of the comments regarding our suggested "stats" command, I think I need to clarify something.

There was a question whether the stats command works on "a sample or a population". Neither! It works on a DATA FILE, like the rest of gnuplot. (And therefore there is no ambiguity.)

Gnuplot knows nothing about sampling or statistical methods - it is a plotting tool. And this command is intended as an addition to gnuplot's plotting capabilities, by giving you some useful information about the file that you are plotting from.

Best,

Ph. |
From: Jonathan T. <jt...@as...> - 2009-11-09 15:49:05
|
On Mon, 9 Nov 2009, Philipp K. Janert wrote:
> There was a question whether the stats command
> works on "a sample or a population". Neither!
> It works on a DATA FILE, like the rest of gnuplot.
> (And therefore there is no ambiguity.)

I'm sorry, but I still don't know whether this means it uses N or N-1 weighting. I would find it useful to have *both* printed out -- each is valuable in different circumstances.

-- "Jonathan Thornburg [remove -animal to reply]" <jt...@as...>
Dept of Astronomy, Indiana University, Bloomington, Indiana, USA
"C++ is to programming as sex is to reproduction. Better ways might technically exist but they're not nearly as much fun." -- Nikolai Irgens |
From: Philipp K. J. <ja...@ie...> - 2009-11-09 15:50:11
|
On Monday 09 November 2009 07:29:10 am Jonathan Thornburg wrote:
> On Mon, 9 Nov 2009, Philipp K. Janert wrote:
> > There was a question whether the stats command
> > works on "a sample or a population". Neither!
> > It works on a DATA FILE, like the rest of gnuplot.
> > (And therefore there is no ambiguity.)
>
> I'm sorry, but I still don't know whether this means it uses N or N-1
> weighting. I would find it useful to have *both* printed out -- each is
> valuable in different circumstances.

I see. That makes more sense.

Currently, the stats command divides by N. |
From: Ethan M. <merritt@u.washington.edu> - 2009-11-10 20:35:26
|
On Sunday 08 November 2009 12:17:50 Philipp K. Janert wrote:
> For the last few weeks, Zoltán and I have been
> working on a command that calculates the most
> important such quantities from a data file, displays
> them, and (optionally) assigns them to variables in
> the current gnuplot session.
>
> I uploaded a patch with our changes to sourceforge.
>
> We'd like to hear feedback and suggestions.

A nice starting point for development.

> Is this useful? Are we missing anything?

I had occasion to use your patch in the real world yesterday. I was making figures from data stored in a csv file, with 16 columns of data each corresponding to one experiment. I wanted to normalize the curves so that each one filled the range [0:1] when superimposed.

    set datafile separator ','
    data = "My Data File"
    stats data using 5 variable="col5"
    stats data using 6 variable="col6"
    ...
    stats data using 20 variable="col20"
    plot ....

All great. But having to type 16 separate commands and then hunt for the results to construct a single complicated plotting command was tedious. So I have several specific suggestions for improvement.

1) The syntax

    stats data using 20 variable="col20"

did not do at all what I expected. I expected to get variables col20_max_x and so on, but what I actually got was variables with embedded quote marks: "col20"max_x. Trying to embed this in a plot command was a pain. I suggest that the syntax should be

    stats data using <foo> name <string>

and should produce variables named

    GPSTAT_string_max_x
    GPSTAT_string_npoints   (NB: not ...nrecords)

Except that I really don't like this mechanism very much. For one thing, we don't currently have any easy way to undefine all these variables. "show var GPSTAT" would show all of them, but "undefine GPSTAT" doesn't get rid of them. That's fixable, I suppose, but I don't like the variables for another reason...

2) Here's the command I want to issue in the end:

    plot for [col=5:20] data using (column(4)) : (column(col) / statmax(col)) \
         title label(col)

It doesn't work, because I can't figure out how to define a function statmax(col) that retrieves the desired value. We don't have a user-level command in gnuplot that will retrieve the value of a gnuplot variable by its string name. Yesterday I had to forego the iterator and type in a 16-line plot command instead.

So I want to request a different mechanism for storing and retrieving the stats values. You've seen this before, but here it comes again: I don't want dozens of variables to be created by every stats command, because they are too hard to retrieve inside a script. Instead I want each stats command to load a structure, and I want a set of functions that retrieve the previously calculated stats values, indexed by name. If you want to load a named variable from one of the stats values, fine. Just say

    Run5_xmin = statmin("Run5")

That will persist across a save/load sequence, for instance, even though the internal stats structures will not.

3) For convenience, the stats command should accept an iterator. My plots yesterday could then have been created in two commands:

    stats for [col=5:20] data using col name "Run".col
    plot for [col=5:20] data using (column(4)) : \
         (column(col) / statmax("Run".col)) \
         title sprintf("Run%d",col)

Note that "Run".col is the same as sprintf("Run%d",col).

4) I suggest adding a mechanism for explicitly clearing out the set of stats calculations. One obvious syntax is

    reset stats

--
Ethan A Merritt
Biomolecular Structure Center, University of Washington, Seattle 98195-7742 |
From: Hans-Bernhard B. <HBB...@t-...> - 2009-11-11 03:49:33
|
Ethan Merritt wrote:
> 2) Here's the command I want to issue in the end:
>
>     plot for [col=5:20] data using (column(4)) : (column(col) / statmax(col)) \
>          title label(col)
>
> It doesn't work, because I can't figure out how to define a function statmax(col)
> that retrieves the desired value. We don't have a user-level command in gnuplot
> that will retrieve the value of a gnuplot variable by its string name.

Maybe more to the point, we lack array variables, which are exactly what this really calls for. Collections you loop over to retrieve individual elements by number are just that: arrays. Whether they are emulated by fancy variable-name construction from fragments, or by a function taking the index as an argument, they are still just arrays.

OTOH, arrays (a.k.a. vectors, and eventually matrices) would be one more step towards mimicking MatLab. Which we used to say we weren't going to do. Maybe all those quirks popping up are a warning that this is not a direction we should continue walking in. |
From: Hans-Bernhard B. <HBB...@t-...> - 2009-11-10 21:00:40
|
Ethan Merritt wrote:
[...]
> For one thing, we don't currently have any easy way to undefine all
> these variables. "show var GPSTAT" would show all of them, but
> "undefine GPSTAT" doesn't get rid of them. That's fixable, I suppose,
> but I don't like the variables for another reason...
>
> 4) I suggest adding a mechanism for explicitly clearing out the set of
> stats calculations. One obvious syntax is
>     reset stats

This one deserves generalization. Unless I missed something, we currently don't have a command to get rid of any variable (short of 'exit' and starting a new session), whereas the number of variables gnuplot creates by itself seems to be increasing all the time (first "fit" results, then GPVAL_*, now possibly GPSTAT_*).

I think a generic

    unset variable {<name> | pattern <regex> | fit | stats}

command would be in order. I would prefer 'unset' over 'reset' here because it matches 'show variables' a bit better than 'reset'. It should probably be extended to user-defined functions, too. And maybe we should even allow

    set variable <var>=<expr>
and
    set function <name>(<arguments>)=<expression>

as an optional syntax instead of the usual <var>=<expr> etc., too. |
From: Philipp K. J. <ja...@ie...> - 2009-11-11 00:54:19
|
On Tuesday 10 November 2009 01:00:16 pm Hans-Bernhard Bröker wrote:
> Ethan Merritt wrote:
> [...]
>
> This one deserves generalization. Unless I missed something, we
> currently don't have a command to get rid of any variable (short of
> 'exit' and starting a new session), whereas the number of variables we
> have gnuplot create by itself seems to be increasing all the time (first
> "fit" results, then GPVAL_*, now possibly GPSTAT_*).
>
> I think a generic
>
>     unset variable {<name> | pattern <regex> | fit | stats}

I fully agree. Zoltán and I discussed this and even implemented a function that will do that. We just don't currently expose it as a command - partially because we did not want to add yet another user-level command. But the idea of overloading unset for this purpose is a good one.

> command would be in order. I would prefer 'unset' over 'reset' here
> because it matches 'show variables' a bit better than 'reset'. It
> should probably be extended to user-defined functions, too. And maybe
> we should even allow
>
>     set variable <var>=<expr>
> and
>     set function <name>(<arguments>)=<expression>
>
> as an optional syntax instead of the usual <var>=<expr> etc., too. |
From: ZoltánVörös <zv...@gm...> - 2009-11-09 20:35:20
|
Philipp K. Janert <janert <at> ieee.org> writes:
> On Monday 09 November 2009 07:29:10 am Jonathan Thornburg wrote:
> > I'm sorry, but I still don't know whether this means it uses N or N-1
> > weighting. I would find it useful to have *both* printed out -- each is
> > valuable in different circumstances.
>
> I see. That makes more sense.
>
> Currently, the stats command divides by N.

I don't really see the difference: since the sum of whatever quantity you want is reported, as is the number of records, you can define either sum_x / records or sum_x / (records-1). A similar argument applies to the standard deviations and the like. I don't want to add new variables for quantities that can easily be calculated from existing ones.

Best, Zoltán |
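[Editor's note: as a concrete illustration of Zoltán's point, the N-1 (sample) standard deviation can be recovered from the N-weighted one in a single line. This is a sketch; the variable names stddev_y and records are assumptions based on names used in this thread.]

```gnuplot
stats 'foo' using 1:2
# rescale the population (divide-by-N) value to the sample (N-1) value,
# using s = sigma_N * sqrt(N/(N-1)):
sample_stddev = stddev_y * sqrt(records / (records - 1.0))
```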
From: ZoltánVörös <zv...@gm...> - 2009-11-09 20:46:59
|
Tait <gnuplot-devel <at> t41t.com> writes:
> Mean as used here seems to be the arithmetic mean. What about the geometric
> mean? (Or harmonic mean, or any of the other types of averages?)

Harmonic mean is easily done with the present stats command as

    stats 'foo' u 1:(1.0/$2) noout var harmonic = records / sum_y

> I have always addressed these sorts of issues by using a tool that's
> designed for manipulating arbitrary data sets (Perl in my case, but there
> are others). This has the advantage of providing infinite flexibility,

That is sort of a platform-dependent solution for a problem that comes up too often, but Philipp has already discussed this issue at length, I believe.

> I wonder, rather than providing a restricted set of pre-defined functions,
> is there a way to allow the user to provide a formula or expression that
> will be applied across multiple rows? Then the user could calculate the
> mean (whatever that means to their application) or standard deviation or
> some other arbitrary metric on their own.

    stats 'foo' u ($2*$3+cos($4))

should work as it is, if that is what you meant. A fairly large set of quantities can be calculated using the variables that are produced by stats, if the proper function is applied to the columns beforehand.

Best, Zoltán |
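[Editor's note: by the same transformation trick, a geometric mean can be sketched by taking logs first. Again, the result-variable names sum_y and records are assumptions based on this thread, not a confirmed interface.]

```gnuplot
# geometric mean of column 2 = exp of the arithmetic mean of the logs
stats 'foo' u 1:(log($2))
geometric = exp(sum_y / records)
```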
From: Ethan M. <merritt@u.washington.edu> - 2009-11-10 21:32:26
|
On Tuesday 10 November 2009 13:00:16 Hans-Bernhard Bröker wrote:
> This one deserves generalization. Unless I missed something, we
> currently don't have a command to get rid of any variable (short of
> 'exit' and starting a new session),

Yeah, we do. You can

    undefine VARNAME

But it doesn't take wildcards. Perhaps it should do the same as the "show var PREFIX" command, and treat the string as a leading prefix to the full variable name.

> whereas the number of variables we
> have gnuplot create by itself seems to be increasing all the time (first
> "fit" results, then GPVAL_*, now possibly GPSTAT_*).
>
> I think a generic
>
>     unset variable {<name> | pattern <regex> | fit | stats}
>
> command would be in order. I would prefer 'unset' over 'reset' here
> because it matches 'show variables' a bit better than 'reset'.

Hmm. But it isn't "set var foo", it's "var = foo". So "unset" is misleading.

> It should probably be extended to user-defined functions, too.

Yes. Good point. But there is a possible "gotcha": if you undefine a function that is called by another previously-defined function, bad things could happen.

> And maybe we should even allow
>
>     set variable <var>=<expr>
> and
>     set function <name>(<arguments>)=<expression>
>
> as an optional syntax instead of the usual <var>=<expr> etc., too.

Yes, that would be the other way to justify use of "unset" :-) But it seems a more drastic change than extending "undefine".

--
Ethan A Merritt |
From: Hans-Bernhard B. <HBB...@t-...> - 2009-11-11 04:20:18
|
Ethan Merritt wrote:
> Hmm. But it isn't "set var foo", it's "var = foo".
> So "unset" is misleading.

But we _do_ "show var foo", so "{set|unset} var foo" would match the usual relation between show/set/unset rather more nicely than a new command of its own.

> But there is a possible "gotcha". If you undefine a function that is
> called by another previously-defined function, bad things could happen.

None of those would be worse than the bad things already happening by

*) never defining a variable used by some function in the first place
*) never defining a function used by another function in the first place
*) undefining a variable that was referenced by some function
*) never defining a variable used to set the value of another variable
*) never defining a function called to set the value of a variable |
From: Philipp K. J. <ja...@ie...> - 2009-11-11 00:52:36
|
I was out all day, so I am joining this thread a little late.

> I had occasion to use your patch in the real world yesterday.
> I was making figures from data stored in a csv file, with 16 columns
> of data each corresponding to one experiment. I wanted to normalize
> the curves so that each one filled the range [0:1] when superimposed.

Thanks for giving it a "real-world whirl".

> All great.
> But having to type 16 separate commands and then hunt for the
> results to construct a single complicated plotting command was tedious.
> So I have several specific suggestions for improvement.
>
> 1) The syntax
>     stats data using 20 variable="col20"
> did not do at all what I expected. I expected to get variables
> col20_max_x and so on,
> but what I actually got was variables with embedded quote marks:
> "col20"max_x
> Trying to embed this in a plot command was a pain.

The quotes are your addition. We don't expect them, but we don't actively remove them. Maybe we should - I admit that there is a user expectation that "there should be quotes". (I know, because I found myself adding them.) On the other hand, I don't want to have a command that is too smart (removing quotes silently, but leaving everything else alone...).

> I suggest that the syntax should be
>     stats data using <foo> name <string>
> and should produce variables named
>     GPSTAT_string_max_x
>     GPSTAT_string_npoints (NB: not ...nrecords)
> Except that I really don't like this mechanism very much.

Zoltán and I discussed this. Our idea was to keep the variable names short and user-friendly. The GPSTAT_ prefix really does not help very much. If users want to avoid polluting their own namespace, we give them the ability to choose their own prefixes.

> 2) Here's the command I want to issue in the end:
>
>     plot for [col=5:20] data using (column(4)) : (column(col) / statmax(col)) \
>          title label(col)
>
> It doesn't work, because I can't figure out how to define a function
> statmax(col) that retrieves the desired value. We don't have a user-level
> command in gnuplot that will retrieve the value of a gnuplot variable by
> its string name. Yesterday I had to forego the iterator and type in a
> 16-line plot command instead.

If I see this correctly, mostly you would like to add support for iteration to the stats command? This is certainly something we can think about.

> So I want to request a different mechanism for storing and retrieving the
> stats values. You've seen this before, but here it comes again:
> I don't want dozens of variables to be created by every stats command,
> because they are too hard to retrieve inside a script. Instead I want each
> stats command to load a structure, and I want a set of functions that
> retrieve the previously calculated stats values, indexed by name. If you
> want to load a named variable from one of the stats values, fine. Just say
>     Run5_xmin = statmin("Run5")
> That will persist across a save/load sequence, for instance, even though
> the internal stats structures will not.

Why is that goodness? I don't understand the motivation here. Why do you want to go the roundabout way (and force the user through this detour) of accessing variables through functions, rather than as variables? The "prefix" that we offer for variable names serves exactly the same purpose as the data structure that you refer to: a logical grouping.

> 3) For convenience, the stats command should accept an iterator.
>
> 4) I suggest adding a mechanism for explicitly clearing out the set of
> stats calculations. One obvious syntax is
>     reset stats

Yes, and in fact Zoltán and I have implemented a function to do that, but not hooked it up. |
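[Editor's note: for comparison, the prefix mechanism Philipp describes would make Ethan's normalization look something like the following. This is a sketch; the exact generated names, e.g. col5_max_y, are an assumption about the patch's naming scheme.]

```gnuplot
stats data using 5 variable=col5
stats data using 6 variable=col6
# ... one stats command per column ...

plot data using 4:($5/col5_max_y) title "col 5", \
     data using 4:($6/col6_max_y) title "col 6"
```

Without iteration or a name-indexed lookup, the plot command still has to spell out each column by hand, which is exactly the tedium Ethan's suggestions 2) and 3) aim to remove.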
From: Philipp K. J. <ja...@ie...> - 2009-11-11 00:57:57
|
On Tuesday 10 November 2009 01:32:06 pm Ethan Merritt wrote: > On Tuesday 10 November 2009 13:00:16 Hans-Bernhard Bröker wrote: > > This one deserves generalization. Unless I missed something, we > > currently don't have a command to get rid of any variable (short of > > 'exit' and starting a new session), > > Yeah, we do. You can > undefine VARNAME > But it doesn't take wildcards. Side question: is that behavior documented somewhere? I did not know about it, either. > > Perhaps it should do the same as the "show var PREFIX" command, > and treat the string as a leading prefix to the full variable name. That's what we were thinking. (I will admit that I am reluctant to implement a regular expression parser. It seems like overkill.) I tend to prefer "unset" over "reset" - simply because I am already used to "unset" taking arguments, whereas "reset" does not. > > > whereas the number of variables we > > have gnuplot create by itself seems to be increasing all the time (first > > "fit" results, then GPVAL_*, now possibly GPSTAT_*). > > > > I think a generic > > > > unset variable {<name>| pattern <regex> | fit | stats} > > > > command would be in order. I would prefer 'unset' over 'reset' here > > because it matches 'show variables' a bit better than 'reset'. > > Hmm. But it isn't "set var foo", it's "var = foo". > So "unset" is misleading. > > > It should probably be extended to user-defined functions, too. > > Yes. Good point. > > But there is a possible "gotcha". If you undefine a function that is > called by another previously-defined function, bad things could happen. > > > And maybe > > we should even allow > > > > set variable <var>=<expr> > > and > > set function <name>(<arguments>)=<expression> > > > > as an optional syntax instead of the usual <var>=<expr> etc., too. > > Yes, that would be the other way to justify use of "unset" :-) > But it seems a more drastic change than extending "undefine". |
From: Philipp K. J. <ja...@ie...> - 2009-11-11 02:12:24
|
> > > > Yeah, we do. You can > > undefine VARNAME > > But it doesn't take wildcards. > > > I tend to prefer "unset" over "reset" - simply > because I am already used to "unset" taking > arguments, whereas "reset" does not. > I will correct myself. If we already have the ability to undefine a variable, then we should use that, rather than introducing an additional method. How about: undefine foo* undefine *foo undefine foo*bar with the obvious meaning of the wildcard? |
From: Ethan M. <merritt@u.washington.edu> - 2009-11-11 02:12:49
|
On Tuesday 10 November 2009, Philipp K. Janert wrote: > > 1) The syntax > > stats data using 20 variable="col20" > > did not do at all what I expected. I expected to get variables > > col20_max_x and so on, > > but what I actually got was variables with embedded quote marks: > > "col20"max_x > > Trying to embed this in a plot command was a pain. > > The quotes are your addition. We don't expect them, > but we don't actively remove them. Maybe we should - > I admit that there is a user expectation that "there should > be quotes". Since 4.0 we support string variables. That means wherever a string is required in the input command, it is acceptable to provide a string constant, a string variable, or a string-valued function. So A = "mydata" stats "file.dat" using 1 variable=A must expand A to find "mydata", not use it as an unmarked constant. All strings should be parsed using the routine try_to_get_string(), which handles the three cases. Also, normal commands do not use = signs. So the source code should be something like: char *prefix= NULL; if (equals(c_token,"variable")) { c_token++; prefix = try_to_get_string(); } > > 2) Here's the command I want to issue in the end: > > > > plot for [col=5:20] data using (column(4)) : \ > > column(col) / statmax(col)) > > > > It doesn't work, because I can't figure out how to define a function > > statmax(col) that retrieves the desired value. We don't have a user-level > > command in gnuplot that will retrieve the value of a gnuplot variable by > > its string name. Yesterday I had to forego the iterator and type in a > > 16-line plot command instead. > > If I see this correctly, mostly you would like to add > support for iteration into the stats command? This > is certainly something we can think about. Iteration is a major reason. But the same problem arises whenever you have constructed the variable name via a script. How do you insert the value of that variable back into another command? 
> > > > So I want to request a different mechanism for storing and retrieving the > > stats values. You've seen this before, but here it comes again: > > I don't want dozens of variables to be created by every stats command, > > because they are too hard to retrieve inside a script. Instead I want each > > stats command to load a structure, and I want a set of functions that > > retrieve the previously calculated stats values, indexed by name. If you > > want to load a named variable from one of the stats values, fine. Just say > > Run5_xmin = statmin("Run5") > > That will persist across a save/load sequence, for instance, even though > > the internal stats structures will not. > > Why is that goodness? I don't understand the > motivation here. I thought this was something you wanted. You said gnuplot should not become stateful, which I take to mean that save/load should get you back to where you were without having to replay the whole history of commands. > Why do you want to go the > roundabout way (and force the user through > this detour) of accessing variables through > functions, rather than as variables? Because, as I noted above, you cannot currently access their value by name in a script. I agree there are other possible solutions to the problem. |
From: Philipp K. J. <ja...@ie...> - 2009-11-11 02:22:40
|
> > The quotes are your addition. We don't expect them, > > but we don't actively remove them. Maybe we should - > > I admit that there is a user expectation that "there should > > be quotes". > > Since 4.0 we support string variables. That means wherever a string > is required in the input command, it is acceptable to provide > a string constant, a string variable, or a string-valued function. > > So > A = "mydata" > stats "file.dat" using 1 variable=A > > must expand A to find "mydata", not use it as an unmarked constant. > All strings should be parsed using the routine try_to_get_string(), > which handles the three cases. > Also, normal commands do not use = signs. Good point on the string variable issue - I did not think of that. There is a reason for the equality sign, though: it indicates that the next token is the prefix - because we have chosen to make the prefix optional. The equality sign is a way of telling gnuplot that the next token is a prefix, not the next keyword. (I admit that the equality sign is unusual and I did hesitate a little. But it does provide a simple solution to this particular problem.) I don't want to make the prefix mandatory. For convenience, it seems that in many cases it won't be needed. |
From: Philipp K. J. <ja...@ie...> - 2009-11-11 02:31:34
|
On Tuesday 10 November 2009 06:22:17 pm Philipp K. Janert wrote: > > > The quotes are your addition. We don't expect them, > > > but we don't actively remove them. Maybe we should - > > > I admit that there is a user expectation that "there should > > > be quotes". > > > > Since 4.0 we support string variables. That means wherever a string > > is required in the input command, it is acceptable to provide > > a string constant, a string variable, or a string-valued function. Although there is still some confusion: you write "where a string is required" In this example, no string is required. A bareword is required. It is your assumption that it should be a string. (And it is now a discussion item whether it should be a string - in which case the string needs to be handled properly, admittedly.) > > > > So > > A = "mydata" > > stats "file.dat" using 1 variable=A > > > > must expand A to find "mydata", not use it as an unmarked constant. > > All strings should be parsed using the routine try_to_get_string(), > > which handles the three cases. > > Also, normal commands do not use = signs. > > Good point on the string variable issue - I did not > think of that. > > There is a reason for the equality sign, though: > it indicates that the next token is the prefix - > because we have chosen to make the prefix > optional. The equality sign is a way of telling > gnuplot that the next token is a prefix, not the > next keyword. > > (I admit that the equality sign is unusual and I did > hesitate a little. But it does provide a simple solution > to this particular problem.) > > I don't want to make the prefix mandatory. For > convenience, it seems that in many cases it won't > be needed. |
From: Ethan M. <merritt@u.washington.edu> - 2009-11-11 03:11:06
|
On Tuesday 10 November 2009, Philipp K. Janert wrote: > > > > > > Since 4.0 we support string variables. That means wherever a string > > > is required in the input command, it is acceptable to provide > > > a string constant, a string variable, or a string-valued function. > > Although there is still a confusion: you write > "where a string is required" > > In this example, no string is required. Of course a string is required. It is going to become the first N characters of a longer string. How could it be anything other than a string itself? > A bareword > is required. It is your assumption that it should be > a string. (And it is now a discussion item whether > it should be a string - in which case the string needs > to be handled properly, admittedly.) > > > > > > > So > > > A = "mydata" > > > stats "file.dat" using 1 variable=A > > > > > > must expand A to find "mydata", not use it as an unmarked constant. > > > All strings should be parsed using the routine try_to_get_string(), > > > which handles the three cases. > > > Also, normal commands do not use = signs. > > > > Good point on the string variable issue - I did not > > think of that. > > > > There is a reason for the equality sign, though: > > it indicates that the next token is the prefix - > > because we have chosen to make the prefix > > optional. The equality sign is a way of telling > > gnuplot that the next token is a prefix, not the > > next keyword. Not following you here. The keyword itself can be optional - you don't have to provide a prefix. But if you include the keyword in your command, then the next token must be a string. Where does the = sign come in? [maybe "prefix" is a better keyword than "variable"] > > (I admit that the equality sign is unusual and I did > > hesitate a little. But it does provide a simple solution > > to this particular problem.) > > > > I don't want to make the prefix mandatory. For > > convenience, it seems that in many cases it won't > > be needed. 
So let it default to an empty string. |