Thread: Re: [ gnuplot-Patches-992149 ] String variables revisited

gnuplot-beta

Re: [ gnuplot-Patches-992149 ] String variables revisited

From: Ethan M. <merritt@u.washington.edu> - 2004-07-16 16:35:31

I'm moving this to the mailing list, because I hate the format
of Email sent via the SourceForge patch site.

On Friday 16 July 2004 02:22 am, Harald Harders wrote:
> Ethan Merritt wrote in the summary for patchset #992149
>
> internal.c internal.h
> =================
> 1) Define a new internal function f_sprintf(). This is
> visible to the user as a new built-in function
> sprintf("fmt",...), which is the first, and so far the
> only, function in gnuplot that accepts string variables
> as arguments.

> I think this patch looks like a good thing, but I really
> don't like the name `sprintf'. It really sounds like a C
> programmer had no good idea what to call it.

I was not clear enough.  This *is* the C language
sprintf routine [*].  The gnuplot code just collects the 
variables and the format and passes them along to
the C library. The documentation will refer users to
"man sprintf" or to any C language manual. 

> set xlabel string(" %d %d %d", 1,2,3)
> 
> The sprintf (or string) command should really understand all
> gnuplot formats (%t, %T, %l, %L, etc.).

For me, part of the rationale for this work is that the gnuplot
formats are limiting.  If there is some particular format that
you cannot produce using a C language printf() variant, then
we can provide a separate routine for that.  Or there could
be a second formatting function that specifically uses only
the non-C gnuplot format conversion specifiers.

    mystring = string("Format using %T %L etc", var1, var2)
    set title sprintf("C Format with embedded %s", mystring)

[*] Actually, it's snprintf() because otherwise there is little
hope of preventing all buffer overflows.  As I recall, there is
an issue with some platforms not providing snprintf().  My
inclination is to say that string variables are not supported
on such platforms.
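
A minimal sketch of the bounded formatting described in the footnote, assuming a C99 vsnprintf() is available. The name format_checked() is hypothetical and is not the f_sprintf() from the patch; it only shows how the format and arguments can be handed to the C library while truncation is detected instead of overflowing the buffer.

```c
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical wrapper, not the patch's f_sprintf(): format into a
 * fixed-size buffer and report whether the complete result fit. */
static int format_checked(char *buf, size_t size, const char *fmt, ...)
{
    va_list args;
    int needed;

    va_start(args, fmt);
    needed = vsnprintf(buf, size, fmt, args);
    va_end(args);

    /* C99 semantics: a negative result is an error; a result >= size
     * means the output was truncated to size - 1 characters. */
    return needed >= 0 && (size_t)needed < size;
}
```

A caller can treat a zero return as "string too long" and report an error rather than silently using the truncated text.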

-- 
Ethan A Merritt       merritt@u.washington.edu
Biomolecular Structure Center
Mailstop 357742
University of Washington, Seattle, WA 98195

Re: [ gnuplot-Patches-992149 ] String variables revisited

From: Hans-Bernhard B. <br...@ph...> - 2004-07-16 22:21:16

On Fri, 16 Jul 2004, Harald Harders wrote:

> I would prefer one function that does everything. 

I don't think that's an option.  Too many of gnuplot's special
formatting specifiers already collide with C printf formats for that to
work:  %s, %c, %p, %l.

-- 
Hans-Bernhard Broeker (br...@ph...)
Even if all the snow were burnt, ashes would remain.

Re: [ gnuplot-Patches-992149 ] String variables revisited

From: Harald H. <h.h...@tu...> - 2004-07-17 13:43:34

On Sat, 17 Jul 2004, Hans-Bernhard Broeker wrote:

> On Fri, 16 Jul 2004, Harald Harders wrote:
>
> > I would prefer one function that does everything.
>
> I don't think that's an option.  Too many of gnuplot's special
> formatting specifiers already collide with C printf formats for that to
> work:  %s, %c, %p, %l.

Mmh, you are right here. Nevertheless I think it is more important to
provide the format specifiers the gnuplot users are used to than to
provide C specifiers. If you are able to write "%l \267 10^{%L}" in tic
formats, you also should be able to do this in string variable
definitions.

Yours
Harald

--
Harald Harders                           Langer Kamp 8
Technische Universitaet Braunschweig     D-38106 Braunschweig
Institut fuer Werkstoffe                 Germany
E-Mail: h.h...@tu...               Tel: +49 (5 31) 3 91-3062
WWW   : http://www.ifw.tu-bs.de          Fax: +49 (5 31) 3 91-3058

Re: [ gnuplot-Patches-992149 ] String variables revisited

From: Hans-Bernhard B. <br...@ph...> - 2004-07-17 17:01:08

On Sat, 17 Jul 2004, Harald Harders wrote:

> provide the format specifiers the gnuplot users are used to than to
> provide C specifiers. If you are able to write "%l \267 10^{%L}" in tic
> formats, you also should be able to do this in string variable
> definitions.

I fully agree there.  Which means we'll need two functions, eventually.
The problem is that %l/%L and similar formats must always be used in
pairs, and the code needs to know which the pairs are: the rounding on the
%l part affects what the right result on %L is.  The number 9.999 can come
out as 9.999*10^0 or 1.0*10^1.

The two of them could be called gprintf and sprintf or similar.
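
The pairing problem can be sketched in C as follows. This is an illustration under stated assumptions, not gnuplot's gprintf(): the helper names are hypothetical, the value is assumed to be positive, and the mantissa is rounded to a fixed number of decimals before the exponent is finalized.

```c
#include <stdio.h>

/* Hypothetical helper: split a positive value into a mantissa rounded to
 * `digits` decimals and the power-of-ten exponent that matches it. */
static void split_mantissa_exponent(double value, int digits,
                                    double *mantissa, int *exponent)
{
    double m = value;
    double scale = 1.0;
    int e = 0;
    int i;

    /* Normalize so that 1 <= m < 10 (value is assumed to be > 0). */
    while (m >= 10.0) { m /= 10.0; e++; }
    while (m < 1.0)   { m *= 10.0; e--; }

    /* Round the mantissa to the precision that will be printed. */
    for (i = 0; i < digits; i++)
        scale *= 10.0;
    m = (double)(long)(m * scale + 0.5) / scale;

    /* Rounding may have pushed the mantissa up to 10: renormalize so
     * that the exponent stays consistent with the printed mantissa. */
    if (m >= 10.0) { m /= 10.0; e++; }

    *mantissa = m;
    *exponent = e;
}

/* Format "<mantissa> x 10^<exponent>" into buf; returns snprintf's result. */
static int format_sci(char *buf, size_t size, double value, int digits)
{
    double m;
    int e;

    split_mantissa_exponent(value, digits, &m, &e);
    return snprintf(buf, size, "%.*f x 10^%d", digits, m, e);
}
```

With one decimal, 9.999 rounds up and is renormalized to a mantissa of 1.0 with exponent 1; with three decimals it stays 9.999 with exponent 0. Formatting the two parts independently could not make that adjustment.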

-- 
Hans-Bernhard Broeker (br...@ph...)
Even if all the snow were burnt, ashes would remain.

Re: [ gnuplot-Patches-992149 ] String variables revisited

From: Ethan A M. <merritt@u.washington.edu> - 2004-07-18 06:32:01

On Saturday 17 July 2004 10:00 am, Hans-Bernhard Broeker wrote:
>
> The problem is that %l/%L and similar formats must always be used in
> pairs, and the code needs to know which the pairs are: the rounding on the
> %l part affects what the right result on %L is.  The number 9.999 can come
> out as 9.999*10^0 or 1.0*10^1.
>
> The two of them could be called gprintf and sprintf or similar.

I have posted a 2nd patch, stringvars-2, to SourceForge.
stringvars-1 and stringvars-2 are to be applied sequentially.

This one adds run-time evaluation of all quoted strings beginning
"sprintf... that are printed via write_multiline().   It turned out to be
amazingly easy. I am very impressed with the existing implementation
of expression evaluation; slotting in evaluation of string-valued functions
"just worked".

Here's a neat example that demonstrates plot-time evaluation:

	set title 'sprintf("Plotted at %s",`date`)'
	plot <something>
	pause 3  "Should show new time"
	replot
	pause 3  "Should show new time"
	replot

NB: The specific placement of single and double quotes is critical
for this to work.  See my recent bug report about gnuplot losing the
single-quoted-ness of strings after their initial evaluation.  This
example currently works by accident, but I think we should
re-examine how quoted strings are stored in general.

Anyhow, adding additional string-valued functions would be very easy.
Let's discuss which ones might be desirable.

1) gprintf("format",mantissa,exponent)
    Is that the form it should take?

2) Some way to do arithmetic using numerical values stored in 
    a string.   E.g.
    a = "1.2"
    b = 3.4
    c = a+b
    It would be straightforward, although tedious, to modify every
    existing atomic evaluation routine in internal.c so that it 
    recognizes string-values during arithmetic.  They would be
    converted to (double) using atof(). But is this at all necessary?
    Maybe it is sufficient to simply provide a built-in atof()
    function.   Or maybe we don't even need that.

3) user-defined string-valued functions.
    I don't see at the moment how to implement these, although
    I think it would be possible.  Are they needed?

4) Do we want any string operations besides concatenation?
    Substrings?  String comparison?  

-- 
Ethan A Merritt
Department of Biochemistry & Biomolecular Structure Center
University of Washington, Seattle

Re: [ gnuplot-Patches-992149 ] String variables revisited

From: Hans-Bernhard B. <br...@ph...> - 2004-07-18 14:03:48

On Sat, 17 Jul 2004, Ethan A Merritt wrote:

> Anyhow, adding additional string-valued functions would be very easy.
> Let's discuss which ones might be desirable.
> 
> 1) gprintf("format",mantissa,exponent)
>     Is that the form it should take?

No.  It should be gprintf("format", number).  The crucial difference
is that C's sprintf() supports multiple % formats and uses up exactly
one argument per format specifier, whereas gprintf() only ever has one
argument, but may use more than one format specifier with it.  The
syntax of my stop-gap extension to 'set label' looks the way it does for
a reason:

	set label 'a = %l * 10^{%L}', a, ', length = %s %cm', length \
	    at 2,3
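
The "one number, several specifiers" convention can be illustrated with a toy formatter. This is a sketch only: mini_gprintf() is a hypothetical name, it understands just %l (mantissa), %L (exponent) and %%, it normalizes only positive values, and it ignores the rounding interplay between the two parts.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical miniature of a gprintf-style formatter: every specifier
 * in fmt refers to the same number x. */
static void mini_gprintf(char *out, size_t size, const char *fmt, double x)
{
    double m = x;
    int e = 0;
    size_t used = 0;

    if (size == 0)
        return;
    out[0] = '\0';

    /* Normalize positive values to 1 <= m < 10; leave others untouched. */
    if (m > 0.0) {
        while (m >= 10.0) { m /= 10.0; e++; }
        while (m < 1.0)   { m *= 10.0; e--; }
    }

    for (; *fmt != '\0' && used + 1 < size; fmt++) {
        size_t room = size - used;
        int n;

        if (fmt[0] == '%' && fmt[1] == 'l') {
            n = snprintf(out + used, room, "%g", m);   /* mantissa */
            fmt++;
        } else if (fmt[0] == '%' && fmt[1] == 'L') {
            n = snprintf(out + used, room, "%d", e);   /* exponent */
            fmt++;
        } else {
            if (fmt[0] == '%' && fmt[1] == '%')
                fmt++;                                 /* "%%" prints '%' */
            n = snprintf(out + used, room, "%c", *fmt);
        }
        if (n < 0)
            break;
        if ((size_t)n < room)
            used += (size_t)n;
        else
            used = size - 1;                           /* output truncated */
    }
}
```

Contrast this with C's sprintf(), where each specifier would consume its own argument.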

> 2) Some way to do arithmetic using numerical values stored in 
>     a string.   E.g.
>     a = "1.2"
>     b = 3.4
>     c = a+b
>     It would be straightforward, although tedious, to modify every
>     existing atomic evaluation routine in internal.c so that it 
>     recognizes string-values during arithmetic.  They would be
>     converted to (double) using atof(). But is this at all necessary?

I don't think so.  I think we should follow a Java-like approach:
adding a number to a string converts the number to a string and 
concatenates.  The only way to get a number back from a string
would be by explicit function call.  
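
A rough sketch of that rule in C, assuming a simplified tagged value. The struct and function names here are hypothetical and do not match gnuplot's real struct value; numbers are rendered with %g purely for illustration.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

enum value_type { V_NUMBER, V_STRING };

/* Hypothetical tagged value; gnuplot's real struct value differs. */
struct value {
    enum value_type type;
    double num;   /* used when type == V_NUMBER */
    char *str;    /* used when type == V_STRING */
};

/* Render either kind of value as text in buf. */
static void value_text(const struct value *v, char *buf, size_t size)
{
    if (v->type == V_STRING)
        snprintf(buf, size, "%s", v->str);
    else
        snprintf(buf, size, "%g", v->num);
}

/* '+' adds two numbers; if either operand is a string, the other is
 * converted to text and the two are concatenated into out. */
static struct value value_add(const struct value *a, const struct value *b,
                              char *out, size_t size)
{
    struct value r;
    char left[128], right[128];

    if (a->type == V_NUMBER && b->type == V_NUMBER) {
        r.type = V_NUMBER;
        r.num = a->num + b->num;
        r.str = NULL;
        return r;
    }
    value_text(a, left, sizeof left);
    value_text(b, right, sizeof right);
    snprintf(out, size, "%s%s", left, right);
    r.type = V_STRING;
    r.num = 0.0;
    r.str = out;
    return r;
}

/* The only route from a string back to a number is an explicit call. */
static double value_to_number(const struct value *v)
{
    return v->type == V_STRING ? strtod(v->str, NULL) : v->num;
}
```

Under this rule, a = "1.2" plus b = 3.4 yields the string "1.23.4", while value_to_number(a) + b yields 4.6.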

> 3) user-defined string-valued functions.
>     I don't see at the moment how to implement these, although
>     I think it would be possible.  Are they needed?

I think they are.  Users will almost certainly want to be able to
do something like this:

	filename(i)=sprintf("foobar%d.ps", i)

	i=15

	# in a loaded file:
	set output filename(i)
	plot something
	i=i+1
	reread	

> 4) Do we want any string operations besides concatenation?
>     Substrings?  String comparison?  

Yes, yes, and possibly, in that order.

-- 
Hans-Bernhard Broeker (br...@ph...)
Even if all the snow were burnt, ashes would remain.

Re: [ gnuplot-Patches-992149 ] String variables revisited

From: Ethan A M. <merritt@u.washington.edu> - 2004-07-18 17:52:10

On Sunday 18 July 2004 07:03 am, Hans-Bernhard Broeker wrote:
> > 3) user-defined string-valued functions.
>
> Users will almost certainly want to be able to do something like this:
>
> 	filename(i)=sprintf("foobar%d.ps", i)
>
> 	i=15
>
> 	# in a loaded file:
> 	set output filename(i)
> 	plot something
> 	i=i+1
> 	reread

But that example does not require a user-defined function.
That is the behaviour you would get anyway, courtesy of
the automagic string evaluation code already written.

filename = 'sprintf("foobar%d.ps",i)'
set output filename
i = 15
replot
i = 16
replot

What it does require (and your user-defined function would
also require) is that term.c:term_set_output() and a few other 
places in the actual drivers be modified similarly to what I did for
write_multiline(). They would need to check that the string is
not really a constant, but instead holds a sprintf() command. 

'set output <foo>' is messy because <foo> is used for something
other than being printed. Most of the other 'set <foo> "string const"'
commands require no special modification, since the fancy stuff happens inside
the eventual print routine. 

-- 
Ethan A Merritt
Department of Biochemistry & Biomolecular Structure Center
University of Washington, Seattle

Re: [ gnuplot-Patches-992149 ] String variables revisited

From: Dave D. <dde...@es...> - 2004-07-21 10:38:27

Ethan A Merritt <merritt@u.washington.edu> writes:

> On Sunday 18 July 2004 07:03 am, Hans-Bernhard Broeker wrote:
>> > 3) user-defined string-valued functions.
>>
>> Users will almost certainly want to be able to do something like this:
>>
>> 	filename(i)=sprintf("foobar%d.ps", i)
>>
>> 	i=15
>>
> But that example does not require a user-defined function.
> That is the behaviour you would get anyway, courtesy of
> the automagic string evaluation code already written.
>
> filename = 'sprintf("foobar%d.ps",i)'


Sorry, I'm keeping up with this topic...

This may be a stupid question, but does this syntax extend to
arbitrary expressions, or just variable names ?

  filename = 'sprintf("foobar%d.ps", i+1)'

or

  maptofilename(i)=i+1
  filename = 'sprintf("foobar%d.ps", maptofilename(i))'

dd
-- 
Dave Denholm              <dde...@es...>       http://www.esmertec.com

Re: [ gnuplot-Patches-992149 ] String variables revisited

From: Ethan M. <merritt@u.washington.edu> - 2004-07-21 15:30:41

On Wednesday 21 July 2004 03:37 am, Dave Denholm wrote:
> This may be a stupid question, but does this syntax extend to
> arbitrary expressions, or just variable names ?
>
>   filename = 'sprintf("foobar%d.ps", i+1)'
> or
>   maptofilename(i)=i+1
>   filename = 'sprintf("foobar%d.ps", maptofilename(i))'

Both of those would work. 
This is a new capability added to the existing expression evaluation
code.  It still handles everything it already handled, but now it
knows how to handle at least some operations on strings also.

-- 
Ethan A Merritt

Re: [ gnuplot-Patches-992149 ] String variables revisited

From: Ethan A M. <merritt@u.washington.edu> - 2004-07-18 18:42:52

On Sunday 18 July 2004 07:03 am, Hans-Bernhard Broeker wrote:
> >
> > 1) gprintf("format",mantissa,exponent)
> >     Is that the form it should take?
>
> No.  It should be gprintf("format", number).  The crucial difference
> is that C's sprintf() supports multiple % formats and uses up exactly
> one argument per format specifier, whereas gprintf() only ever has one
> argument, but may use more than one format specifier with it.

Let me see if I understand this...

The user would type, for example
	set label gprintf("format",var)
and internally this would be converted into a call to the existing
function of the form
    gprintf( (char *)temp, sizeof(temp),
	(char *)format,
	(double)current_radix,
	(double)var);
followed by copying temp into the appropriate place, in this case
the label structure.

From the user's point of view (and the parser's), gprintf always has
exactly two parameters:  (char *)format and (double)var.

Is that correct?   

-- 
Ethan A Merritt
Department of Biochemistry & Biomolecular Structure Center
University of Washington, Seattle

Re: [ gnuplot-Patches-992149 ] String variables revisited

From: Hans-Bernhard B. <br...@ph...> - 2004-07-18 21:35:53

On Sun, 18 Jul 2004, Ethan A Merritt wrote:

> But that example does not require a user-defined function.
> That is the behaviour you would get anyway, courtesy of
> the automagic string evaluation code already written.
> 
> filename = 'sprintf("foobar%d.ps",i)'
> set output filename

If that indeed works, and recomputes the sprintf every time the 'set
output' command is reissued, then by comparison with existing gnuplot
machinery for user-defined objects, 'filename' is a function, not a
variable. So far, the gnuplot paradigm always was that variables have a
static value (unless they're the dummy in a plot command, i.e. 'x', 'y',
't', ...).

I.e. so far, variables stored values, not expressions to be evaluated at
some later time.  If at all possible, I'd like to keep it that way for
strings, if only to minimize the amount of documentation we and the users
will need to fully explain all this.

> They would need to check that the string is
> not really a constant, but instead holds a sprintf() command. 

I'm opposed to having sprintf() *inside* the quotes.  It causes new
problems like the '' vs. "" saving issue that we don't really need, for no
real gain.

-- 
Hans-Bernhard Broeker (br...@ph...)
Even if all the snow were burnt, ashes would remain.

Re: [ gnuplot-Patches-992149 ] String variables revisited

From: Ethan A M. <merritt@u.washington.edu> - 2004-07-18 23:14:16

> So far, variables stored values, not expressions to be evaluated at
> some later time.  If at all possible, I'd like to keep it that way for
> strings, if only to minimize the amount of documentation we and the users
> will need to fully explain all this.

I disagree, and I think the record of requests for enhancement
supports me on this point.  People really want a way of embedding 
variables into strings, and having the current value of the variable
substituted in at the time the string is printed.  It doesn't matter
whether you call that "re-evaluation of a string" or "storing a function
instead of a string", or "storing an expression to be evaluated later",
that is the desired capability.

> I'm opposed to having sprintf() *inside* the quotes.  It causes new
> problems like the '' vs. "" saving issue that we don't really need, for no
> real gain.

On the contrary, it's a huge gain.  It means that a very powerful
ability is introduced in a uniform way, yet requires minimal or no
change to the existing code or to the existing storage mechanisms
for strings.

Anything you do _outside_ the quotes means that every single place
that tests for a string constant has to be re-written to handle the
possible substitution of other syntactic entities instead.

But let me repeat again, I do not care what the exact quoting style
is.   If it saves confusion, I'm perfectly happy to add a 3rd quote character
besides ' and ", and reserve this new quote character for the case of
re-evaluation at plot time.   That would require changing is_quote(),
which is easy, and a lot of places that explicitly test for ' or ", which is
more annoying but certainly doable.

Assume for the moment we use % for this purpose. Then the
documentation would read:
	"<expression>"	evaluate immediately, with substitution
	'<expression>'		evaluate immediately, no substitution
	%<expression>%	evaluate later, with substitution at that time

-- 
Ethan A Merritt
Department of Biochemistry & Biomolecular Structure Center
University of Washington, Seattle

Re: [ gnuplot-Patches-992149 ] String variables revisited

From: Dave D. <dde...@es...> - 2004-07-21 10:44:49

Ethan A Merritt <merritt@u.washington.edu> writes:

>> So far, variables stored values, not expressions to be evaluated at
>> some later time.  If at all possible, I'd like to keep it that way for
>> strings, if only to minimize the amount of documentation we and the users
>> will need to fully explain all this.
>
> I disagree, and I think the record of requests for enhancement
> supports me on this point.  People really want a way of embedding 
> variables into strings, and having the current value of the variable
> substituted in at the time the string is printed.  It doesn't matter
> whether you call that "re-evaluation of a string" or "storing a function
> instead of a string", or "storing an expression to be evaluated later",
> that is the desired capability.
>

There is a demand for some way of making a plot with labels showing
the current value of a variable. I don't think this necessarily
translates into a need for dynamic substitution at plot time.
(But I think that point has already been made)

>> I'm opposed to having sprintf() *inside* the quotes.  It causes new
>> problems like the '' vs. "" saving issue that we don't really need, for no
>> real gain.
>
> On the contrary, it's a huge gain.  It means that a very powerful
> ability is introduced in a uniform way, yet requires minimal or no
> change to the existing code or to the existing storage mechanisms
> for strings.
>
> Anything you do _outside_ the quotes means that every single place
> that tests for a string constant has to be re-written to handle the
> possible substitution of other syntactic entities instead.
>

In any big program, there comes a time for refactoring. It seems to me
that, at the start of a development phase, the fact that a feature can
be inserted with minimal code impact does not *necessarily* mean it is
the best way to do it. (And even at the end of a coding cycle, it may
still be better to leave it out than to distort the syntax or limit
the future possibilities.) I'm speaking generally here, not about this
change in particular.

On the subject of requested features / large changes...

is there a case for introducing array variables which can be read from
a file ?  Or is this starting to tread on octave's territory ?

Loading up a file as a 2-d array of strings (which can be converted to
numbers), and being able to plot from an array, may unblock a number
of problems.

dd
-- 
Dave Denholm              <dde...@es...>       http://www.esmertec.com

Re: [ gnuplot-Patches-992149 ] String variables revisited

From: Hans-Bernhard B. <br...@ph...> - 2004-07-18 21:46:09

On Sun, 18 Jul 2004, Ethan A Merritt wrote:

> On Sunday 18 July 2004 07:03 am, Hans-Bernhard Broeker wrote:
> > >
> > > 1) gprintf("format",mantissa,exponent)
> > >     Is that the form it should take?
> >
> > No.  It should be gprintf("format", number).  The crucial difference
> > is that C's sprintf() supports multiple % formats and uses up exactly
> > one argument per format specifier, whereas gprintf() only ever has one
> > argument, but may use more than one format specifier with it.
> 
> Let me see if I understand this...
> 
> The user would type, for example
> 	set label gprintf("format",var)
> and internally this would be converted into a call to the existing
> function of the form
>     gprintf( (char *)temp, sizeof(temp),
> 	(char *)format,
> 	(double)current_radix,
> 	(double)var);
> followed by copying temp into the appropriate place, in this case
> the label structure.
> 
> From the user's point of view (and the parser's), gprintf always has
> exactly two parameters:  (char *)format and (double)var.

Yes.  It may be useful / necessary to allow for the logarithm base 
(current_radix), too, but we can worry about that later.

-- 
Hans-Bernhard Broeker (br...@ph...)
Even if all the snow were burnt, ashes would remain.

Re: [ gnuplot-Patches-992149 ] String variables revisited

From: Hans-Bernhard B. <br...@ph...> - 2004-07-19 08:37:39

On Sun, 18 Jul 2004, Ethan A Merritt wrote:

> Anything you do _outside_ the quotes means that every single place
> that tests for a string constant has to be re-written to handle the
> possible substitution of other syntactic entities instead.

And everything you do inside them means you have to handle the
substitution at every single point of the code that actually uses a
string.  That's not necessarily a much smaller set of places.  It could
actually be a good deal larger.

It could almost certainly all be handled by extending m_quote_capture
to go on to evaluate a string-valued expression.

> Assume for the moment we use % for this purpose. Then the
> documentation would read:
> 	"<expression>"	evaluate immediately, with substitution
> 	'<expression>'		evaluate immediately, no substitution
> 	%<expression>%	evaluate later, with substitution at that time

I honestly don't see why we need the middle variant.  What is the
difference between evaluation and substitution that makes you want
control over each of them independently?

-- 
Hans-Bernhard Broeker (br...@ph...)
Even if all the snow were burnt, ashes would remain.

Re: [ gnuplot-Patches-992149 ] String variables revisited

From: Ethan M. <merritt@u.washington.edu> - 2004-07-19 15:26:52

On Monday 19 July 2004 01:35 am, you wrote:
> On Sun, 18 Jul 2004, Ethan A Merritt wrote:
> > Anything you do _outside_ the quotes means that every single place
> > that tests for a string constant has to be re-written to handle the
> > possible substitution of other syntactic entities instead.
>
> And everything you do inside them means you have to handle the
> substitution at every single point of the code that actually uses a
> string.  That's not necessarily a much smaller set of places.  It could
> actually be a good deal larger.

Have you actually tried the patchset?
Adding the test+evaluation in one place only, the text-printing
routine write_multiline(), already catches a large majority of 
useful cases.  Adding it to the file-open code (not sure exactly
how many places that is) would catch most of the rest.
The minority of text-printing that does not go through write_multiline(),
with tic labels being the prime example, should be converted to do so.
That would at the same time address several queries about why
some of the current text options don't work for tic labels.

I may be missing some major class of possible uses for strings,
but it seems to me that about covers it right there.  What else did
you have in mind?

> It could almost certainly all be handled by extending m_quote_capture
> to go on to evaluate a string-valued expression.

That is the second stage, yes, but the first stage is teaching all the
parsing routines to accept something other than a quoted string in
all the places they currently expect it.

> > Assume for the moment we use % for this purpose. Then the
> > documentation would read:
> > 	"<expression>"	evaluate immediately, with substitution
> > 	'<expression>'		evaluate immediately, no substitution
> > 	%<expression>%	evaluate later, with substitution at that time
>
> I honestly don't see why we need the middle variant.  What is the
> difference between evaluation and substitution that makes you want
> control over each of them independently?

???
That's what we have *now*.   You want to remove it?

It's the third variant I'm trying to add - evaluation deferred until
plot time.

[aside:  About the only thing I use the existing single-quote mode for is to
allow inclusion of double-quotes in the string without having to escape
them with backslashes.  But I don't think that was the original intent.
What else is it useful for?]

-- 
Ethan A Merritt

Re: [ gnuplot-Patches-992149 ] String variables revisited

From: Hans-Bernhard B. <br...@ph...> - 2004-07-19 18:59:03

On Mon, 19 Jul 2004, Ethan Merritt wrote:

> On Monday 19 July 2004 01:35 am, you wrote:
> > On Sun, 18 Jul 2004, Ethan A Merritt wrote:
> > > Anything you do _outside_ the quotes means that every single place
> > > that tests for a string constant has to be re-written to handle the
> > > possible substitution of other syntactic entities instead.
> >
> > And everything you do inside them means you have to handle the
> > substitution at every single point of the code that actually uses a
> > string.  That's not necessarily a much smaller set of places.  It could
> > actually be a good deal larger.
> 
> Have you actually tried the patchset?

Not really.  I've relied on your descriptions of it for now.

> Adding the test+evaluation in one place only, the text-printing
> routine write_multiline(), already catches a large majority of 
> useful cases.  Adding it to the file-open code (not sure exactly
> how many places that is) would catch most of the rest.

Catching most isn't the issue.  Catching all of them is.

> I may be missing some major class of possible uses for strings,
> but it seems to me that about covers it right there.  What else did
> you have in mind?

Every usage of strings anywhere in gnuplot.  Datafile names, output file
names, 'print' strings, plot elements (axis labels, tick labels, labels,
title, key title, key entries), fit 'update' and 'via' files,
save/load/call file names, 'cd' names, loadpath names, terminal-wide or
label-wise font names. In short, all 98 calls of isstring(), and all 19
calls of m_quote_capture and all 62 of quote_str().

> > It could almost certainly all be handled by extending m_quote_capture
> > to go on to evaluate a string-valued expression.
> 
> That is the second stage, yes, but the first stage is teaching all the
> parsing routines to accept something other than a quoted string in
> all the places they currently expect it.

Not really.  The trick I have in mind is to teach m_quote_capture()  and
friends themselves to accept a quoted string followed by whatever else
there is to the string-valued expression it started off.  Just as we
currently parse the function to be plotted by having the parser eat up
as much of the command line as fits the syntax of an expression, a 
string would continue until the next piece of syntax doesn't match the 
syntax of a string expression any longer.  
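
A toy version of that idea, assuming the only string expression syntax is a double-quoted literal optionally followed by '+' and further literals. The function names are hypothetical and bear no relation to the real m_quote_capture(); the sketch only shows the "consume as much as matches, then stop" behaviour.

```c
#include <string.h>

static const char *skip_spaces(const char *p)
{
    while (*p == ' ' || *p == '\t')
        p++;
    return p;
}

/* Parse one double-quoted literal at *pp and append its contents to out.
 * Returns 1 and advances *pp on success; returns 0 and leaves both
 * *pp and out unchanged on failure. */
static int parse_literal(const char **pp, char *out, size_t size)
{
    const char *p = *pp;
    size_t start = strlen(out);
    size_t len = start;

    if (*p != '"')
        return 0;
    for (p++; *p != '\0' && *p != '"'; p++) {
        if (len + 1 < size)
            out[len++] = *p;
    }
    if (*p != '"') {            /* unterminated literal: undo the append */
        out[start] = '\0';
        return 0;
    }
    out[len] = '\0';
    *pp = p + 1;
    return 1;
}

/* Consume literal ('+' literal)* starting at p. The concatenated text
 * goes into out (size must be at least 1); the return value points at
 * the first input that is not part of the string expression, or is
 * NULL if no string starts at p. */
static const char *parse_string_expr(const char *p, char *out, size_t size)
{
    const char *q;

    out[0] = '\0';
    p = skip_spaces(p);
    if (!parse_literal(&p, out, size))
        return NULL;
    for (;;) {
        q = skip_spaces(p);
        if (*q != '+')
            break;
        q = skip_spaces(q + 1);
        if (!parse_literal(&q, out, size))
            break;              /* '+' is not followed by a string: stop before it */
        p = q;
    }
    return p;
}
```

The caller resumes ordinary command parsing at the returned pointer, so a command such as a label followed by "at 2,3" still sees its remaining options.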

> > > Assume for the moment we use % for this purpose. Then the
> > > documentation would read:
> > > 	"<expression>"	evaluate immediately, with substitution
> > > 	'<expression>'		evaluate immediately, no substitution
> > > 	%<expression>%	evaluate later, with substitution at that time
> >
> > I honestly don't see why we need the middle variant.  What is the
> > difference between evaluation and substitution that makes you want
> > control over each of them independently?
> 
> ???
> That's what we have *now*.   You want to remove it?

No.  But you had me confused over what your terminology means there.

Now, looking at all this a bit closer, it seems like you're understanding
"substitution" to mean the stuff we already have for the first two cases
(\n, \0123, `backquotes`, ...), and "evaluation" for the new string stuff.  

Well, so far in gnuplot, late evaluation has been limited to the
definition of user-defined functions --- all other expressions are
evaluated to fixed results before the command they appear in is finished
executing.  That method has served us well so far, and before we stray
from that path, we should have a solid reason for doing so.

I honestly don't see the need for late evaluation from the user interface
side of things. I don't see a compelling reason why

	set title {some expression involving variable i}
	i = 5
	plot something
	i = 6
	plot something else

has to produce two different title strings, without even offering the
option of getting the same title on both of them.  Even closer to the
point, why should

	x = 6
	set label 'sprintf("%g", x)' at x, f(x)
	x = 1

show the label at position (6, f(6)), but print the string as '1'?
From where I sit, that makes no sense whatsoever.

So here's a new summary of the matter: late evaluation should be a 
subject kept separate from that of string variables.  If we add
late evaluation, it should be added for both strings and numbers.

> [aside:  About the only thing I use the existing single-quote mode for
> is to allow inclusion of double-quotes in the string without having to
> escape them with backslashes.  But I don't think that was the original
> intent. What else is it useful for?]

E.g. for input of strings in terminals like LaTeX or PostScript enhanced,
where you really don't want to have to type every \ character twice, and
for DOS/Windows filenames, where \n processing would produce rather
surprising results.  Although the latter can be circumvented by using / to
separate directories, which works just as well, but most Windowsers don't
know that.

-- 
Hans-Bernhard Broeker (br...@ph...)
Even if all the snow were burnt, ashes would remain.

Re: [ gnuplot-Patches-992149 ] String variables revisited

From: Ethan M. <merritt@u.washington.edu> - 2004-07-19 21:18:32

> > I may be missing some major class of possible uses for strings,
> > but it seems to me that about covers it right there.  What else did
> > you have in mind?
> 
> Every usage of strings anywhere in gnuplot.  Datafile names, output file
> names, 'print' strings, plot elements (axis labels, tick labels, labels,
> title, key title, key entries), 

Please try it out.  It catches those already except for the tic labels and
key entries, which ought to go through write_multiline() but don't currently.
I'll add that to the patch, or just add it to cvs separately.

> In short, all 98 calls of isstring(), and all 19
> calls of m_quote_capture and all 62 of quote_str()

No.  You are misunderstanding, or I am totally failing to describe
things properly.  The great thing about this approach, which you
suggested yourself, is that none of those need to be changed.

> fit 'update' and 'via' files, save/load/call file names, 'cd' names,
> loadpath names, terminal-wide or label-wise font names.

I agree that filenames are not yet handled in the patchsets.
I was not trying for 100% complete coverage in the first go-round.

I will describe the patchsets from a new angle; let's see if 
I can make it more understandable this time:

Patchset 1:
-----------------

This patchset does 3 things.

(1) It adds STRING as a legal "value"  (gp_types.h data structure)
and overloads concatenation onto the + operation applied to STRINGS.

(2) It adds a single string-valued function, sprintf(...).
This works automagically anywhere that the gnuplot parser accepts
a function name. So simple assignment statements, like
	LABEL = sprintf("whatever",var1,var2,var3)
just work, with no new code needed.

(3) It adds a check for this string-valued function in exactly two
places where a function name was not previously accepted by
the parser. These are
	set label <something>
and
	set [xyz...]label <something>

The latter routine handles titles as well, so the coverage is
more complete than you might think at first blush.
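
[Illustration] A minimal sketch of item (1), not the patch itself: a tagged
value type gains a STRING member, and the handler for + concatenates when
both operands are strings. The names sketch_value, sketch_plus and the enum
members are hypothetical; the real definitions in gp_types.h and internal.c
differ.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for gnuplot's value type; only two kinds shown. */
enum sketch_type { SK_INTGR, SK_STRING };

struct sketch_value {
    enum sketch_type type;
    union {
        int int_val;
        char *string_val;   /* NUL-terminated text */
    } v;
};

/* '+' overloaded: integer addition, or concatenation for two strings.
 * Returns 0 on success, -1 on a type mismatch or allocation failure.
 * A string result is malloc'd and must be freed by the caller. */
int sketch_plus(const struct sketch_value *a, const struct sketch_value *b,
                struct sketch_value *result)
{
    assert(a && b && result);
    if (a->type == SK_INTGR && b->type == SK_INTGR) {
        result->type = SK_INTGR;
        result->v.int_val = a->v.int_val + b->v.int_val;
        return 0;
    }
    if (a->type == SK_STRING && b->type == SK_STRING) {
        size_t la = strlen(a->v.string_val);
        size_t lb = strlen(b->v.string_val);
        char *s = malloc(la + lb + 1);
        if (s == NULL)
            return -1;
        memcpy(s, a->v.string_val, la);
        memcpy(s + la, b->v.string_val, lb + 1);  /* copies the NUL too */
        result->type = SK_STRING;
        result->v.string_val = s;
        return 0;
    }
    return -1;  /* mixed types are rejected in this sketch */
}
```

Rejecting mixed operands is a choice made for the sketch only; the thread
does not say how the patch treats STRING + number.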


Patchset 2
---------------

Adds a string-function evaluator. I don't know what the right term
is for this, but basically it just triggers the existing function evaluation
code. It is directly modelled on do_line(), except that it requires
the top level function being evaluated to return a string.

At a single call site in write_multiline(), it checks for a magic
leading sequence of characters in every string that is printed.
If the sequence is recognized, it filters the string through the
string-function evaluator.

This is a beautifully simple change.  At one shot it implements
variable substitution into all strings printed via write_multiline().
It also has the effect that the *current* value of variables is
substituted in, which is a major bonus in its own right.   This
is what nearly everyone expects gnuplot to do now, but it
doesn't.
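
[Illustration] A rough sketch of the single check described above, under
stated assumptions: the trigger is taken to be a leading "sprintf(", and the
string-function evaluator is modelled as a callback. sketch_expand_for_output,
sketch_eval_fn and sketch_demo_eval are hypothetical names, and the toy
evaluator does not parse anything.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical trigger; the thread has not settled on one. */
#define SKETCH_MAGIC "sprintf("

/* Evaluator callback: takes the expression text and returns a malloc'd
 * string, or NULL on failure. Stands in for the string-function evaluator. */
typedef char *(*sketch_eval_fn)(const char *expr);

/* Return the text that should actually be printed; the caller frees it. */
char *sketch_expand_for_output(const char *text, sketch_eval_fn eval)
{
    char *out;
    assert(text && eval);
    if (strncmp(text, SKETCH_MAGIC, strlen(SKETCH_MAGIC)) == 0) {
        out = eval(text);
        if (out != NULL)
            return out;
        /* Evaluation failed: fall through and print the text unchanged. */
    }
    out = malloc(strlen(text) + 1);
    if (out != NULL)
        strcpy(out, text);
    return out;
}

/* Toy evaluator for demonstration only: ignores the expression. */
char *sketch_demo_eval(const char *expr)
{
    static const char fixed[] = "<evaluated>";
    char *s = malloc(sizeof fixed);
    (void) expr;
    if (s != NULL)
        memcpy(s, fixed, sizeof fixed);
    return s;
}
```

Because the check runs each time the string is printed, whatever the
evaluator sees at that moment (the current variable values) ends up in the
output.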


Possible next steps (hypothetical patchset 3)
-------------------------------------------------------

Indeed there are other places that the current syntax does not
allow a function name including, as you point out, the specification
of file names.  This could be addressed in several ways.

(1) We can expand the legal syntax at these places one by
one as they are needed or requested.  Here is the code that
went in for labels; other places would be more or less the same:

+#ifdef GP_STRING_VARS
+    /* Allow creation of label text using sprintf() */
+    if (equals(c_token,"sprintf")) {
+       struct value a = {STRING,{NULL}};
+       (void) const_express(&a);
+       this_label->text = a.v.string_val;
+    } else
+#endif
+
+    /* get text from string */
     if (!END_OF_COMMAND && isstring(c_token)) {

(2) We could replace fopen() with a newly-defined gp_fopen() and
add the string evaluation code only in the new routine.

(3) We could instead use the approach of my older "userstrings" patch,
which essentially implements command-line macro definitions. This would
not require changing the  individual parsing fragments, but would open
up a new front in the argument^h^h^h^h err.. discussion.
Since the macro-substitution is done on the input line as a text string,
it is independent of the parsing routines that will later come into play.


> I honestly don't see the need for late evaluation from the user interface
> side of things. I don't see a compelling reason why
> 
> 	set title {some expression involving variable i}
> 	i = 5
> 	plot something
> 	i = 6
> 	plot something else
> 
> has to produce two different title strings, without even offering the
> option of getting the same title on both of them.  

Straw man argument.  There are plenty of options for having the
title come out the same.  What we are talking about is how to get
it to come out differently.

If you want a more obvious, frequently requested, example:

set title 'sprintf("Fit cycle %d finished at `date`, A = %7.4f B = %7.4f", \
			ncyc,A,B)'
ncyc = ncyc+1; fit f(x) 'data' via A,B; plot f(x)
ncyc = ncyc+1; fit f(x) 'data' via A,B; plot f(x)
<wash, rinse, repeat as often as you like>


> 	x = 6
> 	set label 'sprintf("%g", x)' at x, f(x)
> 	x = 1
> 
> show the label at position (6, f(6)), but print the string as '1.0'?
> From where I sit, that makes no sense whatsoever.

Well, so don't do that.  There is no requirement that you have to
select plot-time evaluation of strings if it doesn't make sense.

> So here's a new summary of the matter: late evaluation should be a 
> subject kept separate from that of string variables.

That's why I split things into two patchsets. 
But I hardly think they are independent. 

> If we add late evaluation, it should be added for both strings and numbers.

Please. Just try the patchsets.   
It does work for both strings and numbers, at least within the realm
it is trying to address.  I'm sure you can come up with things it 
doesn't do at all, but that by itself is not a good argument against
using it for the things it *does* do.




And on the tangential matter of quote styles:

> > [aside:  About the only thing I use the existing single-quote mode for
> > is to allow inclusion of double-quotes in the string without having to
> > escape them with backslashes.  But I don't think that was the original
> > intent. What else is it useful for?]
> 
> E.g. for input of strings in terminals like LaTeX or PostScript enhanced,
> where you really don't want to have to type every \ character twice, and
> for DOS/Windows filenames, where \n processing would produce rather
> surprising results.

No comment on the DOS/Windows issue, but from my unix-centric
perspective this is not at all what the single-quote convention is 
expected to do.   Following the conventions of both sh- and csh-
derived shells,  enclosing a character string inside single quotes
should mean that absolutely nothing at all is done to it.  No fiddling
with back-slashes, no substitution of variables, no execution of
shell escapes.  IMHO gnuplot should do the same thing - save the
single-quoted string as is with no fiddling.   Right now gnuplot
takes the more complicated and error-prone route of trying to
figure out what would have been required to type the string as a
double-quoted string instead, and saving that.  For what gain?
It should just save the string as entered, with a flag that it was 
given in single quotes.
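
[Illustration] A small sketch of "save the string as entered, with a flag",
under assumptions: only the \n and \\ escapes are handled for double quotes,
and the names sketch_str and sketch_store are hypothetical. This is not how
gnuplot currently stores strings.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

enum sketch_quote { SK_SINGLE, SK_DOUBLE };

struct sketch_str {
    enum sketch_quote quote;  /* how the user typed it */
    char *text;               /* malloc'd contents */
};

/* Store a quoted token. Single-quoted text is copied verbatim;
 * double-quoted text has \n and \\ translated (other escapes are
 * left alone in this sketch). Returns 0 on success, -1 on error. */
int sketch_store(struct sketch_str *s, const char *token)
{
    size_t n, i, j = 0;
    char *text;
    assert(s && token);
    n = strlen(token);
    if (n < 2 || token[0] != token[n - 1]
        || (token[0] != '\'' && token[0] != '"'))
        return -1;                      /* not a quoted token */
    text = malloc(n - 1);               /* at most n-2 chars plus NUL */
    if (text == NULL)
        return -1;
    for (i = 1; i + 1 < n; i++) {
        char c = token[i];
        if (token[0] == '"' && c == '\\' && i + 2 < n) {
            char next = token[i + 1];
            if (next == 'n') { c = '\n'; i++; }
            else if (next == '\\') { c = '\\'; i++; }
        }
        text[j++] = c;
    }
    text[j] = '\0';
    s->quote = (token[0] == '\'') ? SK_SINGLE : SK_DOUBLE;
    s->text = text;
    return 0;
}
```

With the flag kept alongside the text, a later save could write the string
back in the same quote style instead of converting it.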

-- 
Ethan A Merritt       merritt@u.washington.edu
Biomolecular Structure Center
Mailstop 357742
University of Washington, Seattle, WA 98195

Re: [ gnuplot-Patches-992149 ] String variables revisited

From: Ethan A M. <merritt@u.washington.edu> - 2004-07-20 03:27:41

On Monday 19 July 2004 02:18 pm, Ethan Merritt wrote:
>
> Please try it out.  It catches those already except for the tic labels and
> key entries, which ought to go through write_multiline() but don't
> currently. I'll add that to the patch, or just add it to cvs separately.

Heh. I was mis-remembering.  Most of the tic labels do already use
write_multiline(). It's only the colorbox tics that bypass it for some
reason. So that and the key entries are the missing pieces I can find.

-- 
Ethan A Merritt
Department of Biochemistry & Biomolecular Structure Center
University of Washington, Seattle

Re: [ gnuplot-Patches-992149 ] String variables revisited

From: Hans-Bernhard B. <br...@ph...> - 2004-07-20 09:01:44

On Mon, 19 Jul 2004, Ethan Merritt wrote:

> > In short, all 98 calls of isstring(), and all 19
> > calls of m_quote_capture and all 62 of quote_str()
> 
> No.  You are misunderstanding, or I am totally failing to describe
> things properly.  

A combination of both, it appears.  I had lost overview somewhere along
the line of this, and you didn't see it happen.

> Patchset 1:
> -----------------
> 
> This patchset does 3 things.
[...]

> (3) It adds a check for this string-valued function in exactly two
> places where a function name was not previously accepted by
> the parser. 

I think what I'm getting at is that this check had better go into isstring()
instead --- the function that every command parser fragment is supposed to
be using to check whether the upcoming command line token is a string or
not.

> Patchset 2
> ---------------
> 
> Adds a string-function evaluator. I don't know what the right term is
> for this, but basically it just triggers the existing function
> evaluation code. It is directly modelled on do_line(), except that it
> requires the top level function being evaluated to return a string.
> 
> At a single call site in write_multiline(), it checks for a magic
> leading sequence of characters in every string that is printed. If the
> sequence is recognized, it filters the string through the
> string-function evaluator.

Ah, thanks, now this is a whole lot clearer than before.  So this patch is
what I've been referring to as late evaluation, i.e. the technique of
storing an expression, rather than its result, to be evaluated at the
latest possible time.  I'm still not convinced that 'sprintf( is a
suitable choice of trigger string, but now at least I understand what
you're doing.

I.e. we're actually in perfect agreement as to what the necessary
features are --- we just used so completely different terms that we
confused each other completely.  Thanks for staying with me long enough
that we could finally sort this out.

> Possible next steps (hypothetical patchset 3)
> -------------------------------------------------------
> 
> Indeed there are other places that the current syntax does not
> allow a function name including, as you point out, the specification
> of file names.  This could be addressed in several ways.
> 
> (1) We can expand the legal syntax at these places one by
> one as they are needed or requested.

That might be an endless journey.  Better do it in a more brutal fashion:
either by documenting and recommending the ''+ trick to signal a string
expression coming up, or by changing isstring() and friends, as I
suggested earlier.

> (2) We could replace fopen() with a newly-defined gp_fopen() and
> add the string evaluation code only in the new routine.

> No comment on the DOS/Windows issue, but from my unix-centric
> perspective this is not at all what the single-quote convention is 
> expected to do.   Following the conventions of both sh- and csh-
> derived shells,  enclosing a character string inside single quotes
> should mean that absolutely nothing at all is done to it. 

And so far, before your second patchset, nothing is --- that patchset
actually breaks that rule.  Single-quoted strings are used as-is,
unmodified.  Except that when they're written out to save files, which may
not even stay on the same platform as the gnuplot executable they were
built on, it's unreliable to store them as single-quoted ones.  That's
where conv_text() comes in and generates an ASCII-only, portable
representation for us. The actual problem here was simply a bug in
conv_text().

This save file issue is one on which we can get no guidance from Unix
shells --- they don't save states of internal settings to files meant to
be read back in.

-- 
Hans-Bernhard Broeker (br...@ph...)
Even if all the snow were burnt, ashes would remain.

Re: [ gnuplot-Patches-992149 ] String variables revisited

From: Ethan M. <merritt@u.washington.edu> - 2004-07-20 17:47:01

On Tuesday 20 July 2004 01:53 am, Hans-Bernhard Broeker wrote:
> > (3) It adds a check for this string-valued function in exactly two
> > places where a function name was not previously accepted by
> > the parser. 
> 
> I think what I'm getting at is that this check had better go into isstring()
> instead --- the function that every command parser fragment is supposed to
> be using to check whether the upcoming command line token is a string or
> not.

That turns out not to help.
Consider a typical call site, one of the 98 you counted:

set.c (set_fontpath) line 1801:

        if (isstring(c_token)) {
            int len;
            char *ss = gp_alloc(token_len(c_token), "tmp storage");
            len = (collect? strlen(collect) : 0);
            quote_str(ss,c_token,token_len(c_token));
            collect = gp_realloc(collect, len+1+strlen(ss)+1, "tmp fontpath");
            if (len != 0) {
                strcpy(collect+len+1,ss);
                *(collect+len) = PATHSEP;
            }
            else
                strcpy(collect,ss);
            free(ss);
            ++c_token;
        }

I am not clever enough to re-write isstring() and quote_str() such
that the code at this call site works when the length of the string
changes mid-stream.  So trying to be clever in isstring() and
quote_str() paradoxically makes *more* work, since every call site
would have to be inspected and re-written for compatibility.

If the program were being re-written from scratch, then yes.
But in the interest of sanity and not introducing 98 possible
new bug sites, I would rather go for a solution that leaves the
input code intact and focuses instead on identifying a small
number of places where the string is actually used for
something.

case 1: the string is printed. I dealt with this by adding code to
write_multiline().

case 2: the string is used as a file name.   I haven't handled this
yet, but I propose to create a new routine gp_fopen() which 
contains the same check for magic characters at the start of a
string that I inserted into write_multiline().  Yes, this requires
changing call sites from  fd = fopen("...") to fd = gp_fopen("...")
but this would be a piece of cake compared to rethinking and
rewriting 98 sites like the one above.

case 3: Are there any more end-uses for a string?
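
[Illustration] A sketch of what case 2 might look like, assuming the same
hypothetical "sprintf(" trigger and an evaluator passed as a callback.
sketch_gp_fopen, sketch_resolve_name and sketch_demo_eval are made-up names;
no such routine exists in the patchsets yet.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define SKETCH_MAGIC "sprintf("   /* hypothetical trigger */

typedef char *(*sketch_eval_fn)(const char *expr);

/* Return a malloc'd file name: the evaluated result for a marked string,
 * otherwise a plain copy. Returns NULL on failure. */
char *sketch_resolve_name(const char *name, sketch_eval_fn eval)
{
    char *copy;
    assert(name);
    if (eval != NULL
        && strncmp(name, SKETCH_MAGIC, strlen(SKETCH_MAGIC)) == 0)
        return eval(name);
    copy = malloc(strlen(name) + 1);
    if (copy != NULL)
        strcpy(copy, name);
    return copy;
}

/* Drop-in replacement for fopen() at call sites that open user files. */
FILE *sketch_gp_fopen(const char *name, const char *mode, sketch_eval_fn eval)
{
    FILE *fp = NULL;
    char *resolved;
    assert(mode);
    resolved = sketch_resolve_name(name, eval);
    if (resolved != NULL) {
        fp = fopen(resolved, mode);
        free(resolved);
    }
    return fp;
}

/* Toy evaluator for demonstration only: always yields "fit.log". */
char *sketch_demo_eval(const char *expr)
{
    static const char fixed[] = "fit.log";
    char *s = malloc(sizeof fixed);
    (void) expr;
    if (s != NULL)
        memcpy(s, fixed, sizeof fixed);
    return s;
}
```

A call site would then change from fopen(name, "r") to something like
sketch_gp_fopen(name, "r", evaluator); the parsing code that produced name
stays as it is.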

> Ah, thanks, now this is a whole lot clearer than before.  So this patch is
> what I've been referring to as late evaluation, i.e. the technique of
> storing an expression, rather than its result, to be evaluated at the
> latest possible time.  I'm still not convinced that 'sprintf( is a
> suitable choice of trigger string, but now at least I understand what
> you're doing.

I am not trying to convince anyone that "sprintf" is the best
choice.  The only virtue it has is that we can refer people to
"man sprintf" for help with the format specifiers. I would rather
have either an alternate quote character (something different
from ' or ") or a single magic character following a normal
quote character.

	set title %<expression>%
or
	set title "!<expression>"

Unfortunately the '!' character is a unary operator already,
as are '+' and '^' and many of the other obvious candidates.
So sticking it in front of a general expression is at best a bit
confusing, and at worst ambiguous to parse. I hesitate to
suggest '$', but at least it is not a unary operator.

The other alternative is a pseudo-function such as
	set title %(<expression>)
but that leads us right back to the problem of those 98
call sites that are expecting a quote character.

Please note that this doesn't get away from having to type
"sprintf" if that is in fact the function you want to evaluate.
So even with a single-%-character quoting convention, the
full form would be
	set title %sprintf("<format>",var1,var2)%

but if we allow other string-valued functions, then any one
of them might replace "sprintf" in this example.
Hypothetically it would look like
	set title %myfunc(par1,par2)%

-- 
Ethan A Merritt       merritt@u.washington.edu
Biomolecular Structure Center
Mailstop 357742
University of Washington, Seattle, WA 98195

Re: [ gnuplot-Patches-992149 ] String variables revisited

From: Volker D. <v.d...@we...> - 2004-07-21 07:34:04

Ethan Merritt wrote:
> Consider a typical call site, one of the 98 you counted:
> 
> set.c (set_fontpath) line 1801:
> 
> if (isstring(c_token)) {
> int len;
> char *ss = gp_alloc(token_len(c_token), "tmp storage");
> len = (collect? strlen(collect) : 0);
> quote_str(ss,c_token,token_len(c_token));
> collect = gp_realloc(collect, len+1+strlen(ss)+1, "tmp fontpath");
> 
> I am not clever enough to re-write isstring() and quote_str() such
> that the code at this call site works when the length of the string
> changes mid-stream. So trying to be clever in isstring() and
> quote_str() paradoxically makes *more* work, since every call site
> would have to be inspected and re-written for compatibility.
> 
Yes, but I think this is easy:  I took a look at my old `value
substitution` patch (the python like syntax of "%(foo).2f"):
There are two functions: quote_str and m_quote_capture.
Using quote_str you will have to do memory allocation yourself,
whereas m_quote_capture will do allocation for you if I remember
correctly.  My old patch replaced (hopefully) all the invocations
of quote_str with m_quote_capture and did substitution there. 
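
[Illustration] A sketch of the callee-allocates idea, assuming a capture
routine that strips the quotes, runs an optional substitution pass, and
hands back a fresh buffer. sketch_capture and sketch_double are hypothetical
and are not the real m_quote_capture; the point is only that the caller
never needs to know the final length.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Substitution pass: returns a malloc'd string, or NULL on failure. */
typedef char *(*sketch_subst_fn)(const char *raw);

/* Strip the enclosing quotes from `quoted`, optionally substitute, and
 * store a freshly allocated result in *dest (freeing any old value).
 * Returns 0 on success, -1 on failure; *dest is untouched on failure. */
int sketch_capture(char **dest, const char *quoted, sketch_subst_fn subst)
{
    size_t n;
    char *raw;
    assert(dest && quoted);
    n = strlen(quoted);
    if (n < 2 || quoted[0] != quoted[n - 1]
        || (quoted[0] != '"' && quoted[0] != '\''))
        return -1;                       /* not a quoted token */
    raw = malloc(n - 1);                 /* n-2 chars plus NUL */
    if (raw == NULL)
        return -1;
    memcpy(raw, quoted + 1, n - 2);
    raw[n - 2] = '\0';
    if (subst != NULL) {
        char *expanded = subst(raw);     /* may be longer or shorter */
        free(raw);
        if (expanded == NULL)
            return -1;
        raw = expanded;
    }
    free(*dest);
    *dest = raw;
    return 0;
}

/* Toy substitution for demonstration only: repeats the text twice. */
char *sketch_double(const char *raw)
{
    size_t n = strlen(raw);
    char *s = malloc(2 * n + 1);
    if (s == NULL)
        return NULL;
    memcpy(s, raw, n);
    memcpy(s + n, raw, n + 1);           /* second copy includes the NUL */
    return s;
}
```

Contrast this with the set_fontpath fragment quoted above, where the caller
sizes the buffer from token_len() before the string is copied.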

> If the program were being re-written from scratch, then yes.
> But in the interest of sanity and not introducing 98 possible
> new bug sites, I would rather go for a solution that leaves the
> input code intact ...
>
I think 98 sites which do memory allocation themselves are more
dangerous than replacing 98 call sites with one tested and
correct call to m_quote_capture.

Regarding the discussion about late evaluation:  I would like
to _object_ to late evaluation.  Late evaluation will introduce
many more problems in user space than it will solve.  If I
construct a string (regardless if done by the '' + trick or sprintf
or anything else) I expect it to be the way I constructed it;
at least that's the behavior of all programming languages I am
familiar with.  I do not see any problem which could be solved
more easily with late evaluation, at least not from the user side:  It is
very easy to reconstruct the string if a variable change should
be reflected in the string.

IMHO the substitution should be done right during parsing,
no late evaluation and m_quote_capture is a good place for
it.

Volker
 
--
Volker Dobler

Re: [ gnuplot-Patches-992149 ] String variables revisited

From: <mi...@ph...> - 2004-07-21 09:29:31

> Regarding the discussion about late evaluation:  I would like
> to _object_ to late evaluation.  Late evaluation will introduce
> many more problems in user space than it will solve.

The late evaluation can be just a flag to "set label". That's where we
expect this feature to happen most frequently.
set label 1 sprintf("fitted values a=%g b=%g", a, b) at 1,1 lateeval

Or, there could be "sprintfLate", but to be authorized only for strings
for "set label", "set title", "(s)plot"... i.e., where the string is used
immediately.
---
PM

Re: [ gnuplot-Patches-992149 ] String variables revisited

From: Ethan M. <merritt@u.washington.edu> - 2004-07-21 15:35:37

On Wednesday 21 July 2004 02:29 am, mi...@ph... wrote:
> > Regarding the discussion about late evaluation:  I would like
> > to _object_ to late evaluation.  Late evaluation will introduce
> > many more problems in user space than it will solve.

Like what, for example?

So far I don't know of any bad side to this capability.
Sure, most code won't use it - but that is the case for many
features.  And the beauty of this way of doing things is that
it is actually simpler to handle the general case than it is
to add a lot of special cases.

> The late evaluation can be just a flag to "set label". That's where we
> expect this feature to happen most frequently.
> set label 1 sprintf("fitted values a=%g b=%g", a, b) at 1,1 lateeval

Yes, that would be possible.  But why only labels?
Why not plot titles, key titles, fit logfile names, and all the
rest?

-- 
Ethan A Merritt

Re: [ gnuplot-Patches-992149 ] String variables revisited

From: Ethan M. <merritt@u.washington.edu> - 2004-07-21 20:58:37

[Another long post. Please read it in the hope that all will
become clear.  Or at least become clearer than it is at
present.]

Let me start with a plea to drop the term "late evaluation",
which I think is side-tracking the current discussion.
Whatever it may mean to different people, I don't think it is
a useful way of thinking about my patchsets.

The evaluation of expressions containing string functions is no
different from the evaluation of all other gnuplot functions.
The evaluation is never "late"; it happens when it happens.

Consider the existing case with no strings:

# user function is defined
   f(x) = sin(x)    
# user function is evaluated and result goes in var1
   var1 = f(3.0)
# user function is stored with plot for use in plotting,
# evaluation happens at plot time
   plot 'data' using 1:(f($2))

In the first two cases the function is evaluated at the
time the command line is parsed.
In the third case the function is evaluated once per data
point, but not until the file is read in during plotting.
This is obvious, I hope.

So....
What I did was expand the variety of functions to include
string-valued functions.  But otherwise the evaluation works
the same way

# (1) User function is defined
   f(x) = sprintf("%7.5f",x)
# (2) Pre-defined function is evaluated and result stored
   var1 = sprintf("%2.1f",x)
# (3) Same thing using user function instead
   var2 = f(x)
# (4) User function is stored with plot for use in plotting
   plot 'data' using 1:2:(f($2))

(1) Needs a bit more code to make it work, but is straightforward
(2) This is, I thought, what everyone was asking for but now we
seem to be arguing about
(3) Combination of (1) and (2); evaluation is done in two steps but
still at the time the command line is parsed.
(4) Evaluation of f(x), which is really sprintf("fmt",x) in this case,
does not happen until the data is read in.   In fact the command
I show here as an example is currently broken because of issues
with assumed column contents (see other thread).  But I was 
planning to fix this for the next version of the patchsets.

The point is that in each case, string or no string, the evaluation 
happens at the time it is needed. For 'set <foo>' commands 
this is usually at the time the command line is parsed, but for
'plot/splot/fit' commands it happens elsewhere.  
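
[Illustration] The same distinction as a sketch in C, under assumptions: a
string-valued function is modelled as a function pointer, an assignment
evaluates it once and keeps the result, and a plot keeps the function and
calls it per data point. All names here (sketch_f, sketch_assign,
sketch_plot, sketch_plot_point) are hypothetical.

```c
#include <assert.h>
#include <stdio.h>

/* A string-valued function of one number, writing into a caller buffer. */
typedef void (*sketch_fmt_fn)(char *buf, size_t size, double x);

/* Plays the role of f(x) = sprintf("%7.5f", x). */
void sketch_f(char *buf, size_t size, double x)
{
    assert(buf && size > 0);
    snprintf(buf, size, "%7.5f", x);
}

/* Parse time (var2 = f(x)): evaluate now and keep only the result. */
void sketch_assign(char *var, size_t size, sketch_fmt_fn f, double x)
{
    assert(f);
    f(var, size, x);
}

/* Plot time (plot ... using 1:2:(f($2))): keep the function itself. */
struct sketch_plot {
    sketch_fmt_fn label_fn;
};

/* Called once per data point while the file is being read. */
void sketch_plot_point(const struct sketch_plot *p, double y,
                       char *buf, size_t size)
{
    assert(p && p->label_fn);
    p->label_fn(buf, size, y);
}
```

In both paths the function is evaluated when its value is needed; the only
difference is whether the result or the function is what gets stored.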

That inevitably leads to the question "how does gnuplot know that
evaluation is needed?", and that's where the fun (or argument) started.
In all the examples I gave above it is conceptually all the same
whether the variables and functions take on int- or real- values or
whether they take on string values.

But there are a few operations where the two possibilities diverge.
For example, real-valued variables are used (and changed) during
fitting.  I can't imagine that being a meaningful operation for a 
string variable.

Conversely, one thing we do with strings is to print them,
something that doesn't happen to numbers or at least only via
an intermediate string.  Printing a string constant is easy; we do
that already.  But how do you store, or mark, a string-valued
function so that gnuplot knows it must be evaluated and printed?
That is the heart of this discussion.   

I picked up on a suggestion from Hans to mark a string-valued
function by inserting some "magic characters" inside the quotes.
I thought that was a clever idea because it is transparent to the
existing code.  I am bemused by the controversy this seems to
have generated.  

There are two issues:  
how it looks on the command line, and how it is stored internally.

Look, I really don't care how we do it.  This seemed to me a clever
representation.  In addition to being compatible with the 
current code, it means that the internal representation is the
same as the command-line representation (quote + magic chars).
But these are mere conveniences, and are totally separate from
the core functionality.  Feel free to propose other representations,
and explain why they are better.

And now, finally to Volker's specific questions....

On Wednesday 21 July 2004 12:33 am, Volker Dobler wrote:
> I think 98 sites which do memory allocation themselves are more
> dangerous than replacing 98 call sites with one tested and
> correct call to m_quote_capture.

I have no comment at this time.
I will revisit your earlier patch and see what you did there.
Are you willing to help fix things if this mass substitution 
breaks them?

> Regarding the discussion about late evaluation:  I would like
> to _object_ to late evaluation.  Late evaluation will introduce
> many more problems in user space than it will solve.

Petr said this too.  I am totally not understanding this 
concern.  But then again, I don't even know what you think
"late evaluation" means. 
Could you give an example?

> If I construct a string (regardless if done by the '' + trick or sprintf
> or anything else) I expect it to be the way I constructed it;
> at least that's the behavior of all programming languages I am
> familiar with.

Yes.  That's fine.  That's what the sprintf() function does -
it constructs a string permanently and stores it in a variable.
That variable always contains the string you constructed.
It does not change.  Ever.

The point which is confusing is that a string may in fact contain a
command.  But even this is not new - the hot-key bindings are
strings that contain commands. The command they contain is
executed "when needed", which in the case of key bindings
is in response to an event generated externally.
If you type 'bind' it will print a list of these stored commands.
It's the same with the string variables. 
If you type 'show VAR' it will show you what command is stored
there.  But there are other places in the code that can actually
execute that command when needed.

Users are under no obligation to store commands in string
variables any more than they are required to program new hot-key
bindings.  It's just something you *can* do if it proves useful.

-- 
Ethan A Merritt
