From: Juhász Péter <peter.juhasz83@gm...>  20101228 19:17:55

Dear gnuplot developers, a recent thread[1] in the newsgroup made me play with the rand() function, and I've found some issues with it: 1) rand() silently accepts string arguments: gnuplot> print rand("foo") 0.222457440974512 While this is not particularly bad in itself, but it's not consistent with the behavior of other numerical functions (which reject string arguments), and it's undocumented. Its effect is the same as that of rand(0)  which have misled the user in the thread[1], because he thought that rand("time") sets the random seed to the current time. 2) It is possible to lock the PRNG into a state where the returned numbers are not random at all: gnuplot> print rand(0.5) 0.999999999450207 gnuplot> print rand(0) 0.999999999450207 gnuplot> print rand(0) 0.999999999450207 gnuplot> print rand(0) 0.999999999450207 gnuplot> print rand(0) 0.999999999450207 I'll refrain from linking to the relevant xkcd or Dilbert comics. 3) Consider the following plot: set xrange [0:2**22] set samples 1000 plot '+' u 1:(rand($1)) w l The rand() function, when called with a nonzero argument, sets the seeds based on the argument and returns the first pseudorandom number from the sequence associated those seeds. As the plot shows, there is a rather obvious dependence between rand's argument and the returned number, in fact the dependence is linear if only the lower 21orso bits of the argument are considered. This is a problem if we want to use the standard trick of initializing the PRNG with the current time (as the user wanted to in [1]): gnuplot> print rand(real(system("date +%s"))) 0.596925441003784 gnuplot> print rand(real(system("date +%s"))) 0.596924809567054 gnuplot> print rand(real(system("date +%s"))) 0.596924178130323 gnuplot> print rand(real(system("date +%s"))) 0.596923546693592 date +%s returns the time in the Unix time_t format (32 bit integer), however, the upper bits rarely change in that format. 1) and 2) may be bugs in the implementation that are easy to fix, but 3) is a deeper problem. Of course, this kind of behavior can be expected from a linear congruence generator  but then maybe it's time to consider changing to a different algorithm. Péter Juhász ps. Merry Christmas and a Happy New Year to you all!  [1] http://groups.google.com/group/comp.graphics.apps.gnuplot/browse_thread/thread/7e82fdbca70397c7# 
From: Ethan Merritt <merritt@u.washington.edu>  20101228 22:27:13

On Tuesday, December 28, 2010, Juhász Péter wrote: > Dear gnuplot developers, > > a recent thread[1] in the newsgroup made me play with the rand() > function, and I've found some issues with it: > > 1) rand() silently accepts string arguments: > gnuplot> print rand("foo") > 0.222457440974512 > > While this is not particularly bad in itself, but it's not consistent > with the behavior of other numerical functions (which reject string > arguments), and it's undocumented. Its effect is the same as that of > rand(0)  which have misled the user in the thread[1], because he > thought that rand("time") sets the random seed to the current time. I am not terribly concerned that passing a string to a numeric function produces a garbage result, but it would be easy enough to do the same thing in specfun.c that is already done in internal.c and standard.c internal.c:#define pop(x) pop_or_convert_from_string(x) standard.c:#define pop(x) pop_or_convert_from_string(x) > 2) It is possible to lock the PRNG into a state where the returned > numbers are not random at all: > gnuplot> print rand(0.5) > 0.999999999450207 > gnuplot> print rand(0) > 0.999999999450207 > gnuplot> print rand(0) > 0.999999999450207 > gnuplot> print rand(0) > 0.999999999450207 > gnuplot> print rand(0) > 0.999999999450207 Here I agree that the documentation should state that seeds are required to be nonzero integers, although it is then tricky to explain that the way to set both seed values is rand( i + j*{0,1} ) It's fair to consider acceptance of a seed value (0 < seed < 1) as a bug, since it will immediately be converted to an integer value 0 which is not a valid seed. This might be an adequate fix: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%  gnuplot/src/specfun.c 20101021 22:28:24.000000000 0700 +++ gnuplotcvs/src/specfun.c 20101228 13:23:20.000000000 0800 @@ 1098,7 +1098,7 @@ ranf(struct value *init) /* Construct new seed values from input parameter */ /* FIXME: Ideally we should allow all 64 bits of seed to be set */  if (real(init) > 0.0) { + if (real(init) > 1.0) { if (real(init) >= (double)(017777777777UL)) int_error(NO_CARET,"Illegal seed value"); if (imag(init) >= (double)(017777777777UL)) %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% Note that the FIXME comment is out of date. You can indeed set 64 bits of seed by using rand(i + j*{0,1}) as noted above. > 3) Consider the following plot: > > set xrange [0:2**22] > set samples 1000 > plot '+' u 1:(rand($1)) w l > > The rand() function, when called with a nonzero argument, sets the seeds > based on the argument and returns the first pseudorandom number from > the sequence associated those seeds. As the plot shows, there is a > rather obvious dependence between rand's argument and the returned > number, in fact the dependence is linear if only the lower 21orso bits > of the argument are considered. Now I'm out of my depth. Pseudorandom number generation is fraught with pitfalls. Is it necessarily a bad thing that the first number is related to the seed? The primary requirement is that successive numbers within a generated sequence be sufficiently independent, or so I understand it. A requirement that related seeds produce unrelated starting values for the sequence seems like a quite different thing. I can imagine that such a property may be desirable for cryptography, but it seems unrelated to uses in gnuplot. Consider the following script: # Tabulate first 1000 numbers in sequences generated by successive seeds set samples 1000 set table '1.dat' print rand(101) plot '+' using 0:(rand(0)) set table '2.dat' print rand(102) plot '+' using 0:(rand(0)) # Compare the paired elements within the two sequences system("join 1.dat 2.dat > 1_2.dat") plot "1_2.dat" using 2:4 No correlation is evident in the two sequences, even though they are seeded with successive integers and thus the numerical difference between their initial elements is small. > > This is a problem if we want to use the standard trick of initializing > the PRNG with the current time (as the user wanted to in [1]): > gnuplot> print rand(real(system("date +%s"))) > 0.596925441003784 > gnuplot> print rand(real(system("date +%s"))) > 0.596924809567054 > gnuplot> print rand(real(system("date +%s"))) > 0.596924178130323 > gnuplot> print rand(real(system("date +%s"))) > 0.596923546693592 I don't see why this is a problem. Explain, please? If the idea is to generate different streams of random numbers, I think that it succeeds at that. True, the initial values of sequences generated in close succession are similar, but again I have to ask why this is a problem? If it bothers you, can't you just start with the second value of each sequence? > date +%s returns the time in the Unix time_t format (32 bit integer), > however, the upper bits rarely change in that format. So why not use system("date +%N")? > 1) and 2) may be bugs in the implementation that are easy to fix, but 3) > is a deeper problem. Of course, this kind of behavior can be expected > from a linear congruence generator  but then maybe it's time to > consider changing to a different algorithm. > > Péter Juhász > ps. Merry Christmas and a Happy New Year to you all! 
From: Daniel J Sebald <daniel.sebald@ie...>  20101228 23:56:09

Ethan Merritt wrote: > On Tuesday, December 28, 2010, Juhász Péter wrote: [snip] > > 2) It is possible to lock the PRNG into a state where the returned > >>numbers are not random at all: >>gnuplot> print rand(0.5) >>0.999999999450207 >>gnuplot> print rand(0) >>0.999999999450207 >>gnuplot> print rand(0) >>0.999999999450207 >>gnuplot> print rand(0) >>0.999999999450207 >>gnuplot> print rand(0) >>0.999999999450207 > > > Here I agree that the documentation should state that seeds are > required to be nonzero integers, although it is then tricky to > explain that the way to set both seed values is > rand( i + j*{0,1} ) > > It's fair to consider acceptance of a seed value (0 < seed < 1) as a > bug, since it will immediately be converted to an integer value 0 > which is not a valid seed. This might be an adequate fix: > %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% >  gnuplot/src/specfun.c 20101021 22:28:24.000000000 0700 > +++ gnuplotcvs/src/specfun.c 20101228 13:23:20.000000000 0800 > @@ 1098,7 +1098,7 @@ ranf(struct value *init) > > /* Construct new seed values from input parameter */ > /* FIXME: Ideally we should allow all 64 bits of seed to be set */ >  if (real(init) > 0.0) { > + if (real(init) > 1.0) { > if (real(init) >= (double)(017777777777UL)) > int_error(NO_CARET,"Illegal seed value"); > if (imag(init) >= (double)(017777777777UL)) > %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% > Note that the FIXME comment is out of date. You can indeed set 64 bits > of seed by using rand(i + j*{0,1}) as noted above. The "value" type is double floats for real and imaginary, then? Well, that should hold 32 bits within a 52 bit fractional. (seed1 and seed2 are longs.) A couple odd things however. Why is the limiting value 017777777777UL? Is that some bad way of obtaining the maximum 32 value of 2^321, i.e., 4294967295? If not, then it should be if (real(init) >= (double)(04294967295UL)) int_error(NO_CARET,"Illegal seed value"); if (imag(init) >= (double)(04294967295UL)) int_error(NO_CARET,"Illegal seed value"); And what is with this? seed1 = (int)real(init); seed2 = (int)imag(init); That's a potential bug for compilers in which the default integer is 16 bits. First casting to int and then casting to long could lose something. Regarding the out of date comment, did it mean that seed1 and seed2 should be 64bit, i.e., long long? Dan 
From: Ethan Merritt <merritt@u.washington.edu>  20101229 00:28:11

On Tuesday, December 28, 2010, Daniel J Sebald wrote: > Ethan Merritt wrote: > > It's fair to consider acceptance of a seed value (0 < seed < 1) as a > > bug, since it will immediately be converted to an integer value 0 > > which is not a valid seed. This might be an adequate fix: > > %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% > >  gnuplot/src/specfun.c 20101021 22:28:24.000000000 0700 > > +++ gnuplotcvs/src/specfun.c 20101228 13:23:20.000000000 0800 > > @@ 1098,7 +1098,7 @@ ranf(struct value *init) > > > > /* Construct new seed values from input parameter */ > > /* FIXME: Ideally we should allow all 64 bits of seed to be set */ > >  if (real(init) > 0.0) { > > + if (real(init) > 1.0) { > > if (real(init) >= (double)(017777777777UL)) > > int_error(NO_CARET,"Illegal seed value"); > > if (imag(init) >= (double)(017777777777UL)) > > %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% > > Note that the FIXME comment is out of date. You can indeed set 64 bits > > of seed by using rand(i + j*{0,1}) as noted above. > > The "value" type is double floats for real and imaginary, then? Well, that should hold 32 bits within a 52 bit fractional. (seed1 and seed2 are longs.) Yes. The number of mantissa bits kept by a (double) is greater than 32, so it's OK to hold a 32bit integer value in a double. > A couple odd things however. Why is the limiting value 017777777777UL? > Is that some bad way of obtaining the maximum 32 value of 2^321, i.e., 4294967295? I think the intent is to make sure that the integer value didn't overflow before being converted to (double). Off the top of my head, I can't explain why the test isn't against 037777777777 instead. > If not, then it should be > > if (real(init) >= (double)(04294967295UL)) > int_error(NO_CARET,"Illegal seed value"); > if (imag(init) >= (double)(04294967295UL)) > int_error(NO_CARET,"Illegal seed value"); Why? All bits set is easy to remember; the numerical value 4294967295  not so easy. > And what is with this? > > seed1 = (int)real(init); > seed2 = (int)imag(init); > > That's a potential bug for compilers in which the default integer is 16 bits. First casting to int and then casting to long could lose something. We no longer support 16bit. I don't know if that particular bit of code was ever a problem back in the days when we did. But you are correct that the explicit cast is not needed. > Regarding the out of date comment, did it mean that seed1 and seed2 should be 64bit, > i.e., long long? Upon reflection, I think the comment is pointing out that since we just disallowed any values with the high bit set you can only set the low 31 bits of each half. I.e. only 62 of the 64 bits can be set. 
From: Daniel J Sebald <daniel.sebald@ie...>  20101229 02:39:13

Ethan Merritt wrote: > On Tuesday, December 28, 2010, Daniel J Sebald wrote: > >> Ethan Merritt wrote: [snip] >> A couple odd things however. Why is the limiting value >> 017777777777UL? Is that some bad way of obtaining the maximum 32 >> value of 2^321, i.e., 4294967295? > > I think the intent is to make sure that the integer value didn't > overflow before being converted to (double). Off the top of my head, > I can't explain why the test isn't against 037777777777 instead. > > > >> If not, then it should be >> >> if (real(init) >= (double)(04294967295UL)) >> int_error(NO_CARET,"Illegal seed value"); if (imag(init) >= >> (double)(04294967295UL)) int_error(NO_CARET,"Illegal seed value"); > > > Why? All bits set is easy to remember; the numerical value > 4294967295  not so easy. Oh yeah, octal. I'm used to thinking either decimal or hex. Right, 10 sevens is 30 bits equal 1, so 2 more bits would be a value of 3. 037777777777UL makes more sense. [snip] >> Regarding the out of date comment, did it mean that seed1 and seed2 >> should be 64bit, i.e., long long? > > > Upon reflection, I think the comment is pointing out that since we > just disallowed any values with the high bit set you can only set the > low 31 bits of each half. I.e. only 62 of the 64 bits can be set. Shouldn't be a problem if using 037777777777UL. As you pointed out, placing one seed in the real component and one seed in the imaginary component is a bit peculiar. Can the rand() function simply have two optional inputs? Dan 
From: Daniel J Sebald <daniel.sebald@ie...>  20101229 21:08:55
Attachments:
rand_djs_2010dec29.patch

Getting back to the 31bit vs. 32bit issue. Here is the code of the algorithm: /* Generate pseudo random integers */ k = seed1 / 53668L; seed1 = Xa1 * (seed1  k * 53668L)  k * 12211; if (seed1 < 0) seed1 += Xm1; k = seed2 / 52774L; seed2 = Xa2 * (seed2  k * 52774L)  k * 3791; if (seed2 < 0) seed2 += Xm2; z = seed1  seed2; if (z < 1) z += (Xm1  1); which has signed long values for the seeds. So 017777777777L is correct. However, I think the original comment means that negative values for the seeds should be allowed as well. BUT, gnuplot is using negative values to mean reset the generator. So we can't use negative values. Furthermore, does it make sense that negative seeds should be allowed? Well, let's verify that negative values for the seeds is sane and that the random number generator doesn't do something unexpected for negative values. In the case of seed1, the most negative value results when seed1 is the largest value for which the division has no remainder, i.e., k = 40014 and seed1 = 2147471352. In that case seed1 = Xa1 * (seed1  k * 53668L)  k * 12211; = Xa1 * (2147471352  40014 * 53668)  40014 * 12211 = Xa1 * 0  488610954 = 488610954 then if (seed1 < 0) seed1 += Xm1; or seed1 = 488610954 + 2147483563 = 1658872609 In other words, in normal operation of the congruential random number generator, the seed1 value doesn't ever become negative after a pass of the algorithm. Let's do the same check on seed2. The most negative intermediate value happens for k = 40692 and seed2 = 2147479608, i.e., seed2 = Xa2 * (seed2  k * 52774L)  k * 3791; = Xa2 * (2147479608  40692 * 52774)  40692 * 3791 = 154263372 if (seed2 < 0) seed2 += Xm2; or seed2 = 154263372 + 2147483399 = 1993220027 So, again, if seed2 starts out positive, there is no way it can become negative after any iteration. Not having the original paper in front of me, whether negative input seeds are allowed, I can't be sure. But my feeling is that negative values should not be allowed input values because the normal operation has seed values which only land in the positive integer range. The comment in the code is not a valid one, in my opinion. If that is the case, then there is a bug in the code which hasn't been mentioned yet, e.g., rand({5,20}). That is, setting seed 1 greater than zero and seed 2 less than zero is not currently disallowed by the gnuplot code and could potentially put the generator in a strange state it normally would never see. Ethan, please give the attached patch file a try. I've included more detailed illegal seed values including rejection of any noninteger seeds and any inputs where seed2 is negative. Also, the internal seed is not reset until after a valid reset input sequence is confirmed. Also, I beefed up the documentation including a short sentence about the random number generation algorithm. Here are some example outputs: Terminal type set to 'x11' gnuplot> print rand(2**311) 0.996865893971871 gnuplot> print rand(2**31) Illegal seed value gnuplot> print rand(0.5) Illegal seed value gnuplot> print rand(1.5) Illegal seed value gnuplot> print rand(1) 0.222457440974512 gnuplot> print rand({1,5}) Illegal seed value gnuplot> print rand({5,1}) Illegal seed value gnuplot> print rand({5,0}) 0.999998420858381 gnuplot> print rand({5,2147483648}) Illegal seed value gnuplot> print rand({5,0.5}) Illegal seed value I ran the above on the current CVS version of gnuplot and realized that 2^311 is currently not a valid seed. I'm not sure why that should be disallowed. In the attached patch, 2^311 is a valid seed value. Dan 
From: Daniel J Sebald <daniel.sebald@ie...>  20101228 22:36:45

Thanks Péter. The rand() function should probably be revisited with your comments in mind. More below. Juhász Péter wrote: > Dear gnuplot developers, > > a recent thread[1] in the newsgroup made me play with the rand() > function, and I've found some issues with it: > > 1) rand() silently accepts string arguments: > gnuplot> print rand("foo") > 0.222457440974512 > > While this is not particularly bad in itself, but it's not consistent > with the behavior of other numerical functions (which reject string > arguments), and it's undocumented. Actually, my observation is that functions having real arguments will reject string inputs while functions with optional complex arguments will not reject string inputs. For example gnuplot> print gamma("test") gnuplot> print gamma("test") ^ undefined value gnuplot> print erfc("test") 1.0 Perhaps the gnuplot code needs to be altered to also reject strings if the argument is possibly complex, unless a string is a valid input argument. (But I don't know of any, off hand.) > Its effect is the same as that of > rand(0)  which have misled the user in the thread[1], because he > thought that rand("time") sets the random seed to the current time. Are you proposing there should be an input argument "time"? The command line documentation says nothing about using the current time as a seed. > 2) It is possible to lock the PRNG into a state where the returned > numbers are not random at all: > gnuplot> print rand(0.5) > 0.999999999450207 > gnuplot> print rand(0) > 0.999999999450207 > gnuplot> print rand(0) > 0.999999999450207 > gnuplot> print rand(0) > 0.999999999450207 > gnuplot> print rand(0) > 0.999999999450207 > > I'll refrain from linking to the relevant xkcd or Dilbert comics. Well, here is where gnuplot is deficient in my opinion. The documentation doesn't indicate what the input value should be, a float or an integer. Here's the code to illustrate why this is important: /* Construct new seed values from input parameter */ /* FIXME: Ideally we should allow all 64 bits of seed to be set */ if (real(init) > 0.0) { if (real(init) >= (double)(017777777777UL)) int_error(NO_CARET,"Illegal seed value"); if (imag(init) >= (double)(017777777777UL)) int_error(NO_CARET,"Illegal seed value"); seed1 = (int)real(init); seed2 = (int)imag(init); if (seed2 == 0) seed2 = seed1; } FPRINTF((stderr,"ranf: seed = %lo %lo %ld %ld\n", seed1,seed2)); Comparing floating point with an integer cast to a double is slightly dodgy: real(init) >= (double)(017777777777UL) Hence the uneasy FIXME statement in the comments. More than that, casting floating point quantities to integer quantities for the seed is problematic. This means that any input 0 < x < 1 will put the random number generator in the "zero" state, i.e., stuck. For example, try plot rand(0.5) plot rand(0.34) plot rand(0.789) The thing is, reading the documentation, there is no indication that 0 < x < 1 is invalid. In fact, it's not illogical for the user to assume the value should be in (0,1) since that is what the rand function outputs. There are a couple ways to fix this. (Both methods require giving more detailed description of what numeric type the seed can be.) It seems to be the input of the function can't be either float or integer, it has to be one or the other. If the input is a float, then perhaps 0 < x < 1 and x should be cast to its equivalence in terms of seed. (But it probably isn't a unique equivalence, and therein lies a problem with random number generation with this method when the core algorithm is integer based. Even the existing method of casting loses resolution and has the same problem.) If the input is an integer, then the values should be translated directly to the seeds of the random number generator. (My fear is that the gnuplot code, designed to be generic, has no way of preserving full integer resolution because everything is cast to a float from the command line.) > 3) Consider the following plot: > > set xrange [0:2**22] > set samples 1000 > plot '+' u 1:(rand($1)) w l > > The rand() function, when called with a nonzero argument, sets the seeds > based on the argument and returns the first pseudorandom number from > the sequence associated those seeds. As the plot shows, there is a > rather obvious dependence between rand's argument and the returned > number, in fact the dependence is linear if only the lower 21orso bits > of the argument are considered. This may be true, but I don't know if this is a measure of randomness. > This is a problem if we want to use the standard trick of initializing > the PRNG with the current time (as the user wanted to in [1]): > gnuplot> print rand(real(system("date +%s"))) > 0.596925441003784 > gnuplot> print rand(real(system("date +%s"))) > 0.596924809567054 > gnuplot> print rand(real(system("date +%s"))) > 0.596924178130323 > gnuplot> print rand(real(system("date +%s"))) > 0.596923546693592 > > date +%s returns the time in the Unix time_t format (32 bit integer), > however, the upper bits rarely change in that format. I'm not sure this is that important. This is just initializing the random number generator. So long as the function's output changes for different inputs, it is effectively starting the random number generator in a different state and the very first step should appear to have randomness. If you are doing an experiment in which trials should have "random"ly independent first values, simply drop the first outcome of the random number generator after the seed (or only seed once, not for each new trial). (Probably good advice no matter what method is used to seed the random number generator...The random number generator is designed to look independent across outcomes, but starting each trial with a seed constructed from exterior isn't necessarily going to have good random properties.) > 1) and 2) may be bugs in the implementation that are easy to fix, but 3) > is a deeper problem. Of course, this kind of behavior can be expected > from a linear congruence generator  but then maybe it's time to > consider changing to a different algorithm. To me, what's problematic is the loss of resolution because of the way gnuplot is programmed. In other words, we need to go back and address this comment: /* FIXME: Ideally we should allow all 64 bits of seed to be set */ which likely means allowing integer inputs. Do we need some kind of numeric representation which is either a float or an integer, whichever allows the input value to be stored without rounding of some form? Also, the "rand" documentation should be changed to indicate exactly what numeric type the seed should be. Dan 
From: Juhász Péter <peter.juhasz83@gm...>  20101229 17:14:42

On Tue, 20101228 at 16:36 0600, Daniel J Sebald wrote: > Thanks Péter. The rand() function should probably be revisited with your comments in mind. More below. > > Juhász Péter wrote: > > Dear gnuplot developers, > > > > a recent thread[1] in the newsgroup made me play with the rand() > > function, and I've found some issues with it: > > > > 1) rand() silently accepts string arguments: > > gnuplot> print rand("foo") > > 0.222457440974512 > > > > While this is not particularly bad in itself, but it's not consistent > > with the behavior of other numerical functions (which reject string > > arguments), and it's undocumented. > > Actually, my observation is that functions having real arguments will reject string inputs while functions with optional complex arguments will not reject string inputs. For example > > gnuplot> print gamma("test") > > gnuplot> print gamma("test") > ^ > undefined value > > gnuplot> print erfc("test") > 1.0 > I think this is not true. # optional complex argument gnuplot> pr sin({2,3}) {9.15449914691143, 4.16890695996656} # croaks on meaningless strings gnuplot> pr sin("foo") Nonnumeric string found where a numeric expression was expected # works with meaningful strings gnuplot> pr sin("3") 0.141120008059867 > Perhaps the gnuplot code needs to be altered to also reject strings if the argument is possibly complex, unless a string is a valid input argument. (But I don't know of any, off hand.) > Ethan's suggestion about pop_or_convert_from_string(x) adequately fixes this problem. > > > Its effect is the same as that of > > rand(0)  which have misled the user in the thread[1], because he > > thought that rand("time") sets the random seed to the current time. > > Are you proposing there should be an input argument "time"? No, but now that you mention it, I've found it frustrating that there is no way within gnuplot to get the current time, one has to resort to an external application. There is already an extensive framework in place for formatting date&time strings, there is even a way to put the current time as a time stamp on the plot, but there is no way to get it as a variable (for example). There ought to be a time() function that returns the current gnuplot time (seconds since 2000), the existing formatting functions are adequate to transform that into any other format. Apart from the very narrow use case of initializing rand(), this proposed function would allow for relative time plots, e.g. plot "logfile" u (timecol(1)time()):2 > > Dan Péter Juhász 
From: Daniel J Sebald <daniel.sebald@ie...>  20101229 17:30:29

Juhász Péter wrote: > On Tue, 20101228 at 16:36 0600, Daniel J Sebald wrote: [snip] >>Actually, my observation is that functions having real arguments will reject string inputs while functions with optional complex arguments will not reject string inputs. For example >> >>gnuplot> print gamma("test") >> >>gnuplot> print gamma("test") >> ^ >> undefined value >> >>gnuplot> print erfc("test") >>1.0 >> > > > I think this is not true. > > # optional complex argument > gnuplot> pr sin({2,3}) > {9.15449914691143, 4.16890695996656} > > # croaks on meaningless strings > gnuplot> pr sin("foo") > Nonnumeric string found where a numeric expression was > expected > > # works with meaningful strings > gnuplot> pr sin("3") > 0.141120008059867 Sounds like gnuplot needs some consistency in how this is handled. >>>Its effect is the same as that of >>>rand(0)  which have misled the user in the thread[1], because he >>>thought that rand("time") sets the random seed to the current time. >> >>Are you proposing there should be an input argument "time"? > > > No, but now that you mention it, I've found it frustrating that there is > no way within gnuplot to get the current time, one has to resort to an > external application. There is already an extensive framework in place > for formatting date&time strings, there is even a way to put the current > time as a time stamp on the plot, but there is no way to get it as a > variable (for example). There ought to be a time() function that returns > the current gnuplot time (seconds since 2000), the existing formatting > functions are adequate to transform that into any other format. Good point. Tagging a plot with the current time does seem like something of value. Dan 
From: Juhász Péter <peter.juhasz83@gm...>  20101229 19:05:29
Attachments:
gnuplot_time.patch

On Wed, 20101229 at 10:37 0800, Ethan Merritt wrote: > On Wednesday, December 29, 2010, Juhász Péter wrote: > > I've found it frustrating that there is > > no way within gnuplot to get the current time, one has to resort to an > > external application. There is already an extensive framework in place > > for formatting date&time strings, there is even a way to put the current > > time as a time stamp on the plot, but there is no way to get it as a > > variable (for example). There ought to be a time() function that returns > > the current gnuplot time (seconds since 2000), the existing formatting > > functions are adequate to transform that into any other format. > > Sounds reasonable to me. The only concern I see is the question of > whether it can be made totally portable, or whether it will need > documentation as being systemdependent. > > As you noted before in a sample script, on unixish systems you can do > time = system("date +%s") > to get the number of seconds since the unix epoch. Maybe it would be > sufficient to export a user variable containing the offset between > system time and internal time? > internal_time = system("date +%s")  GPVAL_EPOCH > Nah. That would just confuse everyone. Please see the attached patch. I think it's as portable as we can make it even in this rough draft state. From the user's standpoint, we define gnuplot time as the number of seconds elapsed since 2000.0 and the new function returns this number, and that's that. Internally this depends on the C library's time() function, which returns the number of seconds since 1970, from which number we get the result with a simple subtraction. I think it's reasonable to expect that C's time() function will be available on every platform where gnuplot can be compiled. Two observations: The time stamp generated by "set timestamp" uses localtime() internally (buried deep in graphics.c), so it prints the local time. Other timemanagement functions generally work with gmtime(). Is that true that functions in gnuplot need to have at least one argument? And one more: We keep saying "the number of seconds elapsed since 2000.0". Unfortunately, that's not as unambiguous as one would think. There is that nasty business of leap seconds, that is, the adjustments made to UTC to keep it synchronized to mean solar time. This is needed because Earth's rotation is not constant, it's slowly decelerating. This means that UTC is not in sync with a pure atomic time (such as used by GPS), the discrepancy is 34 seconds now. If we were to be really anal about timekeeping in gnuplot, we should tell users exactly which time standard do we follow. Péter Juhász 
From: <plotter@pi...>  20101229 19:47:18

On 12/29/10 20:05, Juhász Péter wrote: > This means that UTC is not in sync with a pure atomic time (such as used > by GPS), the discrepancy is 34 seconds now. > If we were to be really anal about timekeeping in gnuplot, we should > tell users exactly which time standard do we follow. Anal sounds good to me. Science is all about precision and standards. UTC would be the most logical standard time to use rather than GPS time but it should be clearly documented along with this feature. If the point it to be able to identify when a graph was plotted , syncing with system clock (UTC based) is obviously the way to go. The result will never be more reliable than the synchronisity of the system clock. If someone is releasing plots where 34s is important to when they were plotted, they will need to be aware of these issues and the doc will tell what time reference was used. /Peter. 
From: Juhász Péter <peter.juhasz83@gm...>  20101229 16:50:07

On Tue, 20101228 at 14:25 0800, Ethan Merritt wrote: > On Tuesday, December 28, 2010, Juhász Péter wrote: > > Dear gnuplot developers, > > > > a recent thread[1] in the newsgroup made me play with the rand() > > function, and I've found some issues with it: > > > > 1) rand() silently accepts string arguments: > > gnuplot> print rand("foo") > > 0.222457440974512 > > > > While this is not particularly bad in itself, but it's not consistent > > with the behavior of other numerical functions (which reject string > > arguments), and it's undocumented. Its effect is the same as that of > > rand(0)  which have misled the user in the thread[1], because he > > thought that rand("time") sets the random seed to the current time. > > I am not terribly concerned that passing a string to a numeric function > produces a garbage result, but it would be easy enough to do the same > thing in specfun.c that is already done in internal.c and standard.c > internal.c:#define pop(x) pop_or_convert_from_string(x) > standard.c:#define pop(x) pop_or_convert_from_string(x) I tried this and it works nicely. If we were to put this definition at the beginning of the file, then it may affect other functions, a fact that requires further testing. I'd say we just replace the function call in f_rand itself:  (void) real(pop(&a)); + (void) real(pop_or_convert_from_string(&a)); As it is now, rand("foo") does the same as rand(0), but rand("3") fails with an error message. This change fixes both issues. > > > 2) It is possible to lock the PRNG into a state where the returned > > numbers are not random at all: > > gnuplot> print rand(0.5) > > 0.999999999450207 > > gnuplot> print rand(0) > > 0.999999999450207 > > gnuplot> print rand(0) > > 0.999999999450207 > > gnuplot> print rand(0) > > 0.999999999450207 > > gnuplot> print rand(0) > > 0.999999999450207 > > Here I agree that the documentation should state that seeds are > required to be nonzero integers, although it is then tricky to > explain that the way to set both seed values is > rand( i + j*{0,1} ) > > It's fair to consider acceptance of a seed value (0 < seed < 1) as a > bug, since it will immediately be converted to an integer value 0 > which is not a valid seed. This might be an adequate fix: > %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% >  gnuplot/src/specfun.c 20101021 22:28:24.000000000 0700 > +++ gnuplotcvs/src/specfun.c 20101228 13:23:20.000000000 0800 > @@ 1098,7 +1098,7 @@ ranf(struct value *init) > > /* Construct new seed values from input parameter */ > /* FIXME: Ideally we should allow all 64 bits of seed to be set */ >  if (real(init) > 0.0) { > + if (real(init) > 1.0) { > if (real(init) >= (double)(017777777777UL)) > int_error(NO_CARET,"Illegal seed value"); > if (imag(init) >= (double)(017777777777UL)) > %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% > Note that the FIXME comment is out of date. You can indeed set 64 bits > of seed by using rand(i + j*{0,1}) as noted above. > Seems OK. However, a sentence or two should go into the documentation mentioning that only the integer part(s) of the seed are meaningful. >> A couple odd things however. Why is the limiting value 017777777777UL? >> Is that some bad way of obtaining the maximum 32 value of 2^321, i.e., 4294967295? >> I think the intent is to make sure that the integer >> value didn't overflow before being converted to (double). > Off the top of my head, I can't explain why the test isn't > against 037777777777 instead. Simply because the highest bit stores the sign in a signed integer, and the integer part of struct value is a signed int? > > 3) Consider the following plot: > > > > set xrange [0:2**22] > > set samples 1000 > > plot '+' u 1:(rand($1)) w l > > > > The rand() function, when called with a nonzero argument, sets the seeds > > based on the argument and returns the first pseudorandom number from > > the sequence associated those seeds. As the plot shows, there is a > > rather obvious dependence between rand's argument and the returned > > number, in fact the dependence is linear if only the lower 21orso bits > > of the argument are considered. > > Now I'm out of my depth. > Pseudorandom number generation is fraught with pitfalls. > > Is it necessarily a bad thing that the first number is related to the seed? > The primary requirement is that successive numbers within a generated > sequence be sufficiently independent, or so I understand it. > > A requirement that related seeds produce unrelated starting values > for the sequence seems like a quite different thing. I can imagine that > such a property may be desirable for cryptography, but it seems > unrelated to uses in gnuplot. > [snip] > > I don't see why this is a problem. Explain, please? > If the idea is to generate different streams of random numbers, > I think that it succeeds at that. > True, the initial values of sequences generated in close succession > are similar, but again I have to ask why this is a problem? > If it bothers you, can't you just start with the second value of each > sequence? > OK, I retract my complaint. > > date +%s returns the time in the Unix time_t format (32 bit integer), > > however, the upper bits rarely change in that format. > > So why not use system("date +%N")? > I didn't know about this. Thanks. > > > > 1) and 2) may be bugs in the implementation that are easy to fix, but 3) > > is a deeper problem. Of course, this kind of behavior can be expected > > from a linear congruence generator  but then maybe it's time to > > consider changing to a different algorithm. > > > > Péter Juhász > > ps. Merry Christmas and a Happy New Year to you all! 
From: Ethan Merritt <merritt@u.washington.edu>  20101229 18:23:04

On Wednesday, December 29, 2010, Juhász Péter wrote: > > it would be easy enough to do the same > > thing in specfun.c that is already done in internal.c and standard.c > > internal.c:#define pop(x) pop_or_convert_from_string(x) > > standard.c:#define pop(x) pop_or_convert_from_string(x) > > I tried this and it works nicely. If we were to put this definition at > the beginning of the file, then it may affect other functions, a fact > that requires further testing. I think it's safe enough. All of the calls to pop() in that file are expecting a numerical value, so they should all benefit equally from better handling of a string value passed by mistake or in the expectation that it will be converted. > > %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% > >  gnuplot/src/specfun.c 20101021 22:28:24.000000000 0700 > > +++ gnuplotcvs/src/specfun.c 20101228 13:23:20.000000000 0800 > > @@ 1098,7 +1098,7 @@ ranf(struct value *init) > > > > /* Construct new seed values from input parameter */ > > /* FIXME: Ideally we should allow all 64 bits of seed to be set */ > >  if (real(init) > 0.0) { > > + if (real(init) > 1.0) { > > if (real(init) >= (double)(017777777777UL)) > > int_error(NO_CARET,"Illegal seed value"); > > if (imag(init) >= (double)(017777777777UL)) > > %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% > > Note that the FIXME comment is out of date. You can indeed set 64 bits > > of seed by using rand(i + j*{0,1}) as noted above. > > > > Seems OK. I've made this change, since it fixes an error (rand(0) stalling out on a single value). But the points below still need some thought. It may well be OK the way it is, but perhaps there's a subtle error. 2^31 possible seeds is probably enough :) > However, a sentence or two should go into the documentation > mentioning that only the integer part(s) of the seed are meaningful. > > >> A couple odd things however. Why is the limiting value > 017777777777UL? > >> Is that some bad way of obtaining the maximum 32 value of 2^321, > i.e., 4294967295? > >> I think the intent is to make sure that the integer > >> value didn't overflow before being converted to (double). > > Off the top of my head, I can't explain why the test isn't > > against 037777777777 instead. > > Simply because the highest bit stores the sign in a signed integer, and > the integer part of struct value is a signed int? > 
From: Ethan Merritt <merritt@u.washington.edu>  20101229 18:37:39

On Wednesday, December 29, 2010, Juhász Péter wrote: > I've found it frustrating that there is > no way within gnuplot to get the current time, one has to resort to an > external application. There is already an extensive framework in place > for formatting date&time strings, there is even a way to put the current > time as a time stamp on the plot, but there is no way to get it as a > variable (for example). There ought to be a time() function that returns > the current gnuplot time (seconds since 2000), the existing formatting > functions are adequate to transform that into any other format. Sounds reasonable to me. The only concern I see is the question of whether it can be made totally portable, or whether it will need documentation as being systemdependent. As you noted before in a sample script, on unixish systems you can do time = system("date +%s") to get the number of seconds since the unix epoch. Maybe it would be sufficient to export a user variable containing the offset between system time and internal time? internal_time = system("date +%s")  GPVAL_EPOCH Nah. That would just confuse everyone. 
From: Daniel J Sebald <daniel.sebald@ie...>  20101229 18:45:15

Ethan Merritt wrote: > On Wednesday, December 29, 2010, Juhász Péter wrote: > >>I've found it frustrating that there is >>no way within gnuplot to get the current time, one has to resort to an >>external application. There is already an extensive framework in place >>for formatting date&time strings, there is even a way to put the current >>time as a time stamp on the plot, but there is no way to get it as a >>variable (for example). There ought to be a time() function that returns >>the current gnuplot time (seconds since 2000), the existing formatting >>functions are adequate to transform that into any other format. > > > Sounds reasonable to me. The only concern I see is the question of > whether it can be made totally portable, or whether it will need > documentation as being systemdependent. > > As you noted before in a sample script, on unixish systems you can do > time = system("date +%s") > to get the number of seconds since the unix epoch. Maybe it would be > sufficient to export a user variable containing the offset between > system time and internal time? > internal_time = system("date +%s")  GPVAL_EPOCH > Nah. That would just confuse everyone. Can't the C time routines in time.h be used somehow? ... I'll have a follow up to the rand() seed discussion in a little while. Dan 