From: Juhász P. <pet...@gm...> - 2010-12-29 16:50:07
On Tue, 2010-12-28 at 14:25 -0800, Ethan Merritt wrote:
> On Tuesday, December 28, 2010, Juhász Péter wrote:
> > Dear gnuplot developers,
> >
> > a recent thread[1] in the newsgroup made me play with the rand()
> > function, and I've found some issues with it:
> >
> > 1) rand() silently accepts string arguments:
> > gnuplot> print rand("foo")
> > 0.222457440974512
> >
> > While this is not particularly bad in itself, it's not consistent
> > with the behavior of other numerical functions (which reject string
> > arguments), and it's undocumented. Its effect is the same as that of
> > rand(0) - which misled the user in the thread[1], because he
> > thought that rand("time") sets the random seed to the current time.
>
> I am not terribly concerned that passing a string to a numeric function
> produces a garbage result, but it would be easy enough to do the same
> thing in specfun.c that is already done in internal.c and standard.c
> internal.c:#define pop(x) pop_or_convert_from_string(x)
> standard.c:#define pop(x) pop_or_convert_from_string(x)

I tried this and it works nicely. If we were to put this definition at
the beginning of the file, it might affect other functions, which would
require further testing. I'd say we just replace the function call in
f_rand itself:

-    (void) real(pop(&a));
+    (void) real(pop_or_convert_from_string(&a));

As it is now, rand("foo") does the same as rand(0), but rand("3") fails
with an error message. This change fixes both issues.

>
> > 2) It is possible to lock the PRNG into a state where the returned
> > numbers are not random at all:
> > gnuplot> print rand(0.5)
> > 0.999999999450207
> > gnuplot> print rand(0)
> > 0.999999999450207
> > gnuplot> print rand(0)
> > 0.999999999450207
> > gnuplot> print rand(0)
> > 0.999999999450207
> > gnuplot> print rand(0)
> > 0.999999999450207
>
> Here I agree that the documentation should state that seeds are
> required to be non-zero integers, although it is then tricky to
> explain that the way to set both seed values is
> rand( i + j*{0,1} )
>
> It's fair to consider acceptance of a seed value (0 < seed < 1) as a
> bug, since it will immediately be converted to an integer value 0
> which is not a valid seed. This might be an adequate fix:
> %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
> --- gnuplot/src/specfun.c 2010-10-21 22:28:24.000000000 -0700
> +++ gnuplot-cvs/src/specfun.c 2010-12-28 13:23:20.000000000 -0800
> @@ -1098,7 +1098,7 @@ ranf(struct value *init)
>
>      /* Construct new seed values from input parameter */
>      /* FIXME: Ideally we should allow all 64 bits of seed to be set */
> -    if (real(init) > 0.0) {
> +    if (real(init) > 1.0) {
>          if (real(init) >= (double)(017777777777UL))
>              int_error(NO_CARET,"Illegal seed value");
>          if (imag(init) >= (double)(017777777777UL))
> %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
> Note that the FIXME comment is out of date. You can indeed set 64 bits
> of seed by using rand(i + j*{0,1}) as noted above.
>

Seems OK. However, a sentence or two should go into the documentation
mentioning that only the integer part(s) of the seed are meaningful.

>> A couple of odd things, however. Why is the limiting value 017777777777UL?
>> Is that some bad way of obtaining the maximum 32-bit value of 2^32-1,
>> i.e., 4294967295?
>> I think the intent is to make sure that the integer
>> value didn't overflow before being converted to (double).
> Off the top of my head, I can't explain why the test isn't
> against 037777777777 instead.

Simply because the highest bit stores the sign in a signed integer, and
the integer part of struct value is a signed int?

> > 3) Consider the following plot:
> >
> > set xrange [0:2**22]
> > set samples 1000
> > plot '+' u 1:(rand($1)) w l
> >
> > The rand() function, when called with a nonzero argument, sets the seeds
> > based on the argument and returns the first pseudo-random number from
> > the sequence associated with those seeds. As the plot shows, there is a
> > rather obvious dependence between rand's argument and the returned
> > number; in fact the dependence is linear if only the lower 21-or-so bits
> > of the argument are considered.
>
> Now I'm out of my depth.
> Pseudo-random number generation is fraught with pitfalls.
>
> Is it necessarily a bad thing that the first number is related to the seed?
> The primary requirement is that successive numbers within a generated
> sequence be sufficiently independent, or so I understand it.
>
> A requirement that related seeds produce unrelated starting values
> for the sequence seems like a quite different thing. I can imagine that
> such a property may be desirable for cryptography, but it seems
> unrelated to uses in gnuplot.
>
[snip]
>
> I don't see why this is a problem. Explain, please?
> If the idea is to generate different streams of random numbers,
> I think that it succeeds at that.
> True, the initial values of sequences generated in close succession
> are similar, but again I have to ask why this is a problem?
> If it bothers you, can't you just start with the second value of each
> sequence?
>

OK, I retract my complaint.

> > date +%s returns the time in the Unix time_t format (32 bit integer),
> > however, the upper bits rarely change in that format.
>
> So why not use system("date +%N")?
>

I didn't know about this. Thanks.

> >
> > 1) and 2) may be bugs in the implementation that are easy to fix, but 3)
> > is a deeper problem. Of course, this kind of behavior can be expected
> > from a linear congruential generator - but then maybe it's time to
> > consider changing to a different algorithm.
> >
> > Péter Juhász
> > ps. Merry Christmas and a Happy New Year to you all!