Maxima -- GPL CAS based on DOE-MACSYMA / Bugs / #4078 doc of random unclear about max value for denormalized args

#4078 doc of random unclear about max value for denormalized args

Milestone: None

Status: open

Owner: nobody

Labels: random (1) sbcl (8) documentation (46)

Priority: 5

Updated: 2023-01-27

Created: 2023-01-16

Creator: Stavros Macrakis

Private: No

UPDATE: This is a documentation bug. See the last comment.

random(f), for f a denormalized number, underrepresents 0.0 and can return f, even though the spec says that the result is strictly less than the argument.

Simple demo:

multiset(list) :=
  block([ar,idxlist,vallist,retval:[]],local(ar),
     list: map('ratdisrep,list),
     ar[x]:=0,
     for i in list do ar[i]:ar[i]+1,
     idxlist: rest(arrayinfo(ar),2),
     vallist: listarray(ar),
     kill(ar),
     while idxlist # [] do
       ( push([first(first(idxlist)),first(vallist)],retval),
         idxlist: rest(idxlist),
     vallist: rest(vallist) ),
     sort(retval))$

fpprintprec:6$

multiset(makelist(random(2e-323),i,1,1000)) =>
    [[0.0, 113], [4.94066e-324, 262], [9.88131e-324, 247], [1.4822e-323, 259], [1.97626e-323, 119]]
2e-323 => 1.97626e-323

0.0 is about half as common as it should be, and 1.97626e-323 shouldn't be returned at all.
I have tested this with larger denormalized floats as well, e.g., random(4.94e-320).

This behavior is inherited from SBCL's random function -- I will file a bug report with them.

Maxima 5.45.1 SBCL 2.0.0 Windows

Discussion

Raymond Toy - 2023-01-18

Maxima's rand-mt19937.lisp says it's from cmucl (not that that matters much; cmucl doesn't use mt19937 anymore.)

Perhaps these are relevant: https://vigna.di.unimi.it/xorshift/, Look for the section titled "Generating uniform doubles in the unit interval". There's a link there to https://vigna.di.unimi.it/xorshift/random_real.c for generating these.

This seems like such a niche area.

If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Stavros Macrakis - 2023-01-18

Yes, it is a niche area, but I came across it honestly -- I wasn't looking for problems with random, but rather trying to use random for some testing. The current code for random(float) must be something like random(2^53)/2.0^53*float, which isn't quite right for denormalized numbers.

I believe this is a correct fix (even if not elegant): if random(a)=a, return 0.0.

If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Stavros Macrakis - 2023-01-19

i think I'm at least partially wrong. More tomorrow.

If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Raymond Toy - 2023-01-19

The comment in the code for %random-double-float says

Handle the single or double float case of RANDOM. We generate a float in [0d0, 1d0) by clobbering the mantissa of 1d0 with random bits (52 bits); this yields a number in [1d0, 2d0). Then 1d0 is subtracted.

I believe this means the lowest bit of the fraction is always 0, so we're missing one bit.

This is then multiplied by the arg of random.

The xoroshiro link says that to get a random float in the interval the same interval you should probably be (x >> 1) * 2^-53, were x is a 64-bit unsigned integer.

I'm guessing, but if this matters, we should use the algorithm given in random_real.

If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Stavros Macrakis - 2023-01-19

I was wrong. The current distribution of values of random(<denorm>) is in fact correct. It is the documentation that is subtly wrong. random(x) for float x in effect generates a random real (not float) number in the interval [0,x). It then rounds it to the nearest floating-point number.

The interval [0,x) can be broken up into equivalence classes defined by "nearest floating point number". Calling the smallest float ε, those intervals are (0,ε/2)(ε/2,3*ε/2)(3*ε/2,5ε/2) ... (x-ε/2,x). That is, the first and last intervals are half the size of the other intervals, so should appear half as often as the others.

The only error is that random is defined to return a number "less than <x>". In fact, it should be "less than or equal to <x>". For <x> in a normal range, this hardly matters, since the probability of returning <x> is tiny.</x></x></x></x>

So I will change this to a documentation bug.

If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Raymond Toy - 2023-01-19

But based on the code, random(x) should return a value less than x. Unless I'm missing something. And we should be consistent. If random(x) can return x for floating-point values of x, then it should also return x when x is an integer. For integers, I don't think that's possible because it does (rem bits x).

Note that this method has a bias. Perhaps it should be replaced by a method that doesn't have the bias. But that's a different bug. Perhaps people aren't generating enough values to notice the bias.

If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
- Stavros Macrakis - 2023-01-19
  
  The only time anyone will notice that x is a possible value for random(x) (with float x) is for small denormalized numbers. For normalized numbers, the probability of generating exactlyx is something like 2^-52 = 10^-16.
  
  This is very different from the integer case.
  
  If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Raymond Toy - 2023-01-19

Ok, I think random(x) producing x only happens when x is a denormal because there aren't enough bits left to represent the product of 0.9999999999999998*x. The number 0.999...8 comes from how %random-double-float generates a large integer and smashes that into a double. When x is not a denormal, there are enough bits so that 0.999...8*x has enough bits so that the product is not x. I think.

If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Stavros Macrakis - 2023-01-27

summary: random not uniform for denormalized floats --> doc of random unclear about max value for denormalized args
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Log in to post a comment.