From: Paul Khuong <pvk@pv...>  2012-07-10 16:34:21

In article <E1SoKEb000696PN@...>,
Waldek Hebisch <hebisch@...> wrote:
> Christophe Rhodes wrote:
> > The issue at hand is whether the "extra" density of floats around 0
> > should be used by the RNG.  At first, it seems obvious that it should,
> > because well, why not?
>
> One paradigm for floating point operations is that operation is
> first performed exactly and after that the result is rounded
> to a representable float.

According to this view, the probability of (random 1.0) returning 1.0
should be positive.  In fact, it should be equal to the probability of
the random float being < 2^{-25}.  I detect a certain tension with the
spec here.

> > On the other hand, imagine a simple use of a RNG
> > to generate samples from a distribution using the CDF and a lookup
> > table: generate a float between 0 and 1 and transform according to the
> > inverse of the CDF.  Ignoring for the moment the actual generation of
> > zeros, if the RNG exploits the wide range of floats around 0, the lower
> > tail of the distribution will be much, much more explored than the upper
> > tail, because the floating point resolution around 0 is far greater than
> > it is around 1.
>
> I am not sure what you mean by "more explored".

Many more distinct values in the left tail than in the right one.

> When RNG makes good use of extra
> absolute precision available close to 0 then tail of transformed
> distribution is much closer to the true exponential distribution.

Ah, but what happens if I need my extra precision at the other end?  Or
what if I'm working with a symmetric PDF?  Or, what if my uniform's
lower bound isn't 0?

> Of course, the user may do something stupid, like using log(1 - x)
> with x uniform in (0,1) to generate exponential distribution.  Then
> extra effort spent close to 0 is wasted.

http://en.wikipedia.org/wiki/Antithetic_variates.  It's not stupid, but
*useful*; sophisticated, one might even say.

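To put rough numbers on "many more distinct values", and on what the
1 - x reflection does to them, here is a minimal Python sketch.  It
assumes IEEE double precision rather than single floats; the helper
names, the tail width of 2^-10 and the sample values are all
hypothetical choices for illustration, not anything taken from an
existing RNG.

```python
import struct

def float_bits(x):
    """Bit pattern of a non-negative double, read as an integer."""
    return struct.unpack("<q", struct.pack("<d", x))[0]

def count_doubles(lo, hi):
    """Number of doubles in [lo, hi), assuming 0 <= lo <= hi."""
    # Non-negative doubles are ordered like their bit patterns,
    # so the difference of patterns counts the values in between.
    return float_bits(hi) - float_bits(lo)

WIDTH = 2.0 ** -10                            # assumed tail width
left_tail = count_doubles(0.0, WIDTH)         # doubles in [0, 2^-10)
right_tail = count_doubles(1.0 - WIDTH, 1.0)  # doubles in [1 - 2^-10, 1)

# Fixed-denominator variate (hypothetical k / 2^53): 1 - x is exact.
x_grid = 12345 * 2.0 ** -53
# Tiny variate that a density-exploiting generator could return.
x_tiny = 2.0 ** -60

print(left_tail // right_tail)         # how lopsided the two tails are
print(1.0 - (1.0 - x_grid) == x_grid)  # reflection loses nothing
print(1.0 - x_tiny == 1.0)             # reflection rounds up to 1.0
```

With a fixed denominator, x and 1 - x live on the same grid, so the
antithetic variate carries exactly the same information.  With the
denser generator, everything below about 2^-54 reflects onto 1.0 itself.
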
> Given the above I think that RNG which makes use of extra precision
> around 0 is better than one which does not.

I like the current behaviour for two reasons: it's simple to explain and
to reason about (we generate random fractions with a fixed denominator,
and express them as floats), and it's the most common way to do it.

Simplicity is important to me because, as I noted above, getting small
values around 0 right isn't enough: the exact same problems, modulo
trivial differences, still crop up.  The gains in correctness of
intuitive code are extremely partial, and contingent on avoiding
intuitively equivalent reformulations.

This plays into the second reason: AFAICT, it's by far the most common
way to generate uniforms (either the U[1, 2) - 1 trick, or by taking
exactly-represented integers *and scaling them by a reciprocal*).  Any
issue with this way of doing things is common and language independent,
and workarounds are likely to be known.  Intuitively, any solution would
still be applicable when generating tiny values; realistically, I know
better than to trust my intuition here.  I could certainly believe that
there are strange side-effects on statistics when using these variates
in stochastic simulations.  Either way, someone who cares about that
ought to be basing their experiments on well-tested methods, and these
methods will most likely have been tested with a fixed-precision uniform
generator.

Oh, extra data point: none of the test suites that I know of (diehard,
dieharder, TestU01) attempt to detect that behaviour.

Still, I should be back in Montreal very soon, and I'll try to ask some
stochastic simulation people.

Paul Khuong
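
P.S. For concreteness, a minimal Python sketch of the two
fixed-denominator constructions mentioned above.  It assumes IEEE
doubles rather than single floats, and takes the random bits as an
argument so the behaviour is easy to check; the function names are
hypothetical and this is not meant to describe how any particular
implementation does it.

```python
import random
import struct

def uniform_via_one_two(bits52):
    """U[1, 2) - 1: random mantissa under the exponent of 1.0."""
    mantissa = bits52 & ((1 << 52) - 1)
    pattern = (1023 << 52) | mantissa     # sign 0, biased exponent 1023
    (x,) = struct.unpack("<d", struct.pack("<Q", pattern))
    return x - 1.0                        # exact: a multiple of 2^-52

def uniform_via_scaling(bits53):
    """Exactly represented integer scaled by a reciprocal power of two."""
    k = bits53 & ((1 << 53) - 1)          # 0 <= k < 2^53, exact as a double
    return k * 2.0 ** -53                 # exact: a multiple of 2^-53

# Usage with Python's own generator as the (assumed) source of bits.
u1 = uniform_via_one_two(random.getrandbits(52))
u2 = uniform_via_scaling(random.getrandbits(53))
print(u1, u2)
```

Both return k / 2^n for a fixed n, which is the whole point: the
possible outputs are evenly spaced over [0, 1), and no extra values
appear near 0.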