From: Burton Samograd <burton.samograd@gm...>  20120708 00:45:00

After running my simulation code for a few minutes I get the following DIVISIONBYZERO error: debugger invoked on a DIVISIONBYZERO in thread #<THREAD "main thread" RUNNING {1002978F23}>: arithmetic error DIVISIONBYZERO signalled Type HELP for debugger help, or (SBEXT:QUIT) to exit from SBCL. restarts (invokable by number or by possiblyabbreviated name): 0: [RETRY ] Retry EVAL of current toplevel form. 1: [CONTINUE] Ignore error and continue loading file "/home/burton/petridish/pdsmoother.lisp". 2: [ABORT ] Abort loading file "/home/burton/petridish/pdsmoother.lisp". 3: Exit debugger, returning to top level. ("bogus stack frame") 0] down (TURN #S(ANIMAL :NAME 255864 :X 917 :Y 369 :DX 0 :DY 0.5338847560008251d0 :ENERGY 12.935097 :DIR 2.5060025128981067d0 :GENES (16.031828 18.04544 4.023058 6.024463 5.013414 47.029236 159.04361 101.0075 9.032942 20.054955 13.022877 15.029232 ...) :GENECHOICESUM 83.23452 :AGE 159.04361 :BIRTHDAY 1776 :BIRTHPLACE NIL ...)) 1] source ; file: /home/burton/petridish/pdsmoother.lisp (NORMAL TURNAVG TURNDEV) 1] ; this is only in one spot: (mod (+ (animaldir animal) (* dir (normal turnavg turndev))) 2pi) 4.0457375693298125d0 1] ; works fine manually Here is the definition of normal, which has no divisions: (defun normal (&optional (a 1.0) (s 0.5)) (+ a (* (sqrt (* 2 (log (random 1.0)))) (cos (* 2 pi (random 1.0))) s))) Any idea where this division by zero error might be coming from?  Burton Samograd 
From: Kevin Reid <kpreid@sw...>  20120708 01:20:20

On Jul 7, 2012, at 17:44, Burton Samograd wrote: > Here is the definition of normal, which has no divisions: > > (defun normal (&optional (a 1.0) (s 0.5)) > (+ a (* (sqrt (* 2 (log (random 1.0)))) (cos (* 2 pi (random 1.0))) s))) > > Any idea where this division by zero error might be coming from? RANDOM may return 0.0 in which case LOG will (on SBCL) signal DIVISIONBYZERO.  Kevin Reid <http://switchb.org/kpreid/>; 
From: Burton Samograd <burton.samograd@gm...>  20120708 02:23:35

On 120707 07:03 PM, Kevin Reid wrote: > On Jul 7, 2012, at 17:44, Burton Samograd wrote: > >> Here is the definition of normal, which has no divisions: >> >> (defun normal (&optional (a 1.0) (s 0.5)) >> (+ a (* (sqrt (* 2 (log (random 1.0)))) (cos (* 2 pi (random 1.0))) s))) >> >> Any idea where this division by zero error might be coming from? > RANDOM may return 0.0 in which case LOG will (on SBCL) signal DIVISIONBYZERO. > Ah, I had a feeling it was something like that but the division by zero was throwing me off. Thanks. I got that line from Rosetta Code; maybe it should be fixed. What is a good fix for this? I'm thinking that adding a very small value to the result of random might suffice but I don't like it.  Burton Samograd 
From: Jeffrey Cunningham <jeffrey@jk...>  20120708 03:29:19
Attachments:
Message as HTML

On 07/07/2012 07:23 PM, Burton Samograd wrote: > On 120707 07:03 PM, Kevin Reid wrote: >> On Jul 7, 2012, at 17:44, Burton Samograd wrote: >> >>> Here is the definition of normal, which has no divisions: >>> >>> (defun normal (&optional (a 1.0) (s 0.5)) >>> (+ a (* (sqrt (* 2 (log (random 1.0)))) (cos (* 2 pi (random 1.0))) s))) >>> >>> Any idea where this division by zero error might be coming from? >> RANDOM may return 0.0 in which case LOG will (on SBCL) signal DIVISIONBYZERO. >> > Ah, I had a feeling it was something like that but the division by zero > was throwing me off. > > Thanks. I got that line from Rosetta Code; maybe it should be fixed. > What is a good fix for this? I'm thinking that adding a very small value > to the result of random might suffice but I don't like it. > >  > Burton Samograd Adding a small amount would warp the distribution (a tiny amount). But this will work and won't: (defun normal (&optional (a 1.0) (s 0.5)) (+ a (* (sqrt (* 2 (log ( 1.0 (random 1.0))))) (cos (* 2 pi (random 1.0))) s))) According to the spec, random produces numbers always less than the limit so it will never hit zero. And its a uniform distribution so you haven't changed its statistical properties. Jeff Cunningham 
From: Jeffrey Cunningham <jeffrey@jk...>  20120709 00:01:27
Attachments:
Message as HTML

On 07/07/2012 08:02 PM, Jeffrey Cunningham wrote: > > On 07/07/2012 07:23 PM, Burton Samograd wrote: >> On 120707 07:03 PM, Kevin Reid wrote: >>> On Jul 7, 2012, at 17:44, Burton Samograd wrote: >>> >>>> Here is the definition of normal, which has no divisions: >>>> >>>> (defun normal (&optional (a 1.0) (s 0.5)) >>>> (+ a (* (sqrt (* 2 (log (random 1.0)))) (cos (* 2 pi (random 1.0))) s))) >>>> >>>> Any idea where this division by zero error might be coming from? >>> RANDOM may return 0.0 in which case LOG will (on SBCL) signal DIVISIONBYZERO. >>> >> Ah, I had a feeling it was something like that but the division by zero >> was throwing me off. >> >> Thanks. I got that line from Rosetta Code; maybe it should be fixed. >> What is a good fix for this? I'm thinking that adding a very small value >> to the result of random might suffice but I don't like it. >> >>  >> Burton Samograd > > Adding a small amount would warp the distribution (a tiny amount). But > this will work and won't: > > (defun normal (&optional (a 1.0) (s 0.5)) > (+ a (* (sqrt (* 2 (log ( 1.0 (random 1.0))))) (cos (* 2 pi > (random 1.0))) > s))) > > According to the spec, random produces numbers always less than the > limit so it will never hit zero. And its a uniform distribution so you > haven't changed its statistical properties. > > Jeff Cunningham > Have you looked at a distribution of the outputs? I'm wondering if there's something else wrong. You really shouldn't see zero show up as an argument to log except once in a blue moon. It might be very small numbers will kill log also. If that's the case, then you could figure out the threshold for smallest argument to log, and put some logic in there to exclude those cases (regen another uniform variate). Jeff Cunningham 
From: Burton Samograd <burton.samograd@gm...>  20120709 03:39:34

On 120708 06:01 PM, Jeffrey Cunningham wrote: > > On 07/07/2012 08:02 PM, Jeffrey Cunningham wrote: >> >> On 07/07/2012 07:23 PM, Burton Samograd wrote: >>> On 120707 07:03 PM, Kevin Reid wrote: >>>> On Jul 7, 2012, at 17:44, Burton Samograd wrote: >>>> >>>>> Here is the definition of normal, which has no divisions: >>>>> >>>>> (defun normal (&optional (a 1.0) (s 0.5)) >>>>> (+ a (* (sqrt (* 2 (log (random 1.0)))) (cos (* 2 pi (random 1.0))) s))) >>>>> >>>>> Any idea where this division by zero error might be coming from? >>>> RANDOM may return 0.0 in which case LOG will (on SBCL) signal DIVISIONBYZERO. >>>> >>> Ah, I had a feeling it was something like that but the division by zero >>> was throwing me off. >>> >>> Thanks. I got that line from Rosetta Code; maybe it should be fixed. >>> What is a good fix for this? I'm thinking that adding a very small value >>> to the result of random might suffice but I don't like it. >>> >>>  >>> Burton Samograd >> >> Adding a small amount would warp the distribution (a tiny amount). >> But this will work and won't: >> >> (defun normal (&optional (a 1.0) (s 0.5)) >> (+ a (* (sqrt (* 2 (log ( 1.0 (random 1.0))))) (cos (* 2 pi >> (random 1.0))) >> s))) >> >> According to the spec, random produces numbers always less than the >> limit so it will never hit zero. And its a uniform distribution so >> you haven't changed its statistical properties. >> >> Jeff Cunningham >> > Have you looked at a distribution of the outputs? I'm wondering if > there's something else wrong. You really shouldn't see zero show up as > an argument to log except once in a blue moon. That was the problem. I'm running a simulation with hundreds of thousands of agents and that blue moon shows up pretty frequently. I tried your idea of subtracting from 1 on CCL (SBCL is being too finicky right now) and it doesn't seem to follow the standard of [0.0...1.0). I ended up just adding 1E32 to (random 1.0); I really don't think it will make a difference (famous last words). > It might be very small numbers will kill log also. If that's the case, > then you could figure out the threshold for smallest argument to log, > and put some logic in there to exclude those cases (regen another > uniform variate). Maybe but since my tiny addition I haven't had the problem so I'm satisfied with it as a solution for now. > > Jeff Cunningham >  Burton Samograd 
From: Jeffrey Cunningham <jeffrey@jk...>  20120709 04:11:43
Attachments:
Message as HTML

On 07/08/2012 08:39 PM, Burton Samograd wrote: > That was the problem. I'm running a simulation with hundreds of > thousands of agents and that blue moon shows up pretty frequently. I > tried your idea of subtracting from 1 on CCL (SBCL is being too > finicky right now) and it doesn't seem to follow the standard of > [0.0...1.0). I ended up just adding 1E32 to (random 1.0); I really > don't think it will make a difference (famous last words). >> It might be very small numbers will kill log also. If that's the case, >> then you could figure out the threshold for smallest argument to log, >> and put some logic in there to exclude those cases (regen another >> uniform variate). > Maybe but since my tiny addition I haven't had the problem so I'm > satisfied with it as a solution for now. >> Jeff Cunningham >> >  > Burton Samograd > > Careful that U+epsilon doesn't exceed 1.0 or you will mess up the distribution where it counts. It probably would be better to throw away U's that are less than epsilon. All your doing then is cutting off the tails way out there somewhere. Jeff Cunningham 
From: <m_mommer@ya...>  20120709 12:25:12

Jeffrey Cunningham <jeffrey@...> writes: > Have you looked at a distribution of the outputs? I'm wondering if there's > something else wrong. You really shouldn't see zero show up as an argument to > log except once in a blue moon. When using single floats, the probability of obtaining a zero is not so low, at least with the usual way of generating random floats. The following code usually returns in less than a second on my home desktop. (let ((i 0)) (loop while (> (random 1.0s0) 0) do (incf i)) i) This one takes a lot longer: (let ((i 0)) (loop while (> (random 1.0d0) 0) do (incf i)) i) There are other, arguably better, ways to generate random floats, and then zeros are really unlikely to happen. With this patch, http://comments.gmane.org/gmane.lisp.steelbank.devel/16674 a singlefloat zero appears on average every (expt 2 150) calls to (random 1.0s0), which for practical purposes means never. Regards, Mario 
From: Jeffrey Cunningham <jeffrey@jk...>  20120709 14:33:47
Attachments:
Message as HTML

On 07/09/2012 05:24 AM, Mario S. Mommer wrote: > Jeffrey Cunningham <jeffrey@...> writes: >> Have you looked at a distribution of the outputs? I'm wondering if there's >> something else wrong. You really shouldn't see zero show up as an argument to >> log except once in a blue moon. > When using single floats, the probability of obtaining a zero is not so > low, at least with the usual way of generating random floats. > > The following code usually returns in less than a second on my home > desktop. > > (let ((i 0)) (loop while (> (random 1.0s0) 0) do (incf i)) i) > > This one takes a lot longer: > > (let ((i 0)) (loop while (> (random 1.0d0) 0) do (incf i)) i) > > There are other, arguably better, ways to generate random floats, and > then zeros are really unlikely to happen. With this patch, > > http://comments.gmane.org/gmane.lisp.steelbank.devel/16674 > > a singlefloat zero appears on average every (expt 2 150) calls to > (random 1.0s0), which for practical purposes means never. > > Regards, > Mario > That is very flawed behavior in a random number generator you are describing. What is the status of your patch? Regards, Jeff Cunningham 
From: Christophe Rhodes <csr21@ca...>  20120709 14:56:28

Jeffrey Cunningham <jeffrey@...> writes: > That is very flawed behavior in a random number generator you are > describing. What is the status of your patch? That (that the RNG was flawed) was my instinct too, initially, but I think I've been convinced out of it: that instead, lots that relates to floating point is not as obvious as it first seems. The issue at hand is whether the "extra" density of floats around 0 should be used by the RNG. At first, it seems obvious that it should, because well, why not? On the other hand, imagine a simple use of a RNG to generate samples from a distribution using the CDF and a lookup table: generate a float between 0 and 1 and transform according to the inverse of the CDF. Ignoring for the moment the actual generation of zeros, if the RNG exploits the wide range of floats around 0, the lower tail of the distribution will be much, much more explored than the upper tail, because the floating point resolution around 0 is far greater than it is around 1. There are other issues, but this one was enough to convince me that I didn't want to place too much emphasis on my own instinct, and that I'd like to read "what every RNG maintainer should know about actual uses of floating point random variates". Cheers, Christophe 
From: Waldek Hebisch <hebisch@ma...>  20120709 20:39:30

Christophe Rhodes wrote: > The issue at hand is whether the "extra" density of floats around 0 > should be used by the RNG. At first, it seems obvious that it should, > because well, why not? One paradigm for floating point operations is that operation is first performed exactly and after that the result is rounded to a representable float. Sometimes doing so may cost a lot of compute time and a lot of coding effort (when we require "correct rounding"), so IMHO there are reasons to use approximations with higher error. But for RNG in single precision correct rounding seem to be obtainable with reasonable effort. Given that values close to zero have extremally small probablity one can use approximations (say rounding 64bit random integer scaled to the unit interval) that are close enough but cheaper to compute. So using extra precision around zero is fits well with philosopy of floating point. > On the other hand, imagine a simple use of a RNG > to generate samples from a distribution using the CDF and a lookup > table: generate a float between 0 and 1 and transform according to the > inverse of the CDF. Ignoring for the moment the actual generation of > zeros, if the RNG exploits the wide range of floats around 0, the lower > tail of the distribution will be much, much more explored than the upper > tail, because the floating point resolution around 0 is far greater than > it is around 1. I am not sure what you mean by "more explored". If you mean that floats close to zero have higher absolute precision, then it is what usualy is expected: taking log of uniform distribution one wants to get exponential distribution. When RNG uses uniform absolute precision in the interval (0, 1) the result is distorded tail of exponential distibution. When RNG makes good use of extra absolute precision available close to 0 then tail of transformed distribution is much closer to the true exponential distibution. Of course, the user may do something stupid, like using log(1  x) with x unifor in (0,1) to generate exponential distibution. Then extra effort spent close to 0 is wasted. Given the above I think that RNG which makes use of extra precision around 0 is better than one which does not. OTOH there are many uses of random numbers, some are content with low quality random source but are highly sensitive to speed, while other absolutely need high quality. It is likely that user will want different speed/quality tradeoff then the one provided by the default implementation. So I am ready to use my own or third party code if needed.  Waldek Hebisch hebisch@... 
From: Jeffrey Cunningham <jeffrey@jk...>  20120709 23:06:11
Attachments:
Message as HTML

On 07/09/2012 01:07 PM, Waldek Hebisch wrote: > Of course, the user may do something stupid, like ... Real civil. Your a real asset to the help community. JC 
From: Paul Khuong <pvk@pv...>  20120710 16:34:21

In article <E1SoKEb000696PN@...>, Waldek Hebisch <hebisch@...> wrote: > Christophe Rhodes wrote: > > The issue at hand is whether the "extra" density of floats around 0 > > should be used by the RNG. At first, it seems obvious that it should, > > because well, why not? > > One paradigm for floating point operations is that operation is > first performed exactly and after that the result is rounded > to a representable float. According to this view, the probability of (random 1.0) returning 1.0 should be positive. In fact, it should be equal to the probability of the random float being < 2^{25}. I detect a certain tension with the spec here. > > On the other hand, imagine a simple use of a RNG > > to generate samples from a distribution using the CDF and a lookup > > table: generate a float between 0 and 1 and transform according to the > > inverse of the CDF. Ignoring for the moment the actual generation of > > zeros, if the RNG exploits the wide range of floats around 0, the lower > > tail of the distribution will be much, much more explored than the upper > > tail, because the floating point resolution around 0 is far greater than > > it is around 1. > > I am not sure what you mean by "more explored". Many more distinct values in the left tail than in the right one. > When RNG makes good use of extra > absolute precision available close to 0 then tail of transformed > distribution is much closer to the true exponential distibution. Ah, but what happens if I need my extra precision at the other end? Or what if I'm working with a symmetric PDF? Or, what if my uniform's lower bound isn't 0? > Of course, the user may do something stupid, like using log(1  x) > with x unifor in (0,1) to generate exponential distibution. Then > extra effort spent close to 0 is wasted. http://en.wikipedia.org/wiki/Antithetic_variates. It's not stupid, but *useful*; sophisticated, one might even say. > Given the above I think that RNG which makes use of extra precision > around 0 is better than one which does not. I like the current behaviour for two reasons: it's simple to explain and to reason about (we generate random fractions with a fixed denominator, and express them as floats), and it's the most common way to do it. Simplicity is important to me because, as I noted above, getting small values around 0 right isn't enough: the exact same problems, modulo trivial differences, still crop up. The gain in correctness of intuitive code are extremely partial, and contingent on avoiding intuitively equivalent reformulations. This plays in the second reason: AFAICT, it's by far the most common way to generate uniforms (either the U[1, 2)1 trick, or by taking exactlyrepresented integers *and scaling them by a reciprocal*). Any issue with this way of doing things is common and language independent, and workarounds are likely to be known. Intuitively, any solution would still be applicable when generating tiny values; realistically, I know better than to trust my intuition here. I could certainly believe that there are strange sideeffects on statistics when using these variates in stochastic simulations. Either way, someone who cares about that ought to be basing their experiments on welltested methods, and these methods will most likely have been tested with a fixedprecision uniform generator. Oh, extra data point: none of the test suite that I know of (diehard, dieharder, TestU01) attempts to detect that behaviour. Still, I should be back in Montreal very soon, and I'll try to ask some stochastic simulation people. Paul Khuong 
From: <m_mommer@ya...>  20120710 21:39:15

Hello, first things first: I do not think that this is terribly important. Most of the time, it makes no difference, but sometimes it does, and then it should be an improvement. But as others have already observed, this is likely minor most of the time. My original motivation for submitting a patch that generates random floats in this way was that I was playing with the technique, and knew people here care about those things. I thought that this was an opportunity to make SBCL stand out in yet another detail. I am not particularly bothered personally if this patch never makes it into SBCL. I am however sorry for the trouble that this patch has caused. That said, I want to point out that none of the arguments against it that I have seen so far makes sense to me from a purely technical point of view, and that I think it would not be good for anyone if such arguments remain unchallenged. So: * Anything that uses random floats and actually depends on the lower bits of small numbers not being random is most likely broken anyway. * I cannot currently see any technical merit neither in the simplicity nor in the symmetry arguments brought up so far. If anyone has an actual example where preserving these things matters from a technical point of view, I'd be very interested in seeing it. It is true that if you have a PDF that is symmetric around 0.5 and has support (0,1), then yes, with this patch and an inversion scheme you will get more distinct values near zero than near one. I fail to see why this might be a bug instead of a feature. The antithetic variables technique will continue working, btw. * Also, the additional computational cost in generating floats uniformly in [0,1) that have as many random bits as possible is less than 10% in terms of CPU time. The code is not large either. * This, however, ( (log (random 1.0d0))) produces better exponentially distributed random floats with the patch than without. Many more distinct numbers will be generated, sampling the distribution better. Writing ( (log ( 1.0d0 (random 1.0d0)))) gets the old behavior back. That's it. I will probably make a little library out of this patch, so that anyone interested can use it. Paul Khuong <pvk@...> writes: > Oh, extra data point: none of the test suite that I know of (diehard, > dieharder, TestU01) attempts to detect that behaviour. That is indeed an interesting data point. But again, that is not really an argument on technical merits. > Still, I should be back in Montreal very soon, and I'll try to ask > some stochastic simulation people. I'd be very interested in knowing what they tell you. Regards, and thanks, Mario 
From: Paul Khuong <pvk@pv...>  20120711 09:01:42

In article <87r4sj9tzi.fsf@...>, m_mommer@... (Mario S. Mommer) wrote: > My original motivation for submitting a patch that generates random > floats in this way was that I was playing with the technique, and knew > people here care about those things. I thought that this was an > opportunity to make SBCL stand out in yet another detail. [...] > That said, I want to point out that none of the arguments against it > that I have seen so far makes sense to me from a purely technical point > of view, and that I think it would not be good for anyone if such > arguments remain unchallenged. No, my views aren't backed by technical considerations as much as engineering ones: I do not think that standing out in a field that's not in our "core business" is a good thing. Going with a classic, popular implementation allows SBCL users to exploit all the resources already available on how to use such PRNGs. I suppose (hope) that Mathworks has the resources and expertise to support users with their improved PRNG. I don't think we do now, and certainly wouldn't expect us to consistently the wherewithal to do so in the future. I can only think of the I/O subsystem, which has what looks like very clever stuff in serveevent. We've mostly been hacking that out of the runtime the past couple years; the implementation is now much closer to the wellknown posix layer. > * Anything that uses random floats and actually depends on the lower > bits of small numbers not being random is most likely broken anyway. One might argue the same for anything that depends on all small FP values being generated. > * This, however, > > ( (log (random 1.0d0))) > > produces better exponentially distributed random floats with the patch > than without. I'd think it's even more correct to loop *in the exponential variate generator* than to depend on log of tiny values. Paul Khuong 
From: Lars Brinkhoff <lars@no...>  20120712 05:52:15

Fun fact: there are 127 times more IEEE singlefloats in [0,1) than in [1,2). 
From: Jeffrey Cunningham <jeffrey@jk...>  20120712 06:28:36
Attachments:
Message as HTML

On 07/11/2012 10:51 PM, Lars Brinkhoff wrote: > Fun fact: there are 127 times more IEEE singlefloats in [0,1) than in [1,2). > 127? Not 128? How very curious. 
From: Lars Brinkhoff <lars@no...>  20120712 08:34:55

Jeffrey wrote: > Lars wrote: > > Fun fact: there are 127 times more IEEE singlefloats in [0,1) > > than in [1,2). > > 127? Not 128? How very curious. I did a bruteforce exhaustive search of the entire space of 2^32 floats and found 2^30 in [0,1) and 2^23 in [1,2). But of course you can arrive at the same result by appling some reasoning. Consider the exponent in the binary format. All the floats in [1,2) have the same exponent: 127. In the [0,1) range, the exponent can take on the 127 distinct values 0126. 
From: <m_mommer@ya...>  20120712 16:16:54

Paul Khuong <pvk@...> writes: > I suppose (hope) that Mathworks has the resources and expertise to > support users with their improved PRNG. For all I can tell, the resources that go into this are zero. Nobody has a problem with it. And I'm around. I could in principle answer to questions from SBCL users should they arise. Also, the code snippet we are talking about is rather small, and hardly an unmaintainable pile of code. >> * Anything that uses random floats and actually depends on the lower >> bits of small numbers not being random is most likely broken anyway. > > One might argue the same for anything that depends on all small FP > values being generated. When you put it in *this* particular way, perhaps, because nobody has the time to wait that long. I assume you wanted to say something else. You probably wanted to say that "something that requires high precision random floats is probably broken anyway". I find this a limiting way of thinking. >> * This, however, >> >> ( (log (random 1.0d0))) >> >> produces better exponentially distributed random floats with the patch >> than without. > > I'd think it's even more correct to loop *in the exponential variate > generator* For an exponential perhaps, although I expect the improvement to be very minimal over the much simpler ( (log (random 1.0d0))) with the finer generation mechanism. > than to depend on log of tiny values. On a modern platform, there is nothing wrong with the log of a small fp number. Try e.g. (let ((c (* pi 1.0d300))) (/ ( c (exp (log c))) c)) Or, for single floats (let ((c (float (* pi 1.0d35) 1.0s0))) (/ ( c (exp (log c))) c)) 
From: Lars Brinkhoff <lars@no...>  20120712 16:28:54

m_mommer@... (Mario S. Mommer) writes: > Or, for single [sic] floats > > (let ((c (float (* pi 1.0d35) 1.0s0))) > (/ ( c (exp (log c))) c)) Ok then, just to make a point: Welcome to GNU CLISP 2.48 (20090728) <http://clisp.cons.org/>; [...] [1]> (let ((c (float (* pi 1.0d35) 1.0s0))) (/ ( c (exp (log c))) c)) 4.4433s4 
From: Jeffrey Cunningham <jeffrey@jk...>  20120709 16:45:40
Attachments:
Message as HTML

On 07/09/2012 07:56 AM, Christophe Rhodes wrote: > Jeffrey Cunningham <jeffrey@...> writes: > >> That is very flawed behavior in a random number generator you are >> describing. What is the status of your patch? > That (that the RNG was flawed) was my instinct too, initially, but I > think I've been convinced out of it: that instead, lots that relates to > floating point is not as obvious as it first seems. > > The issue at hand is whether the "extra" density of floats around 0 > should be used by the RNG. At first, it seems obvious that it should, > because well, why not? On the other hand, imagine a simple use of a RNG > to generate samples from a distribution using the CDF and a lookup > table: generate a float between 0 and 1 and transform according to the > inverse of the CDF. Ignoring for the moment the actual generation of > zeros, if the RNG exploits the wide range of floats around 0, the lower > tail of the distribution will be much, much more explored than the upper > tail, because the floating point resolution around 0 is far greater than > it is around 1. Isn't an "extra density of floats around 0"  by definition  a nonuniform distribution? I would think this is highly undesirable behavior in a uniform RNG and should be corrected. Regards, Jeff Cunningham 
From: Christophe Rhodes <csr21@ca...>  20120709 17:21:26

Jeffrey Cunningham <jeffrey@...> writes: > Isn't an "extra density of floats around 0"  by definition  a > nonuniform distribution? Not in the sense that I mean. Or possibly "yes", but welcome to the world of floating point: because floating point numbers are a discrete set, not a continuum, there's no such thing as a continuous uniform distribution using floats, only particular approximations. Briefly, a floating point number is represented by sign, mantissa and exponent. Sign is always +1 or 1. What remains is the mantissa, which is a representation of the bits after the decimal point in the binary number 1.xxxxxxxxxxxxxx..., and the exponent, which indicates the power to which 2 must be raised to get you the number you want. (I am eliding lots of details here; there are references where you can read them). So for example, 0.5 is represented as the sign/mantissa/exponent triple (1, 0, 1); 0.75 is (1, 10000000..._2, 1), 0.875 is (1, 11000000..._2, 1), and so on. The point here is that the density of representable floating points /changes/ as the exponent changes: there are as many machine floats between 0.25 and 0.5 as there are between 0.5 and 1, and this isn't some kind of transfinite sleight of hand, this is a finite, countable set. So, near zero, there are more possible floats than there are near 1. Obviously, the RNG compensates for that, by having the possible floats near zero be generated with lower probability than those near one. But the point is that there is more than one way of performing that compensation: the appropriate probability mass could be distributed evenly over all possible floats within an evenlyspaced region, or a single representative in a region could be selected as an archetype  and if so, which representative? SBCL's strategy for generating numbers between 0 and 1 isn't so utterly stupid as you seem to think; it makes one particular choice, by selecting floats between 1 and 2 (which does have a uniform density of representable floats), and then subtracting 1. This has the effect of emphasizing the probability of generating 0 compared with the floating point numbers which are in fact representable in the region of 0, but that effect has potential virtues too (such as preserving the basic symmetry of the region near 0 and near 1 in the conceptual uniform distribution). > I would think this is highly undesirable behavior in a uniform RNG and > should be corrected. Tell you what: please specify unambiguously, paying reference to the hardware representations, the behaviour of the RNG when given a singlefloat 1.0 argument, and justify why the specified behaviour is better than all other behaviours in all circumstances. Cheers, Christophe 
From: <m_mommer@ya...>  20120709 21:35:38

Christophe Rhodes <csr21@...> writes: > The issue at hand is whether the "extra" density of floats around 0 > should be used by the RNG. At first, it seems obvious that it should, > because well, why not? On the other hand, imagine a simple use of a RNG > to generate samples from a distribution using the CDF and a lookup > table: generate a float between 0 and 1 and transform according to the > inverse of the CDF. Ignoring for the moment the actual generation of > zeros, if the RNG exploits the wide range of floats around 0, the lower > tail of the distribution will be much, much more explored than the upper > tail, because the floating point resolution around 0 is far greater than > it is around 1. It explores everywhere at the highest available resolution. It doesn't explore "much, much more"  this must be a misunderstanding. With the patch, the distribution of the samples generated with an inversion method will resemble more faithfully the intended distribution. Why is this a disadvantage? > There are other issues, but this one was enough to convince me that I > didn't want to place too much emphasis on my own instinct, and that I'd > like to read "what every RNG maintainer should know about actual uses of > floating point random variates". Could you elaborate? I'm interested in hearing about those other issues. In particular, do you know of any code that would actually break with this patch? Regards, and thanks, Mario. 
From: Christophe Rhodes <csr21@ca...>  20120710 05:48:51

m_mommer@... (Mario S. Mommer) writes: >> There are other issues, but this one was enough to convince me that I >> didn't want to place too much emphasis on my own instinct, and that I'd >> like to read "what every RNG maintainer should know about actual uses of >> floating point random variates". > > Could you elaborate? I'm interested in hearing about those other issues. > > In particular, do you know of any code that would actually break with > this patch? Sorry, too strong: I blame wading through exam papers. I meant "There may be other issues". I don't know of code that would break under an RNG that exploits the full float resolution around zero; what I meant was that I was more uncertain about "correct" RNG behaviour. A brief survey of other language runtimes didn't suggest that the current RNG was wildly out of line; there's a variety of float range RNGs out there, and only relatively few use the higher float resolution near zero. Cheers, Christophe 
From: Lars Brinkhoff <lars@no...>  20120712 08:47:31

Christophe Rhodes <csr21@...> writes: > SBCL's strategy for generating numbers between 0 and 1 isn't so > utterly stupid as you seem to think; it makes one particular choice, > by selecting floats between 1 and 2 (which does have a uniform > density of representable floats), and then subtracting 1. I didn't check the implementation, so the following may be off the mark. There are only 2^23 floats between 1 and 2. One could argue that a random number generator should be able to generate 2^23 floats between 0.5 and 1, and (at least) 2^23 floats between 0 and 0.5. That is, one more bit of precision. 