From: Charles R H. <cha...@gm...> - 2006-06-04 18:36:26
|
Hi All, but mostly Robert.

I've been fooling around timing random number generators and noticed that on an Athlon64 with 64-bit binaries the MWC8222 rng is about 2.5x as fast as the MT19937 generator. On my machine (1.8 GHz) I get

MWC8222:
    long        2.58e+08
    float       1.20e+08
    double      1.34e+08
    full double 1.02e+08

MT19937:
    long        9.07e+07
    float       6.33e+07
    double      6.71e+07
    full double 3.81e+07

numbers/sec, where the time includes accumulating the sums. This also impacts the generation of normally distributed numbers:

MWC8222:
    nums/sec:  1.12e+08
    average :  1.91e-05
    sigma   :  1.00e+00

MT19937:
    nums/sec:  5.41e+07
    average : -9.73e-05
    sigma   :  1.00e+00

The times for 32-bit binaries are roughly the same. For generating large arrays of random numbers on 64-bit architectures it looks like MWC8222 is a winner. So, the question is, is there a good way to make the rng selectable?

Chuck |
From: Stephan T. <st...@si...> - 2006-06-04 20:21:25
|
> MWC8222:
>   nums/sec: 1.12e+08
>
> MT19937:
>   nums/sec: 5.41e+07
>
> The times for 32 bit binaries is roughly the same. For generating large
> arrays of random numbers on 64 bit architectures it looks like MWC8222
> is a winner. So, the question is, is there a good way to make the rng
> selectable?

Although there are in general good reasons for having more than one random number generator available (and testing one's code with more than one generator), performance shouldn't be the deciding concern when selecting one. The most important characteristics of a random number generator are its distributional properties, e.g. how "uniform" and "random" its generated numbers are. There's hardly any generator that is faster than the Mersenne Twister _and_ has better equidistribution.

Actually, the MT is so fast that on modern processors the contribution of the uniform number generator to most non-trivial simulation code is negligible. See www.iro.umontreal.ca/~lecuyer/ for good (mathematical) surveys on this topic.

If you really need that last inch of performance, you should seriously think about outsourcing your inner simulation loop to C(++). And by the way, there's a good chance that making the rng selectable has a negative performance impact on random number generation (at least if the generation is done through the same interface and the current implementation is sufficiently optimized).

Regards,
Stephan |
From: Charles R H. <cha...@gm...> - 2006-06-04 20:41:17
|
Stephan,

MWC8222 has good distribution properties; it comes from George Marsaglia and passes all the tests in the Diehard suite. It was also used, among others, by Jurgen Doornik in his investigation of the ziggurat method for random normals, and he didn't turn up any anomalies. Now, I rather like the theory behind MT19937, based as it is on an irreducible polynomial over Z_2 discovered by brute-force search, but it is not the be-all and end-all of rng's.

And yes, I do like to generate hundreds of millions of random numbers/sec, and yes, I do do it in C++ and use boost/python as an interface, but that doesn't mean numpy can't use a speed-up now and then. In particular, the ziggurat method for generating normals is also significantly faster than the polar method in numpy. Put them together and on X86_64 I think you will get close to a factor of ten improvement in speed. That isn't to be sniffed at, especially if you are simulating noisy images and such.

On 6/4/06, Stephan Tolksdorf <st...@si...> wrote:
> > MWC8222:
> >   nums/sec: 1.12e+08
> >
> > MT19937:
> >   nums/sec: 5.41e+07
> >
> > The times for 32 bit binaries is roughly the same. For generating large
> > arrays of random numbers on 64 bit architectures it looks like MWC8222
> > is a winner. So, the question is, is there a good way to make the rng
> > selectable?
>
> Although there are in general good reasons for having more than one
> random number generator available (and testing one's code with more than
> one generator), performance shouldn't be the deciding concern for
> selecting one. The most important characteristic of a random number
> generator are its distributional properties, e.g. how "uniform" and
> "random" its generated numbers are. There's hardly any generator which
> is faster than the Mersenne Twister _and_ has a better
> equi-distribution. Actually, the MT is so fast that on modern processors
> the contribution of the uniform number generator to most non-trivial
> simulation code is negligible. See www.iro.umontreal.ca/~lecuyer/ for
> good (mathematical) surveys on this topic.
>
> If you really need that last inch of performance, you should seriously
> think about outsourcing your inner simulation loop to C(++). And by the
> way, there's a good chance that making the rng selectable has a negative
> performance impact on random number generation (at least if the
> generation is done through the same interface and the current
> implementation is sufficiently optimized).
>
> Regards,
> Stephan
|
From: Robert K. <rob...@gm...> - 2006-06-04 22:04:49
|
Charles R Harris wrote:
> For generating large
> arrays of random numbers on 64 bit architectures it looks like MWC8222
> is a winner. So, the question is, is there a good way to make the rng
> selectable?

Sure! All of the distributions ultimately depend on the uniform generators (rk_random, rk_double, etc.). It would be possible to alter the rk_state struct to store data for multiple generators (probably through a union) and store function pointers to the uniform generators. The public API rk_random, rk_double, etc. would be modified to call the function pointers to the private API functions depending on the actual generator chosen.

At the Pyrex level, some modifications would need to be made to the RandomState constructor (or we would need to make alternate constructors) and the seeding methods. Nothing too bad.

I don't think it would be worthwhile to change the numpy.random.* functions that alias the methods on the default RandomState object. Code that needs customizable PRNGs should be taking a RandomState object instead of relying on the function-alike aliases.

--
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless enigma
that is made terrible by our own mad attempt to interpret it as though it had
an underlying truth."
  -- Umberto Eco |
From: Robert K. <rob...@gm...> - 2006-06-04 22:10:11
|
Robert Kern wrote:
> Charles R Harris wrote:
>
>> For generating large
>> arrays of random numbers on 64 bit architectures it looks like MWC8222
>> is a winner. So, the question is, is there a good way to make the rng
>> selectable?
>
> Sure!

I should also add that I have no time to do any of this, but I'll be happy to answer questions and make suggestions if you would like to tackle this.

--
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless enigma
that is made terrible by our own mad attempt to interpret it as though it had
an underlying truth."
  -- Umberto Eco |
From: Charles R H. <cha...@gm...> - 2006-06-04 22:38:00
|
On 6/4/06, Robert Kern <rob...@gm...> wrote:
> Charles R Harris wrote:
> > For generating large
> > arrays of random numbers on 64 bit architectures it looks like MWC8222
> > is a winner. So, the question is, is there a good way to make the rng
> > selectable?
>
> Sure! All of the distributions ultimately depend on the uniform generators
> (rk_random, rk_double, etc.). It would be possible to alter the rk_state
> struct to store data for multiple generators (probably through a union)
> and store function pointers to the uniform generators. The public API
> rk_random, rk_double, etc. would be modified to call the function pointers
> to the private API functions depending on the actual generator chosen.
>
> At the Pyrex level, some modifications would need to be made to the
> RandomState constructor (or we would need to make alternate constructors)
> and the seeding methods.

Heh, I borrowed some seeding methods from numpy, but put them in their own file with interfaces

    void fillFromPool(uint32_t *state, size_t size);
    void fillFromSeed(uint32_t *state, size_t size, uint32_t seed);
    void fillFromVect(uint32_t *state, size_t size,
                      const std::vector<uint32_t> &seed);

so that I could use them more generally. I left out the method using the system time because, well, everything I am interested in runs on linux or windows. Boost has a good include file, boost/cstdint.hpp, that deals with all the issues of defining integer types on different platforms. I didn't use it, though, just the stdint.h file ;)

> Nothing too bad. I don't think it would be worthwhile to change the
> numpy.random.* functions that alias the methods on the default RandomState
> object. Code that needs customizable PRNGs should be taking a RandomState
> object instead of relying on the function-alike aliases.

I'll take a look, though like you I am pretty busy these days.

Chuck |