## [Mingw-users] RAND_MAX still 16bit ?!

 [Mingw-users] RAND_MAX still 16bit ?! From: Nghia Ho - 2011-04-23 14:43:09
```
Hi all,

I came across an old post from 2006 about RAND_MAX being only 16 bits. I did a quick printf("%d\n", RAND_MAX) and got 32767. I'm using the MinGW that came with the latest Code::Blocks for Windows as of this writing, gcc version 4.4.1 TDM-2. What's the story there? It's giving me a lot of problems when using random_shuffle() on large data, because it doesn't shuffle properly. I thought I was going nuts! I wrote a test program to verify the problem:

#include <cstdio>
#include <vector>
#include <algorithm>

using namespace std;

void PrintHistogram(vector<int> &data)
{
    int bins = 10;
    vector<int> histogram(bins);
    float percentage = 0.1;

    // Sample a percentage and accumulate histogram
    for(int i=0; i < data.size()*percentage; i++) {
        int pos = bins*data[i]/data.size();
        histogram[pos]++;
    }

    for(unsigned int i=0; i < histogram.size(); i++) {
        printf("%u - %d\n", i, histogram[i]);
    }
}

int main(int argc, char **argv)
{
    vector<int> data(100000);

    for(unsigned int i=0; i < data.size(); i++) {
        data[i] = i;
    }

    random_shuffle(data.begin(), data.end());

    printf("\nRandom shuffle\n");
    PrintHistogram(data);

    return 0;
}

The program generates 100,000 data points and assigns them the values 0 to 99,999. It then sub-samples the first 10% of the shuffled data, which we expect to have a uniform distribution if we plot a histogram with 10 bins.

On Windows 7 I get:

Random shuffle
0 - 360
1 - 348
2 - 459
3 - 462
4 - 598
5 - 740
6 - 1045
7 - 1417
8 - 1974
9 - 2598

Not a uniform distribution at all! But on Linux I get:

Random shuffle
0 - 1033
1 - 1001
2 - 1009
3 - 1005
4 - 986
5 - 1030
6 - 942
7 - 965
8 - 1041
9 - 988

That's what we expect. So the question is, how do I get more than 16 bits from rand()? This seems like a serious flaw.

Nghia
```
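The skew above is what a Fisher-Yates shuffle produces when its random source tops out at 32767. The sketch below assumes that the swap index is drawn as `rand() % (i + 1)`, which is how libstdc++'s two-argument `random_shuffle` is written, and it substitutes a `std::mt19937` masked to 15 bits for MSVC's `rand()` so the effect can be reproduced on any platform. The helper names are hypothetical. Once `i` exceeds 32767 the index can only land in the first 32768 slots, so late, high-valued elements keep being pushed into the front of the array, which is exactly the part the test program samples.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <utility>
#include <vector>

// Hypothetical helper: shuffle 0..n-1 the way libstdc++'s two-argument
// random_shuffle does (swap position i with j = draw % (i + 1)), but with
// each draw limited to `mask`. mask = 0x7fff mimics a rand() whose
// RAND_MAX is 32767; mask = 0x7fffffff mimics glibc's 2^31 - 1.
std::vector<int> shuffle_with_mask(std::size_t n, unsigned long mask, unsigned seed) {
    std::vector<int> data(n);
    for (std::size_t i = 0; i < n; ++i) data[i] = static_cast<int>(i);

    std::mt19937 gen(seed);  // stand-in bit source, not MSVC's actual rand()
    for (std::size_t i = 1; i < n; ++i) {
        unsigned long r = gen() & mask;  // a "rand()" that never exceeds mask
        std::size_t j = r % (i + 1);     // so j <= mask, however large i gets
        std::swap(data[i], data[j]);
    }
    return data;
}

// Histogram of the first `fraction` of the data, as in the original test
// program: value v lands in bin bins * v / n.
std::vector<int> head_histogram(const std::vector<int>& data, int bins, double fraction) {
    assert(bins > 0 && fraction > 0.0 && fraction <= 1.0);
    std::vector<int> hist(bins, 0);
    std::size_t count = static_cast<std::size_t>(data.size() * fraction);
    for (std::size_t i = 0; i < count; ++i) {
        std::size_t pos = static_cast<std::size_t>(bins) * data[i] / data.size();
        ++hist[pos];
    }
    return hist;
}
```

With `mask = 0x7fff` the last bin comes out several times larger than the first, in line with the Windows numbers; with `mask = 0x7fffffff` the bins stay close to 1000 each, in line with the Linux numbers.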

 Re: [Mingw-users] RAND_MAX still 16bit ?! From: Greg Chicares - 2011-04-23 16:58:01 ```On 2011-04-23 14:43Z, Nghia Ho wrote: > > I came across an old post in 2006 about RAND_MAX being 16bit only. I did a > quick printf("%d\n", RAND_MAX) and I get 32767. Functions like rand() are provided by the msvc runtime library. Its implementation of rand() is probably frozen in the 1980s for backward compatibility. For serious work, don't use rand() anyway. > I'm using the mingw that came > with the latest CodeBlock for Windows as of writing, gcc version 4.4.1 TDM-2. > What's the story there? An official gcc-4.5.2 release is available from mingw.org . > It's giving me a lot of problems when using > random_shuffle() for large data because it doesn't shuffle properly. You're using C++, so why not use one of the excellent random number generators from boost.org? ```
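Greg's point about the MSVC runtime can be made concrete. Its `rand()` is widely described as a 32-bit linear congruential generator that returns only bits 16..30 of its state; the constants below come from that public description and are an assumption, not something stated in this thread or taken from the CRT source. The class name is hypothetical. The sketch also hints at why a wider `rand()` is not just a matter of returning more bits: in an LCG modulo 2^32 with an odd multiplier and an odd increment, the lowest state bit simply alternates 0, 1, 0, 1.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of the linear congruential generator commonly
// attributed to the MSVC runtime's rand(). The constants are an assumption
// based on public descriptions, not taken from the CRT source.
class MsvcStyleRand {
public:
    explicit MsvcStyleRand(std::uint32_t seed = 1) : state_(seed) {}

    // rand()-style draw: advance the 32-bit state (wrapping mod 2^32),
    // then return bits 16..30, i.e. a value in 0..32767.
    int next() {
        state_ = state_ * 214013u + 2531011u;
        int out = static_cast<int>((state_ >> 16) & 0x7fffu);
        assert(out <= 32767);
        return out;
    }

    // Lowest bit of the internal state. With an odd multiplier and an odd
    // increment it just alternates, one reason the low bits are discarded.
    unsigned low_bit() const { return state_ & 1u; }

private:
    std::uint32_t state_;
};
```

Seeded with 1 (the C standard's default when `srand()` has not been called), this model yields 41 and then 18467, which matches the first values commonly reported for MSVC's `rand()`.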
 Re: [Mingw-users] RAND_MAX still 16bit ?! From: Peter Rockett - 2011-04-23 17:13:29 ```On 23/04/11 17:57, Greg Chicares wrote: > On 2011-04-23 14:43Z, Nghia Ho wrote: >> I came across an old post in 2006 about RAND_MAX being 16bit only. I did a >> quick printf("%d\n", RAND_MAX) and I get 32767. > Functions like rand() are provided by the msvc runtime library. Its > implementation of rand() is probably frozen in the 1980s for backward > compatibility. For serious work, don't use rand() anyway. > >> I'm using the mingw that came >> with the latest CodeBlock for Windows as of writing, gcc version 4.4.1 TDM-2. >> What's the story there? > An official gcc-4.5.2 release is available from mingw.org . > >> It's giving me a lot of problems when using >> random_shuffle() for large data because it doesn't shuffle properly. > You're using C++, so why not use one of the excellent random number > generators from boost.org? > Or from GSL - easier to build than boost IMO + works with plain C. P. ```
 Re: [Mingw-users] RAND_MAX still 16bit ?! From: LRN - 2011-04-23 17:25:58 ```On 23.04.2011 21:13, Peter Rockett wrote: > On 23/04/11 17:57, Greg Chicares wrote: >> On 2011-04-23 14:43Z, Nghia Ho wrote: >>> It's giving me a lot of problems when using >>> random_shuffle() for large data because it doesn't shuffle properly. >> You're using C++, so why not use one of the excellent random number >> generators from boost.org? >> > Or from GSL - easier to build than boost IMO + works with plain C. > > P. > Or import RtlRandom() (2000 or later) or RtlRandomEx() (XP or later) from ntdll and use these. CeGenRandom() on CE (not sure about this one). ```
 Re: [Mingw-users] RAND_MAX still 16bit ?! From: Matt P. Dziubinski - 2011-04-26 21:54:34 ```On 4/23/2011 7:13 PM, Peter Rockett wrote: >> You're using C++, so why not use one of the excellent random number >> generators from boost.org? >> > Or from GSL - easier to build than boost IMO + works with plain C. There's nothing to build for Boost.Random, those are header-only libraries, simply #include what you need: http://www.boost.org/doc/libs/release/doc/html/boost_random.html While, AFAIR, you do have to build GSL. Best, Matt ```
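Boost.Random's engines were also the basis for TR1's random facilities and, from there, for C++11's `<random>`. A minimal sketch of the standard-library route, assuming a C++11 compiler, might look like the following; the function name `shuffled_iota` is hypothetical. `std::shuffle` takes the engine itself rather than a function returning `rand()`-sized integers, so its swap indices are not limited by RAND_MAX.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <numeric>
#include <random>
#include <vector>

// Hypothetical helper: return 0..n-1 shuffled with a Mersenne Twister from
// <random>. std::shuffle draws its indices from the engine directly, so the
// range of the swap positions is not tied to RAND_MAX.
std::vector<int> shuffled_iota(std::size_t n, std::uint32_t seed) {
    assert(n <= static_cast<std::size_t>(std::numeric_limits<int>::max()));
    std::vector<int> data(n);
    std::iota(data.begin(), data.end(), 0);  // 0, 1, ..., n-1
    std::mt19937 engine(seed);               // fixed seed: reproducible runs
    std::shuffle(data.begin(), data.end(), engine);
    return data;
}
```

Running the original histogram over the first 10% of `shuffled_iota(100000, seed)` should give roughly 1000 per bin, as on Linux.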
 Re: [Mingw-users] RAND_MAX still 16bit ?! From: K. Frank - 2011-04-23 17:42:46 ```Hello Nghia! On Sat, Apr 23, 2011 at 10:43 AM, Nghia Ho wrote: > Hi all, > > I came across an old post in 2006 about RAND_MAX being 16bit only. I did a > quick printf("%d\n", RAND_MAX) and I get 32767. I'm using the mingw that > came with the latest CodeBlock for Windows as of writing, gcc version 4.4.1 > TDM-2. I see the same value, 32767 (= 2^15 - 1), for RAND_MAX on a later (4.5.2) mingw version of g++. (On a linux version of g++ 4.4.5 I get RAND_MAX of 2^31 - 1.) A couple of comments: First, for the library maintainers to provide an updated version of rand that increases RAND_MAX from 2^15 - 1 to, say, 2^31 - 1, they can't, for example, just change various 16-bit integers that appear in the algorithm to 32-bit integers -- they would have to develop a whole new algorithm. Second, most implementations of rand() are not great anyway (some in the past have been notoriously bad -- I'm not sure how the windows version ranks), so for work where the quality of results matters, you don't want to use rand(). > What's the story there? It's giving me a lot of problems when using > random_shuffle() for large data because it doesn't shuffle properly. I > thought I was going nuts! I wrote a test program to verify the problem: Well, you've found your problem: RAND_MAX is only 2^15 - 1. (And you get the expected result on linux, because there RAND_MAX is larger.) > ... > > The program generates 100,000 data points and assigns values from 0 to > 99,999. It then sub-samples 10% of this data, which we expect to have a > uniform distribution if we plot a histogram of 10 bins. > > On Windows 7 I get: > > Random shuffle > 0 - 360 Ouch. RAND_MAX is too small on windows. > ... > Not a uniform distribution at all ! But on Linux I get: > > Random shuffle > 0 - 1033 Ahh... RAND_MAX is better on linux. > ... > What we expect. > > So the question is, how do I get more than 16bit from rand() ? 
This seems > like a serious flaw. Well, in the example you gave, you're using c++, so follow Greg's suggestion of using a better, more modern random number generator from a c++ library. Greg suggested boost, which is great, but if you're not already using it, it's a little bit inconvenient because you have to deal with about 100 MB of boost bloat just to get a simple function like random numbers. It might be more convenient to use <tr1/random>, which almost certainly came from boost anyway (and is supported by my copy of mingw g++). Or, you can turn on -std=c++0x, and use <random> from the new standard (which is almost certainly the same as boost::random and tr1::random). If the use-case you actually care about is indeed random_shuffle, the random-shuffle algorithm takes a random number generator as an optional argument, so you can plug in a better, longer-period generator to get better results. You can either roll your own (not recommended, except as an educational exercise) or use one from . If you really need to use rand(), then you could build your own 32-bit generator to plug into random_shuffle by combining together two values from rand(), but this is not recommended except as an expedient, as it further degrades the quality of the random numbers. Pseudo-random numbers are a little bit subtle. If you need high-quality results, you need to use a high-quality random number generator, best professionally written (such as those in ), and know a little bit about what you are doing. If you want to tell us a little more about your actual use case, we can probably give you some pointers about safe ways to proceed. > Nghia Good luck. K. Frank ```
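K. Frank's last-resort option, building a wider generator from two `rand()` calls, might look like the sketch below. It is not from the thread, and the names `wide_rand` and `shuffle_wide` are hypothetical. In 2011-era C++ a functor returning `wide_rand() % n` could be passed as `random_shuffle`'s third argument; because `random_shuffle` was deprecated in C++14 and removed in C++17, the sketch writes the Fisher-Yates loop out instead. As K. Frank warns, this widens `rand()`'s range, not its quality.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <utility>
#include <vector>

// The "expedient": glue two rand() results together. C only guarantees
// RAND_MAX >= 32767, so keep 15 bits from each call; the result is a
// 30-bit value in 0 .. 2^30 - 1 on any platform.
unsigned long wide_rand() {
    unsigned long hi = static_cast<unsigned long>(std::rand()) & 0x7fffUL;
    unsigned long lo = static_cast<unsigned long>(std::rand()) & 0x7fffUL;
    return (hi << 15) | lo;
}

// Fisher-Yates shuffle whose indices come from wide_rand(). The modulo
// still carries a small bias, and quality is capped by rand() itself.
template <typename T>
void shuffle_wide(std::vector<T>& v) {
    for (std::size_t i = v.size(); i > 1; --i) {
        std::size_t j = static_cast<std::size_t>(wide_rand() % i);  // j in [0, i)
        assert(j < i);
        std::swap(v[i - 1], v[j]);
    }
}
```

For 100,000 elements, 30 bits of range is enough for every position to be reachable, which removes the front-loading seen with a 15-bit source; an engine from `<random>` remains the better choice when the statistical quality of the shuffle matters.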
 Re: [Mingw-users] RAND_MAX still 16bit ?! From: Nghia Ho - 2011-04-24 02:03:59 ```----- Original Message ---- > From: K. Frank > To: MinGW Users List > Sent: Sun, 24 April, 2011 3:42:39 AM > Subject: Re: [Mingw-users] RAND_MAX still 16bit ?! Thank you all for the tip. Looks like is the easiest way to go! ```