From: Eric M. <e.m...@po...> - 2002-07-24 16:23:49
On Tuesday 23 July 2002 22:15, pa...@pf... wrote:
> RandomArray got a "special" position as part of Numeric simply by
> historical accident in being there first. I think in the conversion to
> Numarray we will be able to remove such things from the "core" and make
> more of a marketplace of equals for the "addons". As it is now there is
> some implication that somehow one is "better" than the other, which is
> unjustified either mathematically or in the sense of design.
>
> RNG's design is based on my experience with large codes needing many
> independent streams. The mathematics is from a well-tested Cray algorithm.
> I'm sure it could use fluffing up but a good case can be made for it.

A famous quote from Linus is "Nice idea. Now show me the code."

Perhaps a detailed example makes my problem clearer, because as it is now,
RNG and RandomArray2 are not orthogonal in design, in the sense that RNG's
default seed is fixed and RandomArray's is automagical (clock), not
reproducible and mathematically suspect, which I think is not good for the
more naive Python user.

Below I will give intended usage in a provocative way, but please don't
take me too seriously (I know, I don't ;-)

Let's say you have a master shell script that runs a neural net paradigm
(size 20x20) 10 times, each time with the same parameters, to see if it's
stable or chaotic, i.e. it does not 'converge', or rather its outcome
depends on initial values (it should not be chaotic, but this should
always be checked).

run10.sh
    tracelink.py 20 20 inputpat.dat > hippocamp01.out
    ... 8 more ...
    tracelink.py 20 20 inputpat.dat > hippocamp10.out

tracelink.py
    ...
    import numarray, RandomArray2 _or_ RNG
    ...
    # Case 1: RandomArray2
    #   User uses default clock seed, which is the same
    #   during 1 second (see my previous posting).
    #   ignlgi(void)'s seeds 1234567890L,123456789L
    #   are _not_ used (see com.c).
    RandomArray2.seed() # But if omitted, RandomArray2.py does it, too.
    ... calculations
    ... other program outcome _only_ if program runs > 1 second,
    ... otherwise the others will have the same result.

    # Case 2: RNG
    #   A 'standard_generator = CreateGenerator(-1)' is automatically done.
    #     seed <  0 ==> Use the default initial seed value.
    #     seed =  0 ==> Set a "random" value for the seed from system clock.
    #     seed >  0 ==> Set seed directly (32 bits only).
    #   Thus, the fixed seeds used are 0,0 (see Mixranf() in ranf.c).
    ... calculations
    ... all 10 programs have the same outcome when using ranf(),
    ... because it always starts with the same seed, the sequence is always:
    ... 0.58011364857958725, 0.95051273498076583, 0.78637142533060356 etc.

The problem with RandomArray's seed is that it is not truly random itself.
In its current (time.time based) implementation it is linearly
auto-incrementing every second, and therefore suffers from auto-correlation.
Moreover, in the above example, if 10 separate .py runs complete in 1 second
they'll all have the same seed (and outcome).
This is not what the user, if accustomed to clock seeding, would expect.

But if the seed is different each time, a problem is that runs are not
reproducible. Let's say that run hippocamp06.out produced some strange
output: now unless the user saved the seed (with get_seed), it can never
be reproduced.

Therefore, I think RNG's design is better and should be applied to
RandomArray2, too, because RandomArray2's seeding is flawed anyway.
A user should be aware of proper seeding, agreed, and now will be:
when doing multiple identical runs, the same (and thus reproducible)
output will result and so the user is made aware of the fact that, as
an example, he or she should seed or pickle it between runs.

So my suggestion would be to re-implement RandomArray2.seed(x=0,y=0)
as follows:

    if either the x or y seed:
      seed  <  0    ==> Use the default initial seed value.
      seed  =  None ==> Set a "random" value for the seed from the system
                        clock.
      seeds >= 0    ==> Set seed directly (32 bits only).

and en-passant do a better job than clock-based seeding:

---cut---
def seed(x=None,y=None):
    """seed(x, y), set the seed using the integers x, y;
    ...
    """
    if (x != None and type (x) != IntType) or (y != None and type (y) != IntType) :
        raise ArgumentError, "seed requires integer arguments (or None)."
    if x == None or y == None:
        import dev_random_device # uses /dev/random or equivalent
        x = dev_random_device.nextvalue() # egd.sf.net is a user space
        y = dev_random_device.nextvalue() # alternative
    elif x < 0 or y < 0:
        x = 1234567890L
        y = 123456789L
    ranlib.set_seeds(x,y)
---cut---

But: I realize that this is different behavior from Python's standard
random and whrandom, where no arg or None uses the clock.
But, if that behavior is kept for RandomArray2 (and RNG should then be
adapted, too) then I'd urge at least to use a better initial seed.
In certain applications, e.g. generating session id's in crypto programs,
non-predictability of initial seeds is crucial. But if you have a look
at GPG's or OpenSSL's source for a PRNG (especially for Windows), it looks
like an art in itself. So perhaps RNG's 'clock code' should replace
RandomArray2's: it uses microseconds (in gettimeofday), too, and thus will
not have the 1-second problem.

Bye-bye,

Eric

-- 
Eric Maryniak <e.m...@po...>
WWW homepage: http://pobox.com/~e.maryniak/
Mobile phone: +31 6 52047532, or (06) 520 475 32 in NL.

Just because you're not paranoid, that doesn't mean
that they're not after you.