From: Charles R H. <cha...@gm...> - 2006-09-04 21:53:32
On 9/4/06, Charles R Harris <cha...@gm...> wrote:
>
> On 9/4/06, Sebastian Haase <ha...@ms...> wrote:
> >
> > Paulo J. S. Silva wrote:
> > > Once again, the information that signed zero is part of the IEEE
> > > standard is in the paper I cited in my last message.
> > >
> > > It is very important to be able to compute the sign of an overflowed
> > > quantity in expressions like 1/x when x goes to zero.
> > >
> > > Best,
> > >
> > > Paulo
> >
> > Hi,
> >
> > This is all very interesting (and confusing (to me, maybe others) at
> > the same time ...) ...
> >
> > Question 0: Are you sure this is not a bug?
> > >>> N.array([66]).round(-1)
> > [60]
>
> That does look like a bug.
>
> >>> array([66]).round(-1)
> array([60])
> >>> array([66.]).round(-1)
> array([ 70.])
>
> I suspect it is related to the integer data type of the first example.
> The code probably does something like this:
>
> round(66/10)*10
>
> and 66/10 == 6 in integer arithmetic.

The problem is somewhere in this bit of code:

    // f will be a double
    f = PyFloat_FromDouble(power_of_ten(decimals));
    if (f == NULL) return NULL;
    // op1 will be division
    ret = PyObject_CallFunction(op1, "OOO", a, f, out);
    if (ret == NULL) goto finish;
    // round in place
    tmp = PyObject_CallFunction(n_ops.rint, "OO", ret, ret);
    if (tmp == NULL) {Py_DECREF(ret); ret = NULL; goto finish;}

I suspect the problem is the division, which has integer 'a', double 'f',
and integer 'out'. Integer out is probably the killer. Let's see:

    # use doubles for output
    >>> out = zeros(1)
    >>> array([66]).round(decimals=-1, out=out)
    array([ 70.])

Yep, that's the problem.

Chuck

<snip>
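A minimal sketch of the failure mode described above, assuming a modern
numpy (the explicit `casting` keyword postdates this thread): an integer
output array makes the division step truncate before rint ever runs.

    import numpy as np

    a = np.array([66])

    out = np.zeros(1, dtype=a.dtype)               # integer out, as in the bug
    np.divide(a, 10.0, out=out, casting='unsafe')  # 6.6 is truncated to 6
    print(out * 10)                                # [60]

    out = np.zeros(1)                              # float out, as in the fix
    np.divide(a, 10.0, out=out)                    # 6.6 survives
    print(np.rint(out).astype(int) * 10)           # [70]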
From: <rw6...@sn...> - 2006-09-04 21:27:53
Forgive me if I missed any replies. Since I have seen none, I will rephrase
the request.

Please demonstrate an irregular array in numpy with a time complexity
measurement. The shape does not matter so long as it is non-rectangular and
a complexity measurement is included. A sparse matrix is conceptually
rectangular, so it does not fit the request at all.

The question is whether numpy has such support; if not, is it planned? The
question is not about the general-purpose Python language. (Although Python
demonstrations are of interest if they include the time complexity measure,
from first principles or by experiment.)
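For what it's worth, the nearest thing numpy offers is an object array
whose elements are rows of unequal length; a rough sketch, with an
informal timing rather than a proof of complexity:

    import numpy as np
    from timeit import timeit

    # Ragged "array": a 1-D object array holding rows of different lengths.
    ragged = np.empty(3, dtype=object)
    ragged[0] = np.arange(2)
    ragged[1] = np.arange(5)
    ragged[2] = np.arange(11)
    print([len(row) for row in ragged])        # [2, 5, 11]

    # Row access is a single pointer lookup, so it should not grow with
    # the number of rows -- measured empirically here, not derived.
    for n in (10**3, 10**6):
        big = np.empty(n, dtype=object)
        big[:] = [np.arange(3)] * n
        print(n, timeit(lambda: big[n // 2], number=100000))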
From: David M. C. <co...@ph...> - 2006-09-04 20:29:09
On Mon, 4 Sep 2006 01:53:40 -0400
"A. M. Archibald" <per...@gm...> wrote:

> A better question to ask is, "Can I change numpy's rounding behaviour
> for my programs?" (And, more generally, "can I set all the various
> floating-point options that the IEEE standard and my processor both
> support?") I don't know the answer to that one, but it does seem to be
> a goal that numpy is trying for (hence, for example, the difference
> between numpy float scalars and python floats).

(looks) I think we've missed rounding. And the inexact flag. You can
control how overflow, underflow, divide-by-0, and invalid-operand are
handled using geterr and seterr, though.

Unfortunately, getting/setting the rounding mode is hard to do in a
platform-independent way :( gcc has fegetround() and fesetround() (they're
part of the C99 standard, I believe). I don't know what to use on other
machines (esp. Windows with MSVC), although the Boost Interval library
looks like a good place to start:

http://boost.cvs.sourceforge.net/boost/boost/boost/numeric/interval/detail/

--
|>|\/|<
/--------------------------------------------------------------------------\
|David M. Cooke                http://arbutus.physics.mcmaster.ca/dmc/
|co...@ph...
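On glibc platforms you can reach fesetround() from Python without writing
C; a sketch, assuming libm exports it and using the x86 values of the FE_*
constants (both of which are platform assumptions, not portable facts):

    import ctypes, ctypes.util

    libm = ctypes.CDLL(ctypes.util.find_library('m') or 'libm.so.6')

    # x86/glibc values; other platforms define these differently.
    FE_TONEAREST, FE_UPWARD = 0x000, 0x800

    x, y = 1.0, 3.0
    print(libm.fegetround())         # 0 == FE_TONEAREST by default
    libm.fesetround(FE_UPWARD)       # round toward +infinity from here on
    print(x / y)                     # last bit now rounded up
    libm.fesetround(FE_TONEAREST)    # restore the default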
From: Paulo J. S. S. <pjs...@im...> - 2006-09-04 19:41:26
Oops, I believe you were caught by int versus float here:

In [16]: around(66.0, decimals=-1)
Out[16]: 70.0

In [17]: around(66, decimals=-1)
Out[17]: 60

Note that in the int case, it seems like round is simply truncation.

Paulo
From: Charles R H. <cha...@gm...> - 2006-09-04 19:40:08
On 9/4/06, Sebastian Haase <ha...@ms...> wrote:
>
> Paulo J. S. Silva wrote:
> > Once again, the information that signed zero is part of the IEEE
> > standard is in the paper I cited in my last message.
> >
> > It is very important to be able to compute the sign of an overflowed
> > quantity in expressions like 1/x when x goes to zero.
> >
> > Best,
> >
> > Paulo
>
> Hi,
>
> This is all very interesting (and confusing (to me, maybe others) at
> the same time ...) ...
>
> Question 0: Are you sure this is not a bug?
> >>> N.array([66]).round(-1)
> [60]

That does look like a bug.

>>> array([66]).round(-1)
array([60])
>>> array([66.]).round(-1)
array([ 70.])

I suspect it is related to the integer data type of the first example. The
code probably does something like this:

round(66/10)*10

and 66/10 == 6 in integer arithmetic.

Chuck

> >>> N.array([66.2]).round(-1)
> [ 70.]
>
> Question 1: Was this way of round already in Numeric and/or in numarray?

Don't recall.

> Question 2: Does this need to be better documented (complete and
> corrected(!) docstrings - maybe a dedicated wiki page) !?
> This is related to "How does Matlab or IDL or others do rounding?" -
> This would at least determine how many people would be surprised by
> this. (I would say that *if* Matlab rounds the same way, we might need
> less documentation ...)

I have improved the document strings and the example at scipy.org.

Chuck
From: Fernando P. <fpe...@gm...> - 2006-09-04 19:18:51
On 9/3/06, Robert Kern <rob...@gm...> wrote:

> > I think it's probably a bug:
> >
> > >>> concatenate((array([]),b))
> > array([None, None], dtype=object)
>
> Well, if you can fix it without breaking anything else, then it's a bug.
>
> However, I would suggest that a rule of thumb for using object arrays is
> to always be explicit. Never rely on automatic conversion from Python
> containers to object arrays. Since Python containers are also objects,
> it is usually ambiguous what the user meant.
>
> I kind of liked numarray's choice to move the object array into a
> separate constructor. I think that gave them some flexibility to choose
> different syntax and semantics from the generic array() constructor.
> Since constructing object arrays is so different from constructing
> numeric arrays, I think that difference is warranted.

This is something that should probably be sorted out before 1.0 is out.
IMHO, the current behavior is a bit too full of subtle pitfalls to be a
good long-term solution, and I think that predictability trumps
convenience for a good API.

I know that N.array() is already very complex; perhaps the idea of moving
all object array construction into a separate function would be a
long-term win.

I think that object arrays are actually very important for numpy: they
provide the bridge between pure numerical, Fortran-like computing and the
richer world of Python datatypes and complex objects. But it's important
to acknowledge this bridge character: they connect into a world where the
basic assumptions of homogeneity of numpy arrays don't apply anymore.

I'd be +1 on forcing this acknowledgement by having a separate N.oarray
constructor, accessible via the dtype flag to N.array as well, but
/without/ N.array trying to invoke it automatically by guessing the
contents of what it was fed. The downside of this approach is that much of
the code that 'magically' just works with N.array(foo) today would now
break. I'm becoming almost of the opinion that the code is broken already,
it just hasn't failed yet :)

Over time, I've become more and more paranoid about what constructors (and
factory-type functions that build objects) do with their inputs, how
strongly they validate them, and how explicit they require their users to
be about their intent. While I'm a big fan of Python's duck-typing, I've
also learned (the hard way) that in larger codebases, the place to be very
strict about input validation is object constructors. It's easy to let
garbage seep into an object by not validating a constructor input, and to
later (often MUCH later) have your code bizarrely explode with an
unrecognizable exception, because that little piece of garbage was fed to
some third-party code which was expecting something else. If I've
understood history correctly, some of the motivations behind Enthought's
Traits are of a similar nature.

For now I've dealt with our private problem that spurred my original
posting. But I think this issue is worth clarifying for numpy, before 1.0
paints us into a backwards-compatibility corner with a fairly fundamental
datatype and constructor.

Regards,

f
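The guessing problem is easy to show; a small sketch of the ambiguity, and
of the explicit allocate-then-fill style Robert recommends:

    import numpy as np

    # Same container, two defensible readings: array() guesses a 2-D
    # numeric array here, not a 1-D array holding two list objects.
    print(np.array([[1, 2], [3, 4]]).shape)    # (2, 2)

    # Explicit construction removes the guesswork: allocate, then fill.
    obj = np.empty(2, dtype=object)
    obj[0] = [1, 2]
    obj[1] = [3, 4, 5]     # ragged contents are fine once the dtype is fixed
    print(obj.shape)       # (2,)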
From: Sebastian H. <ha...@ms...> - 2006-09-04 19:04:04
Paulo J. S. Silva wrote:
> Once again, the information that signed zero is part of the IEEE
> standard is in the paper I cited in my last message.
>
> It is very important to be able to compute the sign of an overflowed
> quantity in expressions like 1/x when x goes to zero.
>
> Best,
>
> Paulo

Hi,

This is all very interesting (and confusing (to me, maybe others) at the
same time ...) ...

Question 0: Are you sure this is not a bug?
>>> N.array([66]).round(-1)
[60]
>>> N.array([66.2]).round(-1)
[ 70.]

Question 1: Was this way of round already in Numeric and/or in numarray?

Question 2: Does this need to be better documented (complete and
corrected(!) docstrings - maybe a dedicated wiki page) !?
This is related to "How does Matlab or IDL or others do rounding?" - this
would at least determine how many people would be surprised by this. (I
would say that *if* Matlab rounds the same way, we might need less
documentation ...)

Thanks,
Sebastian Haase
From: A. M. A. <per...@gm...> - 2006-09-04 18:17:25
On 04/09/06, Charles R Harris <cha...@gm...> wrote:
>
> On 9/4/06, A. M. Archibald <per...@gm...> wrote:
> > On 04/09/06, Robert Kern <rob...@gm...> wrote:
> > > Charles R Harris wrote:
> >
> > However, a random number generator based on a stream cipher is
> > probably going to be pretty good, if slow, so implementing one of
> > those can't hurt. But I think leaving the possibility open that one
> > could use custom sources of random bytes is a very good idea. There's
> > no particular need that custom ones should be fast, as they're for
> > use when you really want *good* random numbers.
>
> I was thinking it might be nice to have one, just to generate seeds if
> nothing else. This could actually be written in python and use something
> like the python cryptography toolkit, no need to reinvent the wheel. It
> might be worth looking at that code just to see how someone else thought
> through the interface. It's under the Python license, so I need to know
> if that license is compatible with the BSD license used for numpy.

Of the generators they have, only their random pool is of any use for
seeding, and even that needs some clever feeding of random data. However,
on practically any platform this will run on, random bytes based on real
entropy can be extracted from the system (/dev/random, windows crypto API,
something on the Mac); a uniform API for those would be a handy thing to
have.

But I can't recommend spending much effort connecting the python crypto
stuff to numpy's random number generators; it's the sort of thing
application writers can easily cook up by subclassing RandomGenerator (or
whatever).

A. M. Archibald
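The kind of uniform API being suggested is only a few lines of Python; a
hypothetical sketch (SystemEntropySource is made up for illustration),
with os.urandom standing in for /dev/urandom or the Windows crypto API:

    import os, struct

    class SystemEntropySource:
        """Hypothetical custom byte source: real entropy from the OS."""

        def random_bytes(self, n):
            return os.urandom(n)       # /dev/urandom, CryptGenRandom, etc.

        def gen_uint32(self):
            return struct.unpack('<I', self.random_bytes(4))[0]

        def gen_double(self):
            # 53 random bits -> a uniform double in [0, 1)
            hi = self.gen_uint32() >> 5    # 27 bits
            lo = self.gen_uint32() >> 6    # 26 bits
            return (hi * 67108864.0 + lo) / 9007199254740992.0

    src = SystemEntropySource()
    seed = src.gen_uint32()                # e.g. to seed a fast PRNG
    print(seed, src.gen_double())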
From: Paulo J. S. S. <pjs...@im...> - 2006-09-04 17:07:17
Once again, the information that signed zero is part of the IEEE standard
is in the paper I cited in my last message.

It is very important to be able to compute the sign of an overflowed
quantity in expressions like 1/x when x goes to zero.

Best,

Paulo
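A quick sketch of the point (plain Python raises on float division by
zero, so numpy scalars are used here; errstate silences the IEEE divide
warning):

    import numpy as np

    pos = np.float64(1e-320) * 1e-10     # underflows to +0.0
    neg = np.float64(-1e-320) * 1e-10    # underflows to -0.0
    print(pos == neg)                    # True: +0.0 and -0.0 compare equal...
    with np.errstate(divide='ignore'):
        print(1.0 / pos, 1.0 / neg)      # inf -inf: ...but 1/x keeps the sign
    print(np.copysign(1.0, neg))         # -1.0: the sign bit is still there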
From: Charles R H. <cha...@gm...> - 2006-09-04 16:32:55
On 9/4/06, Paulo J. S. Silva <pjs...@im...> wrote:
>
> Interesting, I was just reading about the rounding rules in the IEEE
> standard last Friday.
>
> What numpy's "around" function does is called "round to even" (ties are
> rounded so that the next digit is even), instead of "round up".
> According to "What Every Computer Scientist Should Know About
> Floating-Point Arithmetic" (do a Google search), the reason to prefer
> "round to even" is exemplified by the following result from Reiser and
> Knuth:
>
> Theorem
>
> Let x and y be floating-point numbers, and define x0 = x, x1 = (x0 - y)
> + y, ..., xn = (x(n-1) - y) + y. If + and - are exactly rounded using
> round to even, then either xn = x for all n or xn = x1 for all n >= 1.
>
> If you use "round up", the sequence xn can start increasing slowly "for
> ever", and as the paper says:
>
> "... This example suggests that when using the round up rule,
> computations can gradually drift upward, whereas when using round to
> even the theorem says this cannot happen."
>
> Best,
>
> Paulo

However, around does set the sign bit when rounding -.5 to zero:

>>> around(-.5).tostring()
'\x00\x00\x00\x00\x00\x00\x00\x80'

so that

>>> around(-.5)
-0.0

on the other hand

>>> around(-.5) == 0.0
True

I didn't know -0.0 was part of the IEEE spec.

Chuck
From: Charles R H. <cha...@gm...> - 2006-09-04 15:30:10
On 9/4/06, A. M. Archibald <per...@gm...> wrote:
>
> On 04/09/06, Robert Kern <rob...@gm...> wrote:
> > Charles R Harris wrote:
> >
> > > What sort of api should this be? It occurs to me that there are
> > > already 4 sources of random bytes:
> > >
> > > Initialization:
> > >
> > > /dev/random (pseudo random, I think)
>
> /dev/random is (supposed to be) true randomness derived from various
> sources. It's also fed through a bunch of hashing and CRC functions
> chosen for convenience, but is probably pretty unbiased.
>
> > > /dev/urandom
> > > crypto system on windows
> > >
> > > Pseudo random generators:
> > >
> > > mtrand
> > >
> > > I suppose we could add some cryptologically secure source as well.
> > > That indicates to me that one set of random number generators would
> > > just be streams of random bytes, possibly in 4 byte chunks. If I
> > > were doing this for linux these would all look like file systems,
> > > FUSE comes to mind. Another set of functions would transform these
> > > into the different distributions. So, how much should stay in numpy?
> > > What sort of API are folks asking for?
>
> "Cryptographically secure random number generator" means something
> different to everybody, and implementing one that everyone is happy
> with is going to be a colossal pain. Look at Yarrow and its design
> documents for some idea of the complexities involved.
>
> However, a random number generator based on a stream cipher is probably
> going to be pretty good, if slow, so implementing one of those can't
> hurt. But I think leaving the possibility open that one could use
> custom sources of random bytes is a very good idea. There's no
> particular need that custom ones should be fast, as they're for use
> when you really want *good* random numbers.

I was thinking it might be nice to have one, just to generate seeds if
nothing else. This could actually be written in python and use something
like the python cryptography toolkit
<http://www.amk.ca/python/writing/pycrypt/pycrypt.html>, no need to
reinvent the wheel. It might be worth looking at that code just to see how
someone else thought through the interface. It's under the Python license,
so I need to know if that license is compatible with the BSD license used
for numpy.

> > It would be good to also support quasi-random generators like Sobol
> > and Halton sequences. They tend to only generate floating point
> > numbers in [0, 1] (or (0, 1) or [0, 1) or (0, 1]; I think it varies).
> > There are probably other PRNGs that only output floating point
> > numbers, but I doubt we want to implement any of them. You can look
> > at the 1.6 version of RandomKit for an implementation of Sobol
> > sequences and ISAAC, a cryptographic PRNG.
> >
> > http://www.jeannot.org/~js/code/index.en.html
>
> Sobol' sequences may require special care in constructing non-uniform
> distributions in order to preserve their uniformity properties. If we
> can simply import ISAAC, it won't hurt, but I'd rather trust (say) AES
> in counter mode than some custom cryptosystem that hasn't been analyzed
> as thoroughly.
>
> > I'm thinking that the core struct that we're going to pass around
> > will look something like this:
> >
> > struct random_state {
> >     void* state;
> >     unsigned int (*gen_uint32)(struct random_state*);
> >     double (*gen_double)(struct random_state*);
> > }
> >
> > state -- A pointer to the generator-specific struct.
> >
> > gen_uint32 -- A function pointer that yields a 32-bit unsigned
> > integer. Possibly NULL if the generator can not implement such a
> > generator.
> >
> > gen_double -- A function pointer that yields a double-precision
> > number in [0, 1] (possibly omitting one or both of the endpoints
> > depending on the implementation).
>
> Are 32-bit numbers really the right least common denominator? There are
> plenty of 64-bit platforms out there...

MWC8222 runs faster (2x) on some 64 bit platforms because it multiplies
two 32 bit numbers and uses the 64 bit product. The mtrand generator
really uses a polynomial with coefficients in z_2 and has 624 words worth,
an even number, so could probably be adapted to run directly with 64 bit
registers, although the gain might not be much. Most other generators I
know of produce 32 bit numbers. Not that I am an expert on the subject.

> Given this API, implementing a subclassable class that exports it
> should satisfy most people's needs for interchangeable generators.
>
> A. M. Archibald

Chuck
From: Paulo J. S. S. <pjs...@im...> - 2006-09-04 15:08:42
Interesting, I was just reading about the rounding rules in the IEEE
standard last Friday.

What numpy's "around" function does is called "round to even" (ties are
rounded so that the next digit is even), instead of "round up". According
to "What Every Computer Scientist Should Know About Floating-Point
Arithmetic" (do a Google search), the reason to prefer "round to even" is
exemplified by the following result from Reiser and Knuth:

Theorem

Let x and y be floating-point numbers, and define x0 = x, x1 = (x0 - y) +
y, ..., xn = (x(n-1) - y) + y. If + and - are exactly rounded using round
to even, then either xn = x for all n or xn = x1 for all n >= 1.

If you use "round up", the sequence xn can start increasing slowly "for
ever", and as the paper says:

"... This example suggests that when using the round up rule, computations
can gradually drift upward, whereas when using round to even the theorem
says this cannot happen."

Best,

Paulo
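The drift theorem takes some machinery to demonstrate, but the underlying
bias is easy to see; a sketch comparing round-half-up with numpy's
round-half-even over exact .5 ties:

    import numpy as np

    ties = np.arange(0.5, 100.5, 1.0)     # 0.5, 1.5, ..., 99.5 (all exact)
    half_up = np.floor(ties + 0.5)        # classic round-half-up
    to_even = np.round(ties)              # numpy: round half to even

    print(ties.mean())      # 50.0
    print(half_up.mean())   # 50.5 -- systematically pushed upward
    print(to_even.mean())   # 50.0 -- ties cancel in pairs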
From: A. M. A. <per...@gm...> - 2006-09-04 14:34:42
On 04/09/06, Robert Kern <rob...@gm...> wrote:
> Charles R Harris wrote:
>
> > What sort of api should this be? It occurs to me that there are
> > already 4 sources of random bytes:
> >
> > Initialization:
> >
> > /dev/random (pseudo random, I think)

/dev/random is (supposed to be) true randomness derived from various
sources. It's also fed through a bunch of hashing and CRC functions chosen
for convenience, but is probably pretty unbiased.

> > /dev/urandom
> > crypto system on windows
> >
> > Pseudo random generators:
> >
> > mtrand
> >
> > I suppose we could add some cryptologically secure source as well.
> > That indicates to me that one set of random number generators would
> > just be streams of random bytes, possibly in 4 byte chunks. If I were
> > doing this for linux these would all look like file systems, FUSE
> > comes to mind. Another set of functions would transform these into
> > the different distributions. So, how much should stay in numpy? What
> > sort of API are folks asking for?

"Cryptographically secure random number generator" means something
different to everybody, and implementing one that everyone is happy with
is going to be a colossal pain. Look at Yarrow and its design documents
for some idea of the complexities involved.

However, a random number generator based on a stream cipher is probably
going to be pretty good, if slow, so implementing one of those can't hurt.
But I think leaving the possibility open that one could use custom sources
of random bytes is a very good idea. There's no particular need that
custom ones should be fast, as they're for use when you really want *good*
random numbers.

> It would be good to also support quasi-random generators like Sobol and
> Halton sequences. They tend to only generate floating point numbers in
> [0, 1] (or (0, 1) or [0, 1) or (0, 1]; I think it varies). There are
> probably other PRNGs that only output floating point numbers, but I
> doubt we want to implement any of them. You can look at the 1.6 version
> of RandomKit for an implementation of Sobol sequences and ISAAC, a
> cryptographic PRNG.
>
> http://www.jeannot.org/~js/code/index.en.html

Sobol' sequences may require special care in constructing non-uniform
distributions in order to preserve their uniformity properties. If we can
simply import ISAAC, it won't hurt, but I'd rather trust (say) AES in
counter mode than some custom cryptosystem that hasn't been analyzed as
thoroughly.

> I'm thinking that the core struct that we're going to pass around will
> look something like this:
>
> struct random_state {
>     void* state;
>     unsigned int (*gen_uint32)(struct random_state*);
>     double (*gen_double)(struct random_state*);
> }
>
> state -- A pointer to the generator-specific struct.
>
> gen_uint32 -- A function pointer that yields a 32-bit unsigned integer.
> Possibly NULL if the generator can not implement such a generator.
>
> gen_double -- A function pointer that yields a double-precision number
> in [0, 1] (possibly omitting one or both of the endpoints depending on
> the implementation).

Are 32-bit numbers really the right least common denominator? There are
plenty of 64-bit platforms out there...

Given this API, implementing a subclassable class that exports it should
satisfy most people's needs for interchangeable generators.

A. M. Archibald
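Quasi-random sequences of the kind Robert mentions are short to write
down; a sketch of a Halton generator (the base-2 case is the van der
Corput sequence):

    def halton(index, base):
        """index-th Halton number for a given (prime) base, in (0, 1)."""
        f, r = 1.0, 0.0
        while index > 0:
            f /= base
            r += f * (index % base)
            index //= base
        return r

    # A 2-D low-discrepancy sequence uses a different prime base per axis.
    pts = [(halton(i, 2), halton(i, 3)) for i in range(1, 9)]
    print(pts[:3])   # [(0.5, 0.333...), (0.25, 0.666...), (0.75, 0.111...)]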
From: Robert K. <rob...@gm...> - 2006-09-04 08:31:35
Charles R Harris wrote:
> What sort of api should this be? It occurs to me that there are already
> 4 sources of random bytes:
>
> Initialization:
>
> /dev/random (pseudo random, I think)
> /dev/urandom
> crypto system on windows
>
> Pseudo random generators:
>
> mtrand
>
> I suppose we could add some cryptologically secure source as well. That
> indicates to me that one set of random number generators would just be
> streams of random bytes, possibly in 4 byte chunks. If I were doing this
> for linux these would all look like file systems, FUSE comes to mind.
> Another set of functions would transform these into the different
> distributions. So, how much should stay in numpy? What sort of API are
> folks asking for?

It would be good to also support quasi-random generators like Sobol and
Halton sequences. They tend to only generate floating point numbers in
[0, 1] (or (0, 1) or [0, 1) or (0, 1]; I think it varies). There are
probably other PRNGs that only output floating point numbers, but I doubt
we want to implement any of them. You can look at the 1.6 version of
RandomKit for an implementation of Sobol sequences and ISAAC, a
cryptographic PRNG.

http://www.jeannot.org/~js/code/index.en.html

I'm thinking that the core struct that we're going to pass around will
look something like this:

struct random_state {
    void* state;
    unsigned int (*gen_uint32)(struct random_state*);
    double (*gen_double)(struct random_state*);
}

state -- A pointer to the generator-specific struct.

gen_uint32 -- A function pointer that yields a 32-bit unsigned integer.
Possibly NULL if the generator can not implement such a generator.

gen_double -- A function pointer that yields a double-precision number in
[0, 1] (possibly omitting one or both of the endpoints depending on the
implementation).

Possibly this struct needs to be expanded to include function pointers for
destroying the state struct and for copying the state to a new object. The
seeding function(s) probably don't need to travel in the struct. We should
determine if C++ will work for us, here. Unfortunately, that will force
all extension modules that want to use this API to be C++ modules.

One other feature of the current implementation of the state struct is
that a few of the distributions cache information between calls. Notably,
the Box-Müller Gaussian distribution algorithm generates two normal
deviates at a time; one is returned, the other is cached until the next
call. The binomial distribution computes some intermediate values that are
reused if the next call has the same parameters. We should probably store
such things separately from the core state struct this time.

--
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless
enigma that is made terrible by our own mad attempt to interpret it as
though it had an underlying truth." -- Umberto Eco
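The caching Robert describes is easiest to see in a toy version; a Python
sketch of Box-Müller with the second deviate held over between calls
(illustrative only, not numpy's C implementation):

    import math, random

    class BoxMuller:
        def __init__(self, uniform=random.random):
            self.uniform = uniform
            self._cached = None          # the held-over second deviate

        def gauss(self):
            if self._cached is not None:
                z, self._cached = self._cached, None
                return z                 # no random numbers consumed here
            u1 = self.uniform()
            while u1 == 0.0:             # keep log() away from zero
                u1 = self.uniform()
            u2 = self.uniform()
            r = math.sqrt(-2.0 * math.log(u1))
            self._cached = r * math.sin(2.0 * math.pi * u2)
            return r * math.cos(2.0 * math.pi * u2)

    bm = BoxMuller()
    print([round(bm.gauss(), 3) for _ in range(4)])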
From: Charles R H. <cha...@gm...> - 2006-09-04 06:42:28
On 9/3/06, Robert Kern <rob...@gm...> wrote:
>
> Charles R Harris wrote:
> > Hi Robert,
> >
> > I am about to get started on some stuff for the random number
> > generators but thought I would run it by you first. I envisage the
> > following:
> >
> > uniform short_doubles -- doubles generated from a single 32 bit
> > random number (advantage: speed)
> > uniform double, short_doubles on the interval (0,1) -- don't touch
> > singularities in functions like log (this is my preferred default)
> > fast_normal -- ziggurat method using single 32 bit random numbers
> > (advantage: speed)
> > fast_exponential -- ziggurat method using single 32 bit random
> > numbers (advantage: speed)
> > MWC8222 random number generator (advantage: speed on some machines,
> > different from mtrand)
> >
> > Except for the last, none conflict with current routines and can be
> > added without a branch. I expect adding MWC8222 might need more
> > extensive work and I will branch for that. So the questions are of
> > utility and naming. I see some utility for myself, otherwise I
> > wouldn't be considering doing the work. OTOH, I already have (C++)
> > routines that I use for these things, so a larger question might be
> > if anyone else sees a use for these. I like speed, but it is not
> > always that important in everyday apps.
>
> I would prefer not to expand the API of numpy.random. If it weren't
> necessary for numpy to provide all of the capabilities that came with
> Numeric's RandomArray, I wouldn't want numpy.random in there at all.

Yes, good point.

> Now, a very productive course of action would be to refactor
> numpy.random such that the distributions (the first four items on your
> list fall into this category) and the underlying PRNG (the fifth) are
> separated from one another such that they can be mixed and matched at
> runtime. A byproduct of this would expose the C API of both of these in
> order to be usable by other C extension modules, something that's been
> asked for about a dozen times now. The five items on your list could be
> implemented in an extension module distributed in scipy.

What sort of api should this be? It occurs to me that there are already 4
sources of random bytes:

Initialization:

/dev/random (pseudo random, I think)
/dev/urandom
crypto system on windows

Pseudo random generators:

mtrand

I suppose we could add some cryptologically secure source as well. That
indicates to me that one set of random number generators would just be
streams of random bytes, possibly in 4 byte chunks. If I were doing this
for linux these would all look like file systems, FUSE comes to mind.
Another set of functions would transform these into the different
distributions. So, how much should stay in numpy? What sort of API are
folks asking for?

> > I see that Pyrex is used for the interface, so I suppose that is one
> > more tool to become familiar with ;)
>
> Possibly not. Pyrex was getting in the way of exposing a C API the last
> time I took a stab at it. A possibility that just occurred to me is to
> make an extension module that *only* exposes the C API and mtrand could
> be rewritten to use that API. Hmmm. I like that.

Good, I can do without pyrex.

> I can give some guidance about how to proceed and help you navigate the
> current code, but I'm afraid I don't have much time to actually code.

Thanks, that is all I ask.

> --
> Robert Kern

Chuck
From: A. M. A. <per...@gm...> - 2006-09-04 06:00:59
On 04/09/06, Robert Kern <rob...@gm...> wrote:
> A. M. Archibald wrote:
>
> > In the same project I also noticed it would be nice to be able to
> > (say) do "exponential(2+sin(arange(10)))" to get an array of
> > exponentials with varying parameters.
>
> Travis O. recently added this capability. The distribution parameters
> (loc, scale, alpha, whatever) are broadcast against each other using the
> same rules as ufunc parameters.

Wonderful! Thank you, Travis O.

A. M. Archibald
From: Robert K. <rob...@gm...> - 2006-09-04 05:55:12
A. M. Archibald wrote:
> In the same project I also noticed it would be nice to be able to (say)
> do "exponential(2+sin(arange(10)))" to get an array of exponentials with
> varying parameters.

Travis O. recently added this capability. The distribution parameters
(loc, scale, alpha, whatever) are broadcast against each other using the
same rules as ufunc parameters.

--
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless
enigma that is made terrible by our own mad attempt to interpret it as
though it had an underlying truth." -- Umberto Eco
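A sketch of exactly that usage, as it works in any recent numpy:

    import numpy as np

    scales = 2 + np.sin(np.arange(10))      # ten different scale parameters
    draws = np.random.exponential(scales)   # one draw per parameter
    print(draws.shape)                      # (10,)

    # Broadcasting follows the ufunc rules, so shapes combine too:
    grid = np.random.normal(loc=np.zeros((3, 1)), scale=[1.0, 10.0])
    print(grid.shape)                       # (3, 2)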
From: A. M. A. <per...@gm...> - 2006-09-04 05:53:45
On 04/09/06, Sebastian Haase <ha...@ms...> wrote:
> Thanks for the reply - but read what the doc says:
> >>> N.around.__doc__
> 'Round 'a' to the given number of decimal places. Rounding
> behaviour is equivalent to Python.
>
> Return 'a' if the array is not floating point. Round both the real
> and imaginary parts separately if the array is complex.
> '
>
> it is *not* done this way in Python:
> >>> round(.5)
> 1.0
> >>> round(1.5)
> 2.0
> >>> round(2.5)
> 3.0
>
> (the round obj. method is missing this doc string)

Um, well, the doc is wrong. Numpy should *not* always follow python's
lead, and in particular it's explicit that Numeric's floating point is
closer to IEEE floating-point behaviour than python's - compare, for
example, 1./0. and array(1.)/0.

> I really think we should stick to what the doc string says - everybody
> expects x.5 to round up !!

Not everybody. This (admittedly odd) behaviour wasn't decided on because
it was more efficient to implement; it was decided on because a group of
very smart numerical analysts agreed that it was the best way to avoid
surprising naive users with biased results. (Non-naive users can normally
pick from any of several other rounding modes if they want.)

A better question to ask is, "Can I change numpy's rounding behaviour for
my programs?" (And, more generally, "can I set all the various
floating-point options that the IEEE standard and my processor both
support?") I don't know the answer to that one, but it does seem to be a
goal that numpy is trying for (hence, for example, the difference between
numpy float scalars and python floats).

A. M. Archibald
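The 1./0. comparison spells out like this (np.errstate is used to silence
the IEEE divide warning; it may postdate this thread):

    import numpy as np

    try:
        1.0 / 0.0                          # plain Python refuses
    except ZeroDivisionError as e:
        print("python:", e)

    with np.errstate(divide='ignore'):
        print("numpy:", np.array(1.0) / 0.0)   # IEEE 754 answer: inf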
From: Robert K. <rob...@gm...> - 2006-09-04 05:47:28
Charles R Harris wrote:
> Hi Robert,
>
> I am about to get started on some stuff for the random number
> generators but thought I would run it by you first. I envisage the
> following:
>
> uniform short_doubles -- doubles generated from a single 32 bit random
> number (advantage: speed)
> uniform double, short_doubles on the interval (0,1) -- don't touch
> singularities in functions like log (this is my preferred default)
> fast_normal -- ziggurat method using single 32 bit random numbers
> (advantage: speed)
> fast_exponential -- ziggurat method using single 32 bit random numbers
> (advantage: speed)
> MWC8222 random number generator (advantage: speed on some machines,
> different from mtrand)
>
> Except for the last, none conflict with current routines and can be
> added without a branch. I expect adding MWC8222 might need more
> extensive work and I will branch for that. So the questions are of
> utility and naming. I see some utility for myself, otherwise I wouldn't
> be considering doing the work. OTOH, I already have (C++) routines that
> I use for these things, so a larger question might be if anyone else
> sees a use for these. I like speed, but it is not always that important
> in everyday apps.

I would prefer not to expand the API of numpy.random. If it weren't
necessary for numpy to provide all of the capabilities that came with
Numeric's RandomArray, I wouldn't want numpy.random in there at all.

Now, a very productive course of action would be to refactor numpy.random
such that the distributions (the first four items on your list fall into
this category) and the underlying PRNG (the fifth) are separated from one
another such that they can be mixed and matched at runtime. A byproduct of
this would expose the C API of both of these in order to be usable by
other C extension modules, something that's been asked for about a dozen
times now. The five items on your list could be implemented in an
extension module distributed in scipy.

> I see that Pyrex is used for the interface, so I suppose that is one
> more tool to become familiar with ;)

Possibly not. Pyrex was getting in the way of exposing a C API the last
time I took a stab at it. A possibility that just occurred to me is to
make an extension module that *only* exposes the C API and mtrand could be
rewritten to use that API. Hmmm. I like that.

I can give some guidance about how to proceed and help you navigate the
current code, but I'm afraid I don't have much time to actually code.

--
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless
enigma that is made terrible by our own mad attempt to interpret it as
though it had an underlying truth." -- Umberto Eco
From: A. M. A. <per...@gm...> - 2006-09-04 05:43:13
On 03/09/06, Charles R Harris <cha...@gm...> wrote:
> Travis has introduced the function diagflat for creating diagonal
> arrays. It flattens whatever array is supplied and returns its values
> as the diagonal of a 2-D array or matrix. As the function is currently
> undocumented I thought now might be a good time to discuss other
> possible choices for the name. Here is a quick list that comes to mind:
>
> diagflat (current name)
> asdiagonal
> todiagonal
> asdiag
> todiag
> ...

I was wishing for that just the other day... I think my vote is for
"diagflat", to emphasize the initial flattening.

In particular, there's another function I could wish for: given an array,
pick an axis, treat the array like an array of vectors along that axis,
and replace each vector with the diagonal matrix. So, supposing that it
was called todiagonal:

>>> A = todiagonal(ones((2,3,4)), axis=1)
>>> shape(A)
(2, 3, 3, 4)
>>> A[1,:,:,2]
array([[1, 0, 0],
       [0, 1, 0],
       [0, 0, 1]])

Specifically, I find the forcible flattening a bit monstrous. And you can
always just do todiagonal(A.flat).

No such function seems to exist at the moment, so I'm attaching one;
probably not as efficient as it could be with appropriate fancy indexing
wizardry.

A. M. Archibald
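The attachment didn't survive the archive; a sketch of one way such a
todiagonal could look, reconstructed from the description above
(hypothetical helper, written against the modern np.moveaxis):

    import numpy as np

    def todiagonal(a, axis=-1):
        """Replace each vector along `axis` with a diagonal matrix,
        inserting one new axis of the same length right after `axis`."""
        a = np.asarray(a)
        axis = axis % a.ndim
        n = a.shape[axis]
        v = np.moveaxis(a, axis, -1)             # vectors on the last axis
        out = np.zeros(v.shape + (n,), dtype=a.dtype)
        idx = np.arange(n)
        out[..., idx, idx] = v                   # fill the diagonals
        return np.moveaxis(np.moveaxis(out, -2, axis), -1, axis + 1)

    A = todiagonal(np.ones((2, 3, 4)), axis=1)
    print(A.shape)        # (2, 3, 3, 4)
    print(A[1, :, :, 2])  # the 3x3 identity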
From: Sebastian H. <ha...@ms...> - 2006-09-04 05:42:26
Charles R Harris wrote:
>
> On 9/3/06, Sebastian Haase <ha...@ms...> wrote:
>
>     Hi,
>     I just learned about the existence of round().
>     Is the following exposing a bug?
>     >>> N.__version__
>     '1.0b4'
>     >>> N.array([65.0])
>     [ 65.]
>     >>> _.round(-1)
>     [ 60.]
>     >>> round(65, -1)
>     70.0
>     >>> N.array([65])
>     [65]
>     >>> _.round(-1)
>     [60]
>     >>> N.array([66])
>     [66]
>     >>> _.round(-1)
>     [60]
>     >>> N.array([66.2])
>     [ 66.2]
>     >>> _.round(-1)
>     [ 70.]
>
>     65 should round *up* to 70. (when decimals is given as -1)
>     In numpy even 66 rounds down, but 66.2 rounds up ...
>
>     - Sebastian Haase
>
> Numpy rounds to even.
>
> >>> around(arange(10)*.5)
> array([ 0., 0., 1., 2., 2., 2., 3., 4., 4., 4.])
>
> i.e., when at those x.5 boundaries, round to the nearest even number.
> Always rounding in the same direction biases the numbers when you have
> gazillions of them. This is most likely of import when the fpu is
> rounding intermediate results to register length, but numpy does it
> too. Floats can be weird anyway; .15, for instance, can't be
> represented exactly as an ieee float. It does tend to throw folks for a
> loop when they don't keep this in mind.
>
> Chuck

Thanks for the reply - but read what the doc says:

>>> N.around.__doc__
'Round 'a' to the given number of decimal places. Rounding
behaviour is equivalent to Python.

Return 'a' if the array is not floating point. Round both the real
and imaginary parts separately if the array is complex.
'

it is *not* done this way in Python:

>>> round(.5)
1.0
>>> round(1.5)
2.0
>>> round(2.5)
3.0

(the round obj. method is missing this doc string)

I really think we should stick to what the doc string says - everybody
expects x.5 to round up !!

look even at this:

>>> repr(2.05)
'2.0499999999999998'
>>> round(2.05, 1)
2.1

(don't ask me how Python does this, but it's "nice")

Thanks,
Sebastian
From: A. M. A. <per...@gm...> - 2006-09-04 05:34:30
On 04/09/06, Charles R Harris <cha...@gm...> wrote:
> Except for the last, none conflict with current routines and can be
> added without a branch. I expect adding MWC8222 might need more
> extensive work and I will branch for that. So the questions are of
> utility and naming. I see some utility for myself, otherwise I wouldn't
> be considering doing the work. OTOH, I already have (C++) routines that
> I use for these things, so a larger question might be if anyone else
> sees a use for these. I like speed, but it is not always that important
> in everyday apps.

How painful would it be to allow users to drop in their own sources of
raw random numbers? A couple of years ago I was trying to analyze some
X-ray timing data and we saw some peaks at the end of a long chain of
periodicity-detecting software (Fourier transforms, parameter searching,
et cetera). To rule out the possibility that they were coming from the
random numbers we were using at one stage, I wanted to supply raw random
numbers from /dev/random. Unfortunately, that forced me to write my own
program to convert uniform random numbers to exponential. Allowing the
function that generates raw random bytes to be overwritten would be
extremely handy, even if it were painfully slow.

In the same project I also noticed it would be nice to be able to (say)
do "exponential(2+sin(arange(10)))" to get an array of exponentials with
varying parameters.

I realize all this is outside the scope of what you were asking. I would
have found another pseudorandom number generator that I could just switch
to a benefit; the rest of them are less exciting.

When you say "32-bit integers" do you really mean 32-bit, or do you mean
"machine width"? 32-bit integers may not be faster on 64-bit platforms...

A. M. Archibald
From: Charles R H. <cha...@gm...> - 2006-09-04 04:55:58
Hi Robert,

I am about to get started on some stuff for the random number generators
but thought I would run it by you first. I envisage the following:

uniform short_doubles -- doubles generated from a single 32 bit random
number (advantage: speed)
uniform double, short_doubles on the interval (0,1) -- don't touch
singularities in functions like log (this is my preferred default)
fast_normal -- ziggurat method using single 32 bit random numbers
(advantage: speed)
fast_exponential -- ziggurat method using single 32 bit random numbers
(advantage: speed)
MWC8222 random number generator (advantage: speed on some machines,
different from mtrand)

Except for the last, none conflict with current routines and can be added
without a branch. I expect adding MWC8222 might need more extensive work
and I will branch for that. So the questions are of utility and naming. I
see some utility for myself, otherwise I wouldn't be considering doing the
work. OTOH, I already have (C++) routines that I use for these things, so
a larger question might be if anyone else sees a use for these. I like
speed, but it is not always that important in everyday apps.

I see that Pyrex is used for the interface, so I suppose that is one more
tool to become familiar with ;)

Chuck
From: Charles R H. <cha...@gm...> - 2006-09-04 00:18:38
On 9/3/06, Sebastian Haase <ha...@ms...> wrote:
>
> Hi,
> I just learned about the existence of round().
> Is the following exposing a bug?
> >>> N.__version__
> '1.0b4'
> >>> N.array([65.0])
> [ 65.]
> >>> _.round(-1)
> [ 60.]
> >>> round(65, -1)
> 70.0
> >>> N.array([65])
> [65]
> >>> _.round(-1)
> [60]
> >>> N.array([66])
> [66]
> >>> _.round(-1)
> [60]
> >>> N.array([66.2])
> [ 66.2]
> >>> _.round(-1)
> [ 70.]
>
> 65 should round *up* to 70. (when decimals is given as -1)
> In numpy even 66 rounds down, but 66.2 rounds up ...
>
> - Sebastian Haase

Numpy rounds to even.

>>> around(arange(10)*.5)
array([ 0., 0., 1., 2., 2., 2., 3., 4., 4., 4.])

i.e., when at those x.5 boundaries, round to the nearest even number.
Always rounding in the same direction biases the numbers when you have
gazillions of them. This is most likely of import when the fpu is rounding
intermediate results to register length, but numpy does it too. Floats can
be weird anyway; .15, for instance, can't be represented exactly as an
ieee float. It does tend to throw folks for a loop when they don't keep
this in mind.

Chuck
From: Sebastian H. <ha...@ms...> - 2006-09-04 00:03:46
Hi,
I just learned about the existence of round().
Is the following exposing a bug?

>>> N.__version__
'1.0b4'
>>> N.array([65.0])
[ 65.]
>>> _.round(-1)
[ 60.]
>>> round(65, -1)
70.0
>>> N.array([65])
[65]
>>> _.round(-1)
[60]
>>> N.array([66])
[66]
>>> _.round(-1)
[60]
>>> N.array([66.2])
[ 66.2]
>>> _.round(-1)
[ 70.]

65 should round *up* to 70 (when decimals is given as -1).
In numpy even 66 rounds down, but 66.2 rounds up ...

- Sebastian Haase