From: George N. <gn...@go...> - 2006-10-18 22:21:21
|
On 18/10/06, Charles R Harris <cha...@gm...> wrote: > > > On 10/18/06, George Nurser <gn...@go...> wrote: > > > > None of the LaPack stuff seems to use the Fortran stuff, they just > > > > transpose and copy. > > > > You've got me worried here. I have assumed that when you start with a > > c-contiguous array, a, with say, a.shape = (m,n), if you use the > > transpose as an argument to a fortran routine which requires an mxn > > size array, then no copying is required. > > > > This seems to work for me -- the transpose *does* have fortran order. > > Nope. The result is an (n,m) array in fortran order, not an (m,n) array in > fortran order. Presumably that's because it's a view of the original array. >> > Also, in f2py, if I use -DF2PY_REPORT_ON_ARRAY_COPY=1 I receive no > > alert of any copy. > > f2py takes care of the ordering, which is one reason why it is so useful. Yes, when I first used it, I assumed that the fortran routine had to use an nxm array. But f2py is clever enough to make the above work. George. |
From: Charles R H. <cha...@gm...> - 2006-10-18 22:19:46
|
On 10/18/06, Travis Oliphant <oli...@ee...> wrote: > > Charles R Harris wrote: > > > > > Could we make a few changes ;) > > > > For printing the flags I would suggest using C-Contiguous and > > F-Contiguous so folks don't have to read the book. And at the c level > > define alternates, i.e, #define c-contiguous contiguous or whatever. > > That way backward compatibility would be maintained but more > > descriptive names would be available. > > Printing the flags is not intended for the casual user. So, I'd like to > keep consistent with C-level names and the names that are printed. > > CONTIGUOUS is the old name Numeric used. It always meant C-CONTIGUOUS > and so that meaning is preserved. FORTRAN is the new one flag and it > means FORTRAN CONTIGUOUS. > > So, you want something like? > > #define NPY_C_CONTIGUOUS NPY_CONTIGUOUS > #define NPY_F_CONTIGUOUS NPY_FORTRAN > > and to have C_CONTIGUOUS and F_CONTIGUOUS print for the flags description? Yes, I think that would be more informative. I'm not opposed to it, but I don't really see the need. It's just a > semantic question. Given the history of CONTIGUOUS in Numeric I thought > it was clear that CONTIGUOUS always meant C-contiguous. Well, I knew that for numeric, but it was a good deal less obvious in combo with the order keyword. For instance, contiguous could change its meaning to match up with FORTRAN, so that FORTRAN=True and CONTIGUOUS=True meant Fortran contiguous, which was sort of what I was thinking. Explicit never hurts. Chuck |
From: Travis O. <oli...@ee...> - 2006-10-18 22:06:46
|
Charles R Harris wrote: > > Could we make a few changes ;) > > For printing the flags I would suggest using C-Contiguous and > F-Contiguous so folks don't have to read the book. And at the c level > define alternates, i.e., #define c-contiguous contiguous or whatever. > That way backward compatibility would be maintained but more > descriptive names would be available. Printing the flags is not intended for the casual user. So, I'd like to keep the printed names consistent with the C-level names. CONTIGUOUS is the old name Numeric used. It always meant C-CONTIGUOUS and so that meaning is preserved. FORTRAN is the new flag and it means FORTRAN CONTIGUOUS. So, you want something like? #define NPY_C_CONTIGUOUS NPY_CONTIGUOUS #define NPY_F_CONTIGUOUS NPY_FORTRAN and to have C_CONTIGUOUS and F_CONTIGUOUS print for the flags description? I'm not opposed to it, but I don't really see the need. It's just a semantic question. Given the history of CONTIGUOUS in Numeric I thought it was clear that CONTIGUOUS always meant C-contiguous. -Travis |
From: Charles R H. <cha...@gm...> - 2006-10-18 21:48:29
|
On 10/18/06, George Nurser <gn...@go...> wrote: > > > > None of the LaPack stuff seems to use the Fortran stuff, they just > > > transpose and copy. > > You've got me worried here. I have assumed that when you start with a > c-contiguous array, a, with say,a.shape = (m,n), if you use the > transpose as an argument to a fortran routine which requires an mxn > size array, then no copying is required. > > This seems to work for me -- the transpose *does* have fortran order. Nope. The result is an (n,m) array in fortran order, not an (m,n) array in fortran order. In [52]:a = array([[1,2,3],[4,5,6]]) In [53]:a.transpose().flags Out[53]: CONTIGUOUS : False FORTRAN : True OWNDATA : False WRITEABLE : True ALIGNED : True UPDATEIFCOPY : False In [54]:a.transpose().shape Out[54]:(3, 2) Looks like what you want is either fastcopyandtranspose or the order flag. In [56]:fastCopyAndTranspose(a).flags Out[56]: CONTIGUOUS : True FORTRAN : False OWNDATA : True WRITEABLE : True ALIGNED : True UPDATEIFCOPY : False which is a (n,m) array in c order, i.e., an (m,n) array in fortran order. Or In [57]:array(a, order='F').flags Out[57]: CONTIGUOUS : False FORTRAN : True OWNDATA : True WRITEABLE : True ALIGNED : True UPDATEIFCOPY : False which is a (m,n) array in fortran order. Also, in f2py, if I use -DF2PY_REPORT_ON_ARRAY_COPY=1 I receive no > alert of any copy. f2py takes care of the ordering, which is one reason why it is so useful. Chuck |
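The difference between the transpose and an order='F' copy can be checked directly. The following is a minimal sketch, assuming the current NumPy spellings `flags.c_contiguous`, `flags.f_contiguous`, `np.asfortranarray` and `np.shares_memory`, which postdate this thread and stand in for the `CONTIGUOUS` / `FORTRAN` names printed in the session above.

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6]])   # shape (2, 3), C order

t = a.T                     # view: shape (3, 2), Fortran-contiguous, no data copied
f = np.asfortranarray(a)    # copy: shape (2, 3), Fortran-contiguous

t_shares = np.shares_memory(t, a)   # the transpose reuses a's buffer
f_shares = np.shares_memory(f, a)   # the Fortran-ordered copy has its own buffer

print(t.shape, t.flags.f_contiguous, t_shares)
print(f.shape, f.flags.f_contiguous, f_shares)
```

Both results are Fortran-contiguous, but only the copy keeps the (m, n) shape, which is the point made in the reply above.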
From: George N. <gn...@go...> - 2006-10-18 21:05:22
|
> > None of the LaPack stuff seems to use the Fortran stuff, they just > > transpose and copy. You've got me worried here. I have assumed that when you start with a c-contiguous array, a, with say,a.shape = (m,n), if you use the transpose as an argument to a fortran routine which requires an mxn size array, then no copying is required. This seems to work for me -- the transpose *does* have fortran order. Also, in f2py, if I use -DF2PY_REPORT_ON_ARRAY_COPY=1 I receive no alert of any copy. Apologies if these are simply confused ravings. George Nurser. |
From: Charles R H. <cha...@gm...> - 2006-10-18 21:03:38
|
On 10/18/06, Alan G Isaac <ai...@am...> wrote: > > On Wed, 18 Oct 2006, Keith Goodman apparently wrote: <snip> Here's a simpler (?) example: > >>> x=numpy.random.rand(300,1)>0 > >>> x.sum() > 300 > >>> sum(x) > array([44], dtype=int8) > >>> x=numpy.random.rand(300)>0 > >>> sum(x) > 300 > > Alan Isaac Hmmm, I think sum(x) and x.sum() should behave the same. Note that In [12]:sum(x, dtype=int) Out[12]:300 I think sum should stick to the modular arithmetic unless specified otherwise. But in any case sum(x) and x.sum() should do the same thing. Chuck |
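The value 44 in these examples is 300 reduced modulo 256, which is what accumulating 300 ones in an 8-bit integer produces. A small sketch of that arithmetic follows; it forces the accumulator dtype explicitly, because the default accumulator that a current NumPy release picks for boolean input may differ from the 2006 build discussed here.

```python
import numpy as np

ones = np.ones(300, dtype=np.int8)

wrapped = int(ones.sum(dtype=np.int8))    # accumulate in int8: wraps modulo 256
widened = int(ones.sum(dtype=np.int64))   # accumulate in a wider type: no wrap

print(wrapped, widened)   # prints: 44 300
```

Passing `dtype` to the reduction, as in the `sum(x, dtype=int)` call above, is the explicit way to ask for the wider accumulator.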
From: Charles R H. <cha...@gm...> - 2006-10-18 20:51:14
|
Travis, On 10/18/06, Travis Oliphant <oli...@ie...> wrote: > > Tim Hochberg wrote: > > One thing that may be confusing the issue is that, as I understand it, > > FORTRAN and CONTIGUOUS together represent three states which I'll call > > FORTRAN_ORDER, C_ORDER and DISCONTIGUOUS. > > Yep, that's what they mean. CONTIGUOUS is the name Numeric gave it and > it meant C-order contiguous. We have kept the same meaning. All we've > done is selected out from the class of arrays that Numeric called > DISTCONTIGUOUS, arrays that are FORTRAN-order (and so still > single-segment), but discontiguous in the sense that Numeric had. > > > I periodically wonder if it > > would be valuable to have a way to query the order directly: the result > > would be "C", "F" or None, just like the order keyword that is passed > > in. > You an do it with the flags > > a.flags.contiguous > a.flags.fortran > > Discontiguous is when both of these are false. Note that for a.ndim < > 2, both a.flags.contiguous and a.flags.fortran are true if one of them > is true. > > This is all explained in the first chapters of my book. You have to > understand CONTIGUOUS == C-order contiguous and FORTRAN == Fortran-order > contiguous. Could we make a few changes ;) For printing the flags I would suggest using C-Contiguous and F-Contiguous so folks don't have to read the book. And at the c level define alternates, i.e, #define c-contiguous contiguous or whatever. That way backward compatibility would be maintained but more descriptive names would be available. Chuck |
From: Alan G I. <ai...@am...> - 2006-10-18 20:24:27
|
On Wed, 18 Oct 2006, Keith Goodman apparently wrote: > Here's an example: >>> x = zeros((300, 300)) >>> x = x > 1 # False >>> x = 1 - x # ones >>> y = x.T * x >>> y[0,0] > 44 Here's a simpler (?) example: >>> x=numpy.random.rand(300,1)>0 >>> x.sum() 300 >>> sum(x) array([44], dtype=int8) >>> x=numpy.random.rand(300)>0 >>> sum(x) 300 Alan Isaac |
From: Travis O. <oli...@ie...> - 2006-10-18 19:46:41
|
Tim Hochberg wrote: > One thing that may be confusing the issue is that, as I understand it, > FORTRAN and CONTIGUOUS together represent three states which I'll call > FORTRAN_ORDER, C_ORDER and DISCONTIGUOUS. Yep, that's what they mean. CONTIGUOUS is the name Numeric gave it and it meant C-order contiguous. We have kept the same meaning. All we've done is select out, from the class of arrays that Numeric called DISCONTIGUOUS, arrays that are FORTRAN-order (and so still single-segment), but discontiguous in the sense that Numeric had. > I periodically wonder if it > would be valuable to have a way to query the order directly: the result > would be "C", "F" or None, just like the order keyword that is passed > in. You can do it with the flags a.flags.contiguous a.flags.fortran Discontiguous is when both of these are false. Note that for a.ndim < 2, both a.flags.contiguous and a.flags.fortran are true if one of them is true. This is all explained in the first chapters of my book. You have to understand CONTIGUOUS == C-order contiguous and FORTRAN == Fortran-order contiguous. -Travis |
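The order query Tim asks about can be built from the two flags. Below is a minimal sketch: `memory_order` is a hypothetical helper, not part of NumPy, and it reads the flags through the current attribute names `c_contiguous` and `f_contiguous` rather than the `contiguous` / `fortran` names used in this message.

```python
import numpy as np

def memory_order(a):
    """Hypothetical helper: return 'C', 'F', 'both', or None for a discontiguous array."""
    c = a.flags.c_contiguous
    f = a.flags.f_contiguous
    if c and f:
        return "both"   # e.g. any contiguous 1-d array
    if c:
        return "C"
    if f:
        return "F"
    return None         # neither flag set: not single-segment in either order

print(memory_order(np.array([1, 2, 3, 4])))
print(memory_order(np.array([1, 2, 3, 4])[::2]))
print(memory_order(np.array([[1, 2], [3, 4]])))
print(memory_order(np.array([[1, 2], [3, 4]], order="F")))
```

The "both" case is kept separate because, as noted above, arrays with fewer than two dimensions set both flags whenever one of them is set.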
From: Travis O. <oli...@ie...> - 2006-10-18 19:46:28
|
> > I'm not talking about the keyword in the ravel call, I'm talking about > the flag in a. Ah. Yes, I see. I misunderstood. Of course ravel ignores the FORTRAN flag (actually it doesn't because if a copy is not necessary it doesn't make one). The key is that the Python user doesn't need to care about the array flag unless they are interfacing to compiled code. That's the point of the flag. It's actually redundant because it could be checked every time it's needed. But, right now, it's kept updated so that the check is simple. The same is true with the C-CONTIGUOUS flag (called contiguous). > The question is: do we *need* a fortran flag. No, you don't *need* the flag. But, it saves copying data to check it (look how many times ISFORTRAN is called in the code). Without the flag all of those cases would need to do a strides-check which is done in the UpdateFlags code. > I am arguing not, because the only need is for fortran contiguous > arrays to pass to fortran function, or translation from fortran > contiguous arrays to numpy arrays. What I am saying is that things are > unnecessarily complicated. I disagree. It's actually not that complicated. Even if it was complicated to implement, the point is that it is now done. There is no sense ripping it out (that would be a huge pain and for what purpose?). The FORTRAN flag gives us a lot more flexibility when it comes to copying data or not. I think part of the complication is that you are misunderstanding some of the terms and the purposes of the keywords. > None of the LaPack stuff seems to use the Fortran stuff, they just > transpose and copy. It doesn't now only because I haven't had time to go through and change it, but it should. Look at scipy's LaPack interface. It (through f2py) uses the FORTRAN stuff extensively (much was borrowed from there in the first place). -Travis |
From: Keith G. <kwg...@gm...> - 2006-10-18 19:31:47
|
On 10/18/06, Christopher Hanley <ch...@st...> wrote: > I can't decide if the following is a bug or feature. Numpy scalars are > allowed to overflow silently while numarray upcasts to a larger python > type. I guess my biggest problem is that the overflow occurs silently. I had that problem yesterday, which was difficult to diagnose for a new user like me. Here's an example: >> x = zeros((300, 300)) >> x = x > 1 # False >> x = 1 - x # ones >> y = x.T * x >> y[0,0] 44 |
From: Christopher H. <ch...@st...> - 2006-10-18 19:10:19
|
Greetings, I can't decide if the following is a bug or feature. Numpy scalars are allowed to overflow silently while numarray upcasts to a larger python type. I guess my biggest problem is that the overflow occurs silently. In any case, is this known and expected behavior? Thanks, Chris NUMARRAY example: In [1]: import numarray as n In [2]: a = n.array([2200,14000],type=n.UInt32) In [3]: a0= a[0] In [4]: a1 = a[1] In [5]: adiff = a0-a1 In [6]: print a0,a1,adiff 2200 14000 -11800 In [7]: print type(a0),type(a1),type(adiff) <type 'long'> <type 'long'> <type 'long'> NUMPY example: In [1]: import numpy as n In [2]: a = n.array([2200,14000],dtype=n.uint32) In [3]: a0= a[0] In [4]: a1 = a[1] In [5]: adiff = a0-a1 In [6]: print a0,a1,adiff 2200 14000 4294955496 In [7]: print type(a0),type(a1),type(adiff) <type 'numpy.uint32'> <type 'numpy.uint32'> <type 'numpy.uint32'> |
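The NumPy result above is the true difference reduced modulo 2**32. The sketch below reproduces it and shows one way to get the signed answer by widening first. It is written with one-element array slices rather than scalars, on the assumption that recent NumPy releases may emit an overflow warning for scalar arithmetic; it uses only `astype`, not any numarray-style upcasting.

```python
import numpy as np

a = np.array([2200, 14000], dtype=np.uint32)

wrapped = a[:1] - a[1:]                                      # stays uint32, wraps modulo 2**32
signed = a[:1].astype(np.int64) - a[1:].astype(np.int64)     # widen before subtracting

print(int(wrapped[0]), int(signed[0]))   # prints: 4294955496 -11800
```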
From: Charles R H. <cha...@gm...> - 2006-10-18 19:10:18
|
On 10/18/06, Tim Hochberg <tim...@ie...> wrote: > > Charles R Harris wrote: > > > > > > On 10/18/06, *Tim Hochberg* <tim...@ie... > > <mailto:tim...@ie...>> wrote: > > > > Charles R Harris wrote: > > > > [SNIP] > > > > > > I'm not talking about the keyword in the ravel call, I'm talking > > about > > > the flag in a. The question is: do we *need* a fortran flag. I am > > > argueing not, because the only need is for fortran contiguous > > arrays > > > to pass to fortran function, or translation from fortran > contiguous > > > arrays to numpy arrays. What I am saying is that things are > > > unnecessarily complicated. None of the LaPack stuff seems to use > > the > > > Fortran stuff, they just transpose and copy. I don't even think > > I want > > > to change that, because it is *clear* what is going on. > > Interfacing to > > > fortran is all about memory layout, nothing more or less. > > > > > > > Chuck, > > > > There are two things here. One is the order keyword and one is the > > FORTRAN flag. The latter is mainly an optimization for use at the > > C-level so that one doesn't have to check whether a given array is > in > > contiguous FORTRAN order by examining the strides, in the same way > > that > > the CONTIGUOUS flag allows you to skip examining the strides when > you > > need a contiguous C-order matrix. > > > > > > That sounds like the two flags should be named f-contiguous and > > c-contiguous. Then they would be orthogonal and one could have all > > four combinations. Is that the case now? Perhaps I am misunderstanding > > the meaning of the flags. > That is the case now. The flag names simply mirror their values in C. > Why they have those names in something of a historical accident I > believe. Take a look at this: OK, that is good. I no longer have any objection to the flags, I just wish the names were more descriptive of what they mean. In fact, it looks like the following sort of construction will be useful in the linalg module. 
In [17]:a = array([[1,2],[3,4]], dtype=int) In [18]:b = array(a, dtype=double, order='f') In [19]:b.flags Out[19]: CONTIGUOUS : False FORTRAN : True OWNDATA : True WRITEABLE : True ALIGNED : True UPDATEIFCOPY : False I've been a pain in the a** because I really want to know what is going on down in the boiler room. Chuck |
From: Tim H. <tim...@ie...> - 2006-10-18 18:29:26
|
My $0.02: If histogram is going to get a makeover, particularly one that makes it more complex than at present, it should probably be moved to SciPy. Failing that, it should be moved to a submodule of numpy with similar statistical tools. Preferably with consistent interfaces for all of the functions. |
From: Tim H. <tim...@ie...> - 2006-10-18 18:27:47
|
Charles R Harris wrote: > > > On 10/18/06, *Tim Hochberg* <tim...@ie... > <mailto:tim...@ie...>> wrote: > > Charles R Harris wrote: > > [SNIP] > > > > I'm not talking about the keyword in the ravel call, I'm talking > about > > the flag in a. The question is: do we *need* a fortran flag. I am > > argueing not, because the only need is for fortran contiguous > arrays > > to pass to fortran function, or translation from fortran contiguous > > arrays to numpy arrays. What I am saying is that things are > > unnecessarily complicated. None of the LaPack stuff seems to use > the > > Fortran stuff, they just transpose and copy. I don't even think > I want > > to change that, because it is *clear* what is going on. > Interfacing to > > fortran is all about memory layout, nothing more or less. > > > > Chuck, > > There are two things here. One is the order keyword and one is the > FORTRAN flag. The latter is mainly an optimization for use at the > C-level so that one doesn't have to check whether a given array is in > contiguous FORTRAN order by examining the strides, in the same way > that > the CONTIGUOUS flag allows you to skip examining the strides when you > need a contiguous C-order matrix. > > > That sounds like the two flags should be named f-contiguous and > c-contiguous. Then they would be orthogonal and one could have all > four combinations. Is that the case now? Perhaps I am misunderstanding > the meaning of the flags. That is the case now. The flag names simply mirror their values in C. Why they have those names in something of a historical accident I believe. 
Take a look at this: >>> array([1,2,3,4]).flags CONTIGUOUS : True FORTRAN : True OWNDATA : True WRITEABLE : True ALIGNED : True UPDATEIFCOPY : False >>> array([1,2,3,4])[::2].flags CONTIGUOUS : False FORTRAN : False OWNDATA : False WRITEABLE : True ALIGNED : True UPDATEIFCOPY : False >>> array([[1,2],[3,4]]).flags CONTIGUOUS : True FORTRAN : False OWNDATA : True WRITEABLE : True ALIGNED : True UPDATEIFCOPY : False >>> array([[1,2],[3,4]], order='F').flags CONTIGUOUS : False FORTRAN : True OWNDATA : True WRITEABLE : True ALIGNED : True UPDATEIFCOPY : False See, all four combinations. I guess my previous post was wrong -- there really are four combinations not three like I said, but they are C-order, Fortran-order, Both and neither. I forgot what the flags signified and had to play with them for a bit to remember. > I believe it is the former that you > are objecting to, but it would help if you could specify whether > you are > talking about the order keyword or whether you are talking about the > FORTRAN flag. > > > Both. I was argueing against the FORTRAN flag, and of limiting the > order keyword to those cases where f or c contiguous arrays were the > output or input. > > I'll also note that the order keyword could probably have been used to > fix a performance problem someone was having a few weeks ago. We > ended > up transposing the data, but the individual felt that obscured the > intent of the algorithm. I believe the same effect could probably have > been been achieved without re jiggering the algorithm by using the > order > parameter. > > > Some more details would be helpful. It would be good to know what > problem the order keyword should solve. > Well, in general, memory layout can be important for performance not just for interfacing with Fortran. You can do this with suitable applications of transpose, but using the order flag is probably clearer. 
Particularly, if you are trying to match a textbook algorithm, its nice to have the axes in the same places. I'm just moving over from numarray which didn't have the equivalent of the order flag as far as I know, so I don't have experience with this at this point though. Here is one of the posts in questions: > David Cournapeau wrote: > Hi, > > I was wondering if there was any way to speed up the following code: > > y = N.zeros((n, K)) > for i in range(K): > y[:, i] = gauss_den(data, mu[i, :], va[i, :]) > > Where K is of order 1e1, n of order 1e5. Normally, gauss_den is a > quite expensive function, but the profiler tells me that the indexing > y[:,i] takes almost as much time as the gauss_den computation (which > computes n exp !). To see if the profiler is "right", i replaces with > the (non valid) following function: > > y = N.zeros((n, K)) > for i in range(K): > yt = gauss_den(data, mu[i, :], va[i, :]) > return y > > Where more than 99% of the code is spent inside gauss_den. > > I guess the problem is coming from the fact that y being C order, y[:, > i] needs accessing data in a non 'linear' way. Is there a way to speed > this up ? I did something like this: > > y = N.zeros((K, n)) > for i in range(K): > y[i] = gauss_den(data, mu[i, :], va[i, :]) > return y.T > > which works, but I don't like it very much. I believe that the same efficiency as the last could have been achieved using something like: y = N.zeros((n,K), order='F') for i in range(K): y[:,i] = gauss_den(data, mu[i, :], va[i, :]) return y This probably would have made the original poster happier. -tim |
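Why order='F' helps the column-wise loop can be seen from the strides of a single column. This is a rough sketch with made-up sizes (`n` and `K` are placeholders, and the expensive `gauss_den` call is replaced by a constant fill), assuming 8-byte floats.

```python
import numpy as np

n, K = 1000, 10

y_c = np.zeros((n, K))              # C order: consecutive column elements are K items apart
y_f = np.zeros((n, K), order="F")   # Fortran order: each column is one contiguous run

col_c = y_c[:, 0]
col_f = y_f[:, 0]
print(col_c.strides, col_f.strides)

# Same indexing as the original loop, but every write lands in contiguous memory.
for i in range(K):
    y_f[:, i] = float(i)
```

The array keeps its (n, K) shape throughout, so the code still reads like the textbook version while the memory access pattern matches the transposed workaround.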
From: Charles R H. <cha...@gm...> - 2006-10-18 18:08:10
|
On 10/18/06, Tim Hochberg <tim...@ie...> wrote: > > Charles R Harris wrote: > > [SNIP] > > > > I'm not talking about the keyword in the ravel call, I'm talking about > > the flag in a. The question is: do we *need* a fortran flag. I am > > arguing not, because the only need is for fortran contiguous arrays > > to pass to fortran function, or translation from fortran contiguous > > arrays to numpy arrays. What I am saying is that things are > > unnecessarily complicated. None of the LaPack stuff seems to use the > > Fortran stuff, they just transpose and copy. I don't even think I want > > to change that, because it is *clear* what is going on. Interfacing to > > fortran is all about memory layout, nothing more or less. > > > > Chuck, > > There are two things here. One is the order keyword and one is the > FORTRAN flag. The latter is mainly an optimization for use at the > C-level so that one doesn't have to check whether a given array is in > contiguous FORTRAN order by examining the strides, in the same way that > the CONTIGUOUS flag allows you to skip examining the strides when you > need a contiguous C-order matrix. That sounds like the two flags should be named f-contiguous and c-contiguous. Then they would be orthogonal and one could have all four combinations. Is that the case now? Perhaps I am misunderstanding the meaning of the flags. I believe it is the former that you > are objecting to, but it would help if you could specify whether you are > talking about the order keyword or whether you are talking about the > FORTRAN flag. Both. I was arguing against the FORTRAN flag, and for limiting the order keyword to those cases where f or c contiguous arrays were the output or input. I'll also note that the order keyword could probably have been used to > fix a performance problem someone was having a few weeks ago. We ended > up transposing the data, but the individual felt that obscured the > intent of the algorithm. 
I believe the same effect could probably have > been achieved without rejiggering the algorithm by using the order > parameter. Some more details would be helpful. It would be good to know what problem the order keyword should solve. Chuck |
From: Erin S. <eri...@gm...> - 2006-10-18 18:00:34
|
On 10/17/06, David Huard <dav...@gm...> wrote: > Hi all, > > I'd like to poll the list to see what people want from numpy.histogram(), > since I'm currently writing a contender. > > My main complaints with the current version are: > 1. upper outliers are stored in the last bin, while lower outliers are not > counted at all, > 2. cannot use weights. > > The new histogram function is well under way (it address these issues and > adds an axis keyword), > but I want to know what is the preferred behavior regarding the function > output, and your > willingness to introduce a new behavior that will break some code. > > Given a number of bins N and range (min, max), histogram constructs linearly > spaced bin edges > b0 (out-of-range) | b1 | b2 | b3 | .... | bN | bN+1 out-of-range > and may return: > > A. H = array([N_b0, N_b1, ..., N_bN, N_bN+1]) > The out-of-range values are the first and last values of the array. The > returned array is hence N+2 > > B. H = array([N_b0 + N_b1, N_b2, ..., N_bN + N_bN+1]) > The lower and upper out-of-range values are added to the first and last bin > respectively. > > C. H = array([N_b1, ..., N_bN + N_bN+1]) > Current behavior: the upper out-of-range values are added to the last bin. > > D. H = array([N_b1, N_b2, ..., N_bN]), > Lower and upper out-of-range values are given after the histogram array. > > Ideally, the new function would not break the common usage: H = > histogram(x)[0], so this exclude A. B and C are not acceptable in my > opinion, so only D remains, with the downsize that the outliers are not > returned. A solution might be to add a keyword full_output=False, which when > set to True, returns the out-of-range values in a dictionnary. > > Also, the current function returns -> H, ledges > where ledges is the array of left bin edges (N). > I propose returning the complete array of edges (N+1), including the > rightmost edge. 
This is a little bit impractical for plotting, as the edges > array does not have the same length as the histogram array, but allows the > use of user-defined non-uniform bins. > > Opinions, suggestions ? I dislike the current behavior. I don't want the histogram to count anything outside the range I specify. It would also be nice to allow specification of a binsize which would be used if number of bins wasn't sent. Personally, since I don't have any code yet that uses histogram, I feel like edges could be returned in a keyword. Perhaps in a dictionary with other useful items, such as bin middles, mean of the data in bins and other statistics, or whatever, which would only be calculated if the keyword dict was sent. Hopefully Google and sourceforge are playing nice and you will see this within a day of sending. Erin |
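Option D with a separate report of the out-of-range counts might look like the sketch below. `histogram_with_outliers` is a hypothetical helper, not the function proposed in this thread; it leans on the behavior of `np.histogram` in current NumPy releases, which ignores values outside `range` and returns all N+1 bin edges, unlike the 2006 function being discussed.

```python
import numpy as np

def histogram_with_outliers(x, bins, lo, hi):
    """Hypothetical helper: in-range counts, N+1 edges, and a dict of outlier tallies."""
    x = np.asarray(x)
    counts, edges = np.histogram(x, bins=bins, range=(lo, hi))
    outliers = {"lower": int((x < lo).sum()), "upper": int((x > hi).sum())}
    return counts, edges, outliers

counts, edges, outliers = histogram_with_outliers(
    [-1.0, 0.5, 1.5, 2.5, 9.0], bins=3, lo=0.0, hi=3.0
)
print(counts.tolist(), edges.tolist(), outliers)
```

Returning the tallies in a dictionary leaves room for the other optional statistics suggested above without changing the first two return values.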
From: Tim H. <tim...@ie...> - 2006-10-18 17:58:33
|
Charles R Harris wrote: [SNIP] > > I'm not talking about the keyword in the ravel call, I'm talking about > the flag in a. The question is: do we *need* a fortran flag. I am > arguing not, because the only need is for fortran contiguous arrays > to pass to fortran function, or translation from fortran contiguous > arrays to numpy arrays. What I am saying is that things are > unnecessarily complicated. None of the LaPack stuff seems to use the > Fortran stuff, they just transpose and copy. I don't even think I want > to change that, because it is *clear* what is going on. Interfacing to > fortran is all about memory layout, nothing more or less. > Chuck, There are two things here. One is the order keyword and one is the FORTRAN flag. The latter is mainly an optimization for use at the C-level so that one doesn't have to check whether a given array is in contiguous FORTRAN order by examining the strides, in the same way that the CONTIGUOUS flag allows you to skip examining the strides when you need a contiguous C-order matrix. I believe it is the former that you are objecting to, but it would help if you could specify whether you are talking about the order keyword or whether you are talking about the FORTRAN flag. I'll also note that the order keyword could probably have been used to fix a performance problem someone was having a few weeks ago. We ended up transposing the data, but the individual felt that obscured the intent of the algorithm. I believe the same effect could probably have been achieved without rejiggering the algorithm by using the order parameter. -tim |
From: Tim H. <tim...@ie...> - 2006-10-18 17:51:17
|
One thing that may be confusing the issue is that, as I understand it, FORTRAN and CONTIGUOUS together represent three states which I'll call FORTRAN_ORDER, C_ORDER and DISCONTIGUOUS. I periodically wonder if it would be valuable to have a way to query the order directly: the result would be "C", "F" or None, just like the order keyword that is passed in. This might well eliminate some confusion. However, 99% of the time the order just doesn't matter, so it's probably pointless. -tim |
From: Charles R H. <cha...@gm...> - 2006-10-18 17:48:21
|
On 10/18/06, Travis Oliphant <oli...@ie...> wrote: > > > > > > Currently, the key operation is reshape, which only needs to return a > > view in fortran order and doesn't even need to mark the resulting > > array as fortran order because, well, because it works just fine in > > numpy as is, it just isn't contiguous. If the other functions took > > shape and order, reshape wouldn't even need the order keyword. > The flag is the there as a quick check for interfacing. The order > keyword grew because it was useful to avoid the arbitrariness of > C-contiguous order for those who prefer to think of it differently. > Remember the .T attribute for .transpose() was a recent addition and > sticking .transpose() everywhere is a lot more ugly. But, yes, many > uses of the order keyword could be replaced by preceding with > .transpose() --- this is not without cost, however. > > > > > I don't see why the array constructor needs the order keyword, it > > doesn't *do* anything. For instance > > > > a = array([[1,2,3],[4,5,6]], order='F') > > > > doesn't produce a fortran contiguous array, it produces the same array > > as the 'C' form, just sets the fortran flag and marks contiguous as > > False. What is the use of that? It is just a generic non-contiguous > > numpy array. > > What? You're not understanding something. The order flag definitely > does something here. First of all it seems like you are not > understanding the meaning of the CONTIGUOUS flag. CONTIGUOUS means > "C-order contiguous" while FORTRAN means "FORTRAN-order contiguous". > That's why I use the word single-segment to talk about FORTRAN-order or > C-contiguous order. For Numeric, CONTIGUOUS always meant C-order > contiguous and we are continuing that tradition. All we've done is > notice that there is such a think as FORTRAN-order contiguous and copies > do not need to be made in all circumstances when you have FORTRAN-order. 
> > Look at the difference between: > > a = array([[1,2,3],[4,5,6]],order='F').data[:] > > b = array([[1,2,3],[4,5,6]]).data[:] > > Notice the layout is definitely different between a and b. > > > And > > > > In [131]: ascontiguousarray(array([[1,2,3],[4,5,6]], dtype=int8, > > order='F')).flags > > Out[131]: > > CONTIGUOUS : True > > FORTRAN : False > > OWNDATA : True > > WRITEABLE : True > > ALIGNED : True > > UPDATEIFCOPY : False > > > > Doesn't produce a fortran contiguous array, so what use was the flag? > And > > Because you requested a C-contiguous array --- that's what contiguous > means in NumPy (exactly what it meant in Numeric). > > > > > In [141]: array([1,2,3,4,5,6], dtype=int8).reshape((2,3), > > order='F').astype(int16).flags > > Out[141]: > > CONTIGUOUS : True > > FORTRAN : False > > OWNDATA : True > > WRITEABLE : True > > ALIGNED : True > > UPDATEIFCOPY : False > > > > reorders stuff in memory, so is a bug looking to happen in a fortran > > interface. > > Yes, like I said before, all kinds of operations alter the "layout" of > data. You can't assume all operations will preserve FORTRAN ordering. > FORTRAN-order has meaning beyond how the data is actually set out in > memory. Sometimes it indicates how you think it is laid out when you > are doing re-shaping operations. > > > > > mmapped files are the only thing I can think of where one might want to > > vary an operation depending on Fortran ordering because seeking out of > > order is very expensive. But that means adapting algorithms depending > > on order type, better I think to just stick to using the small strided > > dimensions when appropriate. > > > > It would be helpful in debugging all this order stuff if it was clear > > what was supposed to happen in every case. Ravel, for instance, > > ignores the FORTRAN flag, again begging the question as to why we > > *have* the flag. > No it doesn't. Please show your evidence. 
Look: > > a = array([[1,2,3],[4,5,6]]) > > print a.ravel() > [1 2 3 4 5 6] > > print a.ravel('F') > [1 4 2 5 3 6] I'm not talking about the keyword in the ravel call, I'm talking about the flag in a. The question is: do we *need* a fortran flag? I am arguing not, because the only need is for fortran contiguous arrays to pass to fortran functions, or translation from fortran contiguous arrays to numpy arrays. What I am saying is that things are unnecessarily complicated. None of the LaPack stuff seems to use the Fortran stuff, they just transpose and copy. I don't even think I want to change that, because it is *clear* what is going on. Interfacing to fortran is all about memory layout, nothing more or less. Chuck |
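The distinction drawn in this message, between the order argument of ravel and the layout flag of the array itself, can be sketched as below. This assumes a current NumPy, where `ravel` accepts `order="A"`; the variable names are illustrative only.

```python
import numpy as np

c_arr = np.array([[1, 2, 3], [4, 5, 6]])             # C-ordered memory
f_arr = np.array([[1, 2, 3], [4, 5, 6]], order="F")  # Fortran-ordered memory

# With no argument, ravel walks the indices in C order for both arrays,
# so the memory layout of the input does not show up in the result.
default_c = c_arr.ravel().tolist()
default_f = f_arr.ravel().tolist()

# order="A" consults the layout: Fortran index order if the array is
# Fortran contiguous, C index order otherwise.
layout_f = f_arr.ravel(order="A").tolist()

print(default_c, default_f, layout_f)
```

In other words, the default call is independent of the flag, and the layout only matters when it is asked for explicitly.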
From: Travis O. <oli...@ie...> - 2006-10-18 17:35:05
|
> > Currently, the key operation is reshape, which only needs to return a > view in fortran order and doesn't even need to mark the resulting > array as fortran order because, well, because it works just fine in > numpy as is, it just isn't contiguous. If the other functions took > shape and order, reshape wouldn't even need the order keyword. The flag is there as a quick check for interfacing. The order keyword grew because it was useful to avoid the arbitrariness of C-contiguous order for those who prefer to think of it differently. Remember the .T attribute for .transpose() was a recent addition and sticking .transpose() everywhere is a lot more ugly. But, yes, many uses of the order keyword could be replaced by preceding with .transpose() --- this is not without cost, however. > > I don't see why the array constructor needs the order keyword, it > doesn't *do* anything. For instance > > a = array([[1,2,3],[4,5,6]], order='F') > > doesn't produce a fortran contiguous array, it produces the same array > as the 'C' form, just sets the fortran flag and marks contiguous as > False. What is the use of that? It is just a generic non-contiguous > numpy array. What? You're not understanding something. The order flag definitely does something here. First of all it seems like you are not understanding the meaning of the CONTIGUOUS flag. CONTIGUOUS means "C-order contiguous" while FORTRAN means "FORTRAN-order contiguous". That's why I use the word single-segment to talk about FORTRAN-order or C-contiguous order. For Numeric, CONTIGUOUS always meant C-order contiguous and we are continuing that tradition. All we've done is notice that there is such a thing as FORTRAN-order contiguous and copies do not need to be made in all circumstances when you have FORTRAN-order. Look at the difference between: a = array([[1,2,3],[4,5,6]],order='F').data[:] b = array([[1,2,3],[4,5,6]]).data[:] Notice the layout is definitely different between a and b. 
> And > > In [131]: ascontiguousarray(array([[1,2,3],[4,5,6]], dtype=int8, > order='F')).flags > Out[131]: > CONTIGUOUS : True > FORTRAN : False > OWNDATA : True > WRITEABLE : True > ALIGNED : True > UPDATEIFCOPY : False > > Doesn't produce a fortran contiguous array, so what use was the flag? And Because you requested a C-contiguous array --- that's what contiguous means in NumPy (exactly what it meant in Numeric). > > In [141]: array([1,2,3,4,5,6], dtype=int8).reshape((2,3), > order='F').astype(int16).flags > Out[141]: > CONTIGUOUS : True > FORTRAN : False > OWNDATA : True > WRITEABLE : True > ALIGNED : True > UPDATEIFCOPY : False > > reorders stuff in memory, so is a bug looking to happen in a fortran > interface. Yes, like I said before, all kinds of operations alter the "layout" of data. You can't assume all operations will preserve FORTRAN ordering. FORTRAN-order has meaning beyond how the data is actually set out in memory. Sometimes it indicates how you think it is laid out when you are doing re-shaping operations. > > mmapped files are the only thing I can think of where one might want to > vary an operation depending on Fortran ordering because seeking out of > order is very expensive. But that means adapting algorithms depending > on order type, better I think to just stick to using the small strided > dimensions when appropriate. > > It would be helpful in debugging all this order stuff if it was clear > what was supposed to happen in every case. Ravel, for instance, > ignores the FORTRAN flag, again begging the question as to why we > *have* the flag. No it doesn't. Please show your evidence. Look: a = array([[1,2,3],[4,5,6]]) print a.ravel() [1 2 3 4 5 6] print a.ravel('F') [1 4 2 5 3 6] If it's not working in some cases, please report that as a bug. -Travis |
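The layout difference described in this message can be checked without reading the raw buffer. The sketch below is written against a current NumPy rather than the 2006 release under discussion, so it uses `strides`, `ravel(order="K")` and the lowercase flag attributes instead of `.data[:]`; treat it as an illustration, not as the session from the thread.

```python
import numpy as np

c_arr = np.array([[1, 2, 3], [4, 5, 6]], dtype=np.int8)
f_arr = np.array([[1, 2, 3], [4, 5, 6]], dtype=np.int8, order="F")

# Same logical contents, different byte steps per axis.
c_strides = c_arr.strides   # one row is 3 bytes apart, one column 1 byte
f_strides = f_arr.strides   # one row is 1 byte apart, one column 2 bytes

# order="K" reads the elements in the order they sit in memory.
c_memory = c_arr.ravel(order="K").tolist()
f_memory = f_arr.ravel(order="K").tolist()

# Asking for a contiguous array means asking for a C-contiguous one,
# so the Fortran layout is given up in the returned copy.
back_to_c = np.ascontiguousarray(f_arr)

print(c_strides, f_strides, c_memory, f_memory)
print(back_to_c.flags.c_contiguous, back_to_c.flags.f_contiguous)
```

The two arrays compare equal element by element; only the strides and the order of elements in memory differ.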
From: Robert K. <rob...@gm...> - 2006-10-18 17:20:16
|
Ray Schumacher wrote: > FYI: > at http://numpy.org/, the link "Forums" ( > http://sourceforge.net/forum/?group_id=1369) gives > > *No forums found for Numerical Python* > > ! Yes, I closed them since people would go there for help, but the people that *could* help don't read the forums. If I knew how to make the Forums link go away on the project page, I would do so. If you have questions about numpy (or even Numeric and numarray), this is the place. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco |
From: Travis O. <oli...@ie...> - 2006-10-18 17:18:11
|
Charles R Harris wrote: > > > On 10/17/06, *Charles R Harris* <cha...@gm... > <mailto:cha...@gm...>> wrote: > > > > On 10/17/06, *A. M. Archibald* < per...@gm... > <mailto:per...@gm...>> wrote: > > On 17/10/06, Charles R Harris <cha...@gm... > <mailto:cha...@gm...>> wrote: > > > > > > On 10/17/06, Travis Oliphant < oli...@ie... > <mailto:oli...@ie...>> wrote: > > > <snip> > > Which doesn't seem to be the case here. I am beginning to wonder > if we really need fortran order, seems that a few well chosen > interface routines would fill the need and avoid much confusion. > > > For instance, it would be nice if flatten took an order keyword: > > In [107]: array([[1,2,3],[4,5,6]], dtype=int8, order='F').flatten() > Out[107]: array([1, 2, 3, 4, 5, 6], dtype=int8) It does take an argument (just not a keyword argument). The general rule I followed (probably inappropriately) was that single-argument methods didn't need keywords. so a.flatten('F') gives you a Fortran-order flattening. -Travis |
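A short sketch of the flatten call described above, assuming a current NumPy. In current releases `flatten` also accepts the order as a keyword, which is shown alongside the positional form from this message.

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6]], dtype=np.int8, order="F")

row_major = a.flatten().tolist()               # default: C index order
col_major_pos = a.flatten("F").tolist()        # positional argument, as in the message
col_major_kw = a.flatten(order="F").tolist()   # keyword form in current NumPy

print(row_major, col_major_pos, col_major_kw)
```

Note that the default result follows C index order even though `a` was created with `order="F"`, which matches the output quoted in In [107].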
From: Travis O. <oli...@ie...> - 2006-10-18 17:15:21
|
David Cournapeau wrote: > Sven Schreiber wrote: > >> Yes it's intended; as far as I understand the python/numpy syntax, <+> >> is an operator, and that triggers assignment by copy (even if you do >> something trivial as bar = +foo, you get a copy, if I'm not mistaken), >> >> > So basically, whenever you have > > foo = expr > > with expr is a numpy expression containing foo, you trigger a copy ? > I think you are better off understanding that "=" is a name binding operation while "+=" calls a special method that allows in-place adding. Thus, bar += foo calls bar.__iadd__(foo) which gives the opportunity to add foo to bar in-place while bar = bar + foo adds bar to foo (which results in a new array that then gets re-bound to the name bar). The '=' sign is not an operator. Perhaps this will help you see the difference. It's good to know what kinds of things trip people up. We can target tutorials and FAQs to those things. -Travis |
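The difference between name binding and in-place addition can be seen in a few lines. This is a minimal sketch with illustrative variable names, assuming a current NumPy.

```python
import numpy as np

foo = np.array([1, 2, 3])
bar = foo                  # "=" binds a second name to the same array, no copy

bar += 1                   # in-place add: the one shared array is modified
same_after_iadd = bar is foo
foo_after_iadd = foo.tolist()

bar = bar + 1              # "+" builds a new array, then "=" rebinds the name bar
same_after_add = bar is foo
foo_after_add = foo.tolist()
bar_after_add = bar.tolist()

print(same_after_iadd, foo_after_iadd)
print(same_after_add, foo_after_add, bar_after_add)
```

After the in-place step both names still refer to one array, so foo sees the change; after the second step bar refers to a fresh array and foo is left untouched.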