From: Fernando P. <Fer...@co...> - 2006-09-03 19:14:31
|
Hi all,

I'm wondering if the following difference in behavior of object arrays should be considered a bug. Let a and b be:

    In [21]: a = [0,1]

    In [22]: b = [ None, None]

If we concatenate a with an empty list, it works:

    In [23]: numpy.concatenate(([],a))
    Out[23]: array([0, 1])

But not so for b:

    In [24]: numpy.concatenate(([],b))
    ---------------------------------------------------------------------------
    exceptions.ValueError                Traceback (most recent call last)

    /home/fperez/<ipython console>

    ValueError: 0-d arrays can't be concatenated

This behavior changed recently (it used to work with r2788), and I realize it's probably part of all the reworkings of the object arrays which have been discussed on the list, and all of whose details I have to admit I haven't followed. But this behavior strikes me as a bit inconsistent, since concatenation with a non-empty object array works fine:

    In [26]: numpy.concatenate(([None],b))
    Out[26]: array([None, None, None], dtype=object)

This is biting us in some code which keeps object arrays, because when operations of the kind

    N.concatenate((some_list_of_objects[:nn],other_object_array))

are taken and nn happens to be 0, the code just explodes. In our case, the variable nn is a runtime computed quantity that comes from a numerical algorithm, for which 0 is a perfectly reasonable value.

Are we just misusing things and is there a reasonable alternative, or should this be considered a numpy bug? The r2788 behavior was certainly a lot less surprising as far as our code was concerned.

I realize that one alternative is to wrap everything into arrays:

    N.concatenate((N.asarray(some_list_of_objects[:nn]),other_object_array))

Is this the only solution moving forward, or could the previous behavior be restored without breaking other areas of the new code/design?

Thanks for any input,

f
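The wrapping alternative described above can be sketched as follows. This is only an illustration, run against a present-day NumPy rather than the 2006 SVN revisions discussed in the thread; the helper name `concat_objects` and the sample data are hypothetical and not taken from the original code.

```python
import numpy as np

def concat_objects(head, tail):
    """Concatenate two 1-d sequences of Python objects as object arrays.

    Hypothetical helper: both inputs are converted with an explicit
    dtype=object, so an empty `head` becomes a 1-d array of length 0
    instead of depending on automatic type detection.
    """
    head_arr = np.asarray(head, dtype=object)
    tail_arr = np.asarray(tail, dtype=object)
    return np.concatenate((head_arr, tail_arr))

# Sample data, assumed for illustration only.
some_list_of_objects = ["x", None, 3.5]
other_object_array = np.asarray([None, None], dtype=object)

# nn == 0 is the case that failed in the report above.
for nn in (0, 2):
    result = concat_objects(some_list_of_objects[:nn], other_object_array)
    print(nn, result.shape, result.dtype)
```

The sketch only helps when the stored objects are not themselves lists or tuples; if they are, the constructor still tries to read them as extra dimensions.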
From: Charles R H. <cha...@gm...> - 2006-09-03 20:43:44
|
On 9/3/06, Fernando Perez <Fer...@co...> wrote:
>
> Hi all,
>
> I'm wondering if the following difference in behavior of object arrays
> should be considered a bug. Let a and b be:
>
> In [21]: a = [0,1]
>
> In [22]: b = [ None, None]
>
> If we concatenate a with an empty list, it works:
>
> In [23]: numpy.concatenate(([],a))
> Out[23]: array([0, 1])
>
> But not so for b:
>
> In [24]: numpy.concatenate(([],b))
> ---------------------------------------------------------------------------
> exceptions.ValueError                Traceback (most recent call last)
>
> /home/fperez/<ipython console>
>
> ValueError: 0-d arrays can't be concatenated

I think it's probably a bug:

    >>> concatenate((array([]),b))
    array([None, None], dtype=object)

Chuck
From: Robert K. <rob...@gm...> - 2006-09-03 21:54:17
|
Charles R Harris wrote:
> On 9/3/06, *Fernando Perez* <Fer...@co... <mailto:Fer...@co...>> wrote:
>
> > Hi all,
> >
> > I'm wondering if the following difference in behavior of object
> > arrays should be considered a bug. Let a and b be:
> >
> > In [21]: a = [0,1]
> >
> > In [22]: b = [ None, None]
> >
> > If we concatenate a with an empty list, it works:
> >
> > In [23]: numpy.concatenate(([],a))
> > Out[23]: array([0, 1])
> >
> > But not so for b:
> >
> > In [24]: numpy.concatenate(([],b))
> > ---------------------------------------------------------------------------
> > exceptions.ValueError                Traceback (most recent call last)
> >
> > /home/fperez/<ipython console>
> >
> > ValueError: 0-d arrays can't be concatenated
>
> I think it's probably a bug:
>
> >>> concatenate((array([]),b))
> array([None, None], dtype=object)

Well, if you can fix it without breaking anything else, then it's a bug.

However, I would suggest that a rule of thumb for using object arrays is to always be explicit. Never rely on automatic conversion from Python containers to object arrays. Since Python containers are also objects, it is usually ambiguous what the user meant.

I kind of liked numarray's choice to move the object array into a separate constructor. I think that gave them some flexibility to choose different syntax and semantics from the generic array() constructor. Since constructing object arrays is so different from constructing numeric arrays, I think that difference is warranted.

--
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth."
  -- Umberto Eco
From: Fernando P. <fpe...@gm...> - 2006-09-04 19:18:51
|
On 9/3/06, Robert Kern <rob...@gm...> wrote:
> > I think it's probably a bug:
> >
> > >>> concatenate((array([]),b))
> > array([None, None], dtype=object)
>
> Well, if you can fix it without breaking anything else, then it's a bug.
>
> However, I would suggest that a rule of thumb for using object arrays is to
> always be explicit. Never rely on automatic conversion from Python containers to
> object arrays. Since Python containers are also objects, it is usually ambiguous
> what the user meant.
>
> I kind of liked numarray's choice to move the object array into a separate
> constructor. I think that gave them some flexibility to choose different syntax
> and semantics from the generic array() constructor. Since constructing object
> arrays is so different from constructing numeric arrays, I think that difference
> is warranted.

This is something that should probably be sorted out before 1.0 is out. IMHO, the current behavior is a bit too full of subtle pitfalls to be a good long-term solution, and I think that predictability trumps convenience for a good API. I know that N.array() is already very complex; perhaps the idea of moving all object array construction into a separate function would be a long-term win.

I think that object arrays are actually very important for numpy: they provide the bridge between pure numerical, Fortran-like computing and the richer world of Python datatypes and complex objects. But it's important to acknowledge this bridge character: they connect into a world where the basic assumptions of homogeneity of numpy arrays don't apply anymore. I'd be +1 on forcing this acknowledgement by having a separate N.oarray constructor, accessible via the dtype flag to N.array as well, but /without/ N.array trying to invoke it automatically by guessing the contents of what it was fed.

The downside of this approach is that much of the code that 'magically' just works with N.array(foo) today, would now break. I'm becoming almost of the opinion that the code is broken already, it just hasn't failed yet :)

Over time, I've become more and more paranoid about what constructors (and factory-type functions that build objects) do with their inputs, how strongly they validate them, and how explicit they require their users to be about their intent. While I'm a big fan of Python's duck-typing, I've also learned (the hard way) that in larger codebases, the place to be very strict about input validation is object constructors. It's easy to let garbage seep into an object by not validating a constructor input, and to later (often MUCH later) have your code bizarrely explode with an unrecognizable exception, because that little piece of garbage was fed to some third-party code which was expecting something else. If I've understood history correctly, some of the motivations behind Enthought's Traits are of a similar nature.

For now I've dealt with our private problem that spurred my original posting. But I think this issue is worth clarifying for numpy, before 1.0 paints us into a backwards-compatibility corner with a fairly fundamental datatype and constructor.

Regards,

f
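To make the proposal above concrete, here is a minimal sketch of what an explicit object-array constructor could look like. The name `oarray` comes from the message, but this implementation is purely hypothetical: it is not a NumPy function, and it is written against a present-day NumPy. It stores each item of a sequence as one element and never inspects the items for nested structure.

```python
import numpy as np

def oarray(items):
    """Hypothetical constructor: one array element per item, no guessing."""
    items = list(items)
    out = np.empty(len(items), dtype=object)
    for i, item in enumerate(items):
        out[i] = item  # stored as-is, even when the item is itself a list
    return out

pairs = [[1, 2], [3, 4]]

guessed = np.array(pairs, dtype=object)  # nested lists read as dimensions
explicit = oarray(pairs)                 # each inner list kept as one object

print(guessed.shape)
print(explicit.shape)
print(type(explicit[0]))
```

With this kind of constructor the result is always 1-d with `len(items)` elements, so the shape no longer depends on what the objects happen to contain.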
From: Travis O. <oli...@ie...> - 2006-09-04 22:33:53
|
Fernando Perez wrote:
> Hi all,
>
> I'm wondering if the following difference in behavior of object arrays should
> be considered a bug. Let a and b be:
>
> In [21]: a = [0,1]
>
> In [22]: b = [ None, None]
>
> If we concatenate a with an empty list, it works:
>
> In [23]: numpy.concatenate(([],a))
> Out[23]: array([0, 1])
>
> But not so for b:
>
> In [24]: numpy.concatenate(([],b))
> ---------------------------------------------------------------------------
> exceptions.ValueError                Traceback (most recent call last)
>
> /home/fperez/<ipython console>
>
> ValueError: 0-d arrays can't be concatenated
>

This is a result of PyArray_FromAny changing when object arrays are explicitly requested (which they are in this case --- although behind the scenes).

I decided to revert to the previous behavior and only use the Object_FromNestedLists code when an error occurs and the user explicitly requested an object array.

The downside is that you cannot place empty lists (or tuples) as objects in an object-array construct, as you could before. Given the trouble people had with the "feature," it seems wise to use it only when previous code would have raised an error.

-Travis
From: Matthew B. <mat...@gm...> - 2006-09-05 13:35:02
|
Hi,

> This is a result of PyArray_FromAny changing when object arrays are
> explicitly requested (which they are in this case --- although behind
> the scenes).

Hmm - I think I am hitting a related bug/feature/surprising change in behavior, which is showing up rather obscurely in a failure of the scipy.io matlab loading tests:

http://projects.scipy.org/scipy/scipy/ticket/258

Here's the change I wasn't expecting, present with current SVN:

    a = arange(2)
    b = arange(1)
    c = array([a, b], dtype=object)
    c
    ->
    array([[0, 1],
           [0, 0]], dtype=object)

On a previous version of numpy (1.02b.dev2975) I get the answer I was expecting:

    array([[0], [0 1]], dtype=object)

Best,

Matthew
From: Travis O. <oli...@ie...> - 2006-09-05 16:44:26
|
Matthew Brett wrote:
> Hi,
>
>> This is a result of PyArray_FromAny changing when object arrays are
>> explicitly requested (which they are in this case --- although behind
>> the scenes).
>
> Hmm - I think I am hitting a related bug/feature/surprising change in
> behavior, which is showing up rather obscurely in a failure of the
> scipy.io matlab loading tests:
>
> http://projects.scipy.org/scipy/scipy/ticket/258
>
> Here's the change I wasn't expecting, present with current SVN:
>
> a = arange(2)
> b = arange(1)
> c = array([a, b], dtype=object)
> c
> ->
> array([[0, 1],
>        [0, 0]], dtype=object)
>
> On a previous version of numpy (1.02b.dev2975) I get the answer I was expecting:
>
> array([[0], [0 1]], dtype=object)
>

Grrr... Object arrays are very hard to get right. I have no idea why this is happening, but I'll look into it. I think it's the bug that led me to put in the special-case object-array handling in the first place. Now that the special-case object-array handling is only done on an error condition, I need to fix this right and raise an inconsistent shape error. It will probably help with the TypeError messages that are currently raised in this situation with other types as well.

-Travis
From: Travis O. <oli...@ie...> - 2006-09-07 06:54:46
|
Charles R Harris wrote: > On 9/6/06, *Charles R Harris* <cha...@gm... > <mailto:cha...@gm...>> wrote: > > > > On 9/6/06, *Travis Oliphant* < oli...@ie... > <mailto:oli...@ie...>> wrote: > > Charles R Harris wrote: > > > > Where is array at this point? > Basically it supports the old Numeric behavior wherein object > array's > are treated as before *except* for when an error would have > occurred > previously when the "new behavior" kicks in. Anything that > violates > that is a bug needing to be fixed. > > This leaves the new object-array constructor used less > often. It could > be exported explicitly into an oarray constructor, but I'm not > sure > about the advantages of that approach. There are benefits to > having > object arrays constructed in the same way as other arrays. It > turns out > many people actually like that feature of Numeric, which is > the reason I > didn't go the route of numarray which pulled object arrays out. > > At this point, however, object arrays can even be part of > records and so > need to be an integral part of the data-type description. > Pulling that > out is not going to happen. A more intelligent object-array > constructor, however, may be a useful tool. > > > OK. I do have a couple of questions. Let me insert the docs for > array and asarray : > > """array(object, dtype=None, copy=1,order=None, subok=0,ndmin=0) > > Return an array from object with the specified date-type. > > Inputs: > object - an array, any object exposing the array interface, any > object whose __array__ method returns an array, or any > (nested) sequence. > dtype - The desired data-type for the array. If not given, > then > the type will be determined as the minimum type > required > to hold the objects in the sequence. This > argument can only > be used to 'upcast' the array. For downcasting, > use the > .astype(t) method. > copy - If true, then force a copy. 
Otherwise a copy will > only occur > if __array__ returns a copy, obj is a nested > sequence, or > a copy is needed to satisfy any of the other > requirements > order - Specify the order of the array. If order is 'C', > then the > array will be in C-contiguous order (last-index > varies the > fastest). If order is 'FORTRAN', then the > returned array > will be in Fortran-contiguous order (first-index > varies the > fastest). If order is None, then the returned > array may > be in either C-, or Fortran-contiguous order or even > discontiguous. > subok - If True, then sub-classes will be passed-through, > otherwise > the returned array will be forced to be a > base-class array > ndmin - Specifies the minimum number of dimensions that the > resulting > array should have. 1's will be pre-pended to the > shape as > needed to meet this requirement. > > """) > > asarray(a, dtype=None, order=None) > Returns a as an array. > > Unlike array(), no copy is performed if a is already an array. > Subclasses > are converted to base class ndarray. > > 1) Is it true that array doesn't always return a copy except by > default? asarray says it contrasts with array in this regard. > Maybe copy=0 should be deprecated. > > 2) Is asarray is basically array with copy=0? > > 3) Is asanyarray basically array with copy=0 and subok=1? > > 4) Is there some sort of precedence table for conversions? To me > it looks like the most deeply nested lists are converted to arrays > first, numeric if they contain all numeric types, object > otherwise. I assume the algorithm then ascends up through the > hierarchy like traversing a binary tree in postorder? > > 5) All nesting must be to the same depth and the deepest nested > items must have the same length. 
> > 6) How is the difference between lists and "lists" determined, i.e., > > In [3]: array([list([1,2,3]),list([1,2])], dtype = object) > Out[3]: array([[1, 2, 3], [1, 2]], dtype=object) > > In [8]: array([array([1,2,3]),array([1,2])], dtype = object) > Out[8]: array([[1 2 3], [1 2]], dtype=object) > > > In [9]: array([1,2,3],[1,2]], dtype = object) > ------------------------------------------------------------ > File "<ipython console>", line 1 > array([1,2,3],[1,2]], dtype = object) > ^ > SyntaxError: invalid syntax > > Is the difference that list(...) and array(...) are passed as > functions (lazy evaluation), but a list is just a list? > > Sorry to be asking all these questions, but I would like to try > making the documentation be a bit of a reference. I am sure I will > have more questions ;) > > -Travis > > > And, voila, ragged arrays: > > In [9]: a = array([array([1,2,3]),array([1,2])], dtype = object) > > In [10]: a*2 > Out[10]: array([[2 4 6], [2 4]], dtype=object) > > In [11]: a + a > Out[11]: array([[2 4 6], [2 4]], dtype=object) Now I remember that this was my original motivation for futzing with the object-array constructor in the first place. So, now you get there only after an attempt to make a "rectangular" array first. -Travis |
From: Charles R H. <cha...@gm...> - 2006-09-07 19:22:04
|
On 9/7/06, Travis Oliphant <oli...@ie...> wrote: > > Charles R Harris wrote: > > On 9/6/06, *Charles R Harris* <cha...@gm... > > <mailto:cha...@gm...>> wrote: > > > > > > > > On 9/6/06, *Travis Oliphant* < oli...@ie... > > <mailto:oli...@ie...>> wrote: > > > > Charles R Harris wrote: > > > > > > Where is array at this point? > > Basically it supports the old Numeric behavior wherein object > > array's > > are treated as before *except* for when an error would have > > occurred > > previously when the "new behavior" kicks in. Anything that > > violates > > that is a bug needing to be fixed. > > > > This leaves the new object-array constructor used less > > often. It could > > be exported explicitly into an oarray constructor, but I'm not > > sure > > about the advantages of that approach. There are benefits to > > having > > object arrays constructed in the same way as other arrays. It > > turns out > > many people actually like that feature of Numeric, which is > > the reason I > > didn't go the route of numarray which pulled object arrays out. > > > > At this point, however, object arrays can even be part of > > records and so > > need to be an integral part of the data-type description. > > Pulling that > > out is not going to happen. A more intelligent object-array > > constructor, however, may be a useful tool. > > > > > > OK. I do have a couple of questions. Let me insert the docs for > > array and asarray : > > > > """array(object, dtype=None, copy=1,order=None, subok=0,ndmin=0) > > > > Return an array from object with the specified date-type. > > > > Inputs: > > object - an array, any object exposing the array interface, > any > > object whose __array__ method returns an array, or > any > > (nested) sequence. > > dtype - The desired data-type for the array. If not given, > > then > > the type will be determined as the minimum type > > required > > to hold the objects in the sequence. This > > argument can only > > be used to 'upcast' the array. 
For downcasting, > > use the > > .astype(t) method. > > copy - If true, then force a copy. Otherwise a copy will > > only occur > > if __array__ returns a copy, obj is a nested > > sequence, or > > a copy is needed to satisfy any of the other > > requirements > > order - Specify the order of the array. If order is 'C', > > then the > > array will be in C-contiguous order (last-index > > varies the > > fastest). If order is 'FORTRAN', then the > > returned array > > will be in Fortran-contiguous order (first-index > > varies the > > fastest). If order is None, then the returned > > array may > > be in either C-, or Fortran-contiguous order or even > > discontiguous. > > subok - If True, then sub-classes will be passed-through, > > otherwise > > the returned array will be forced to be a > > base-class array > > ndmin - Specifies the minimum number of dimensions that the > > resulting > > array should have. 1's will be pre-pended to the > > shape as > > needed to meet this requirement. > > > > """) > > > > asarray(a, dtype=None, order=None) > > Returns a as an array. > > > > Unlike array(), no copy is performed if a is already an array. > > Subclasses > > are converted to base class ndarray. > > > > 1) Is it true that array doesn't always return a copy except by > > default? asarray says it contrasts with array in this regard. > > Maybe copy=0 should be deprecated. > > > > 2) Is asarray is basically array with copy=0? > > > > 3) Is asanyarray basically array with copy=0 and subok=1? > > > > 4) Is there some sort of precedence table for conversions? To me > > it looks like the most deeply nested lists are converted to arrays > > first, numeric if they contain all numeric types, object > > otherwise. I assume the algorithm then ascends up through the > > hierarchy like traversing a binary tree in postorder? > > > > 5) All nesting must be to the same depth and the deepest nested > > items must have the same length. 
> > > > 6) How is the difference between lists and "lists" determined, i.e., > > > > In [3]: array([list([1,2,3]),list([1,2])], dtype = object) > > Out[3]: array([[1, 2, 3], [1, 2]], dtype=object) > > > > In [8]: array([array([1,2,3]),array([1,2])], dtype = object) > > Out[8]: array([[1 2 3], [1 2]], dtype=object) > > > > > > In [9]: array([1,2,3],[1,2]], dtype = object) > > ------------------------------------------------------------ > > File "<ipython console>", line 1 > > array([1,2,3],[1,2]], dtype = object) > > ^ > > SyntaxError: invalid syntax > > > > Is the difference that list(...) and array(...) are passed as > > functions (lazy evaluation), but a list is just a list? > > > > Sorry to be asking all these questions, but I would like to try > > making the documentation be a bit of a reference. I am sure I will > > have more questions ;) > > > > -Travis > > > > > > And, voila, ragged arrays: > > > > In [9]: a = array([array([1,2,3]),array([1,2])], dtype = object) > > > > In [10]: a*2 > > Out[10]: array([[2 4 6], [2 4]], dtype=object) > > > > In [11]: a + a > > Out[11]: array([[2 4 6], [2 4]], dtype=object) > > Now I remember that this was my original motivation for futzing with the > object-array constructor in the first place. So, now you get there only > after an attempt to make a "rectangular" array first. > > -Travis So is this intentional? In [24]: a = array([[],[],[]], dtype=object) In [25]: a.shape Out[25]: (3, 0) In [26]: a = array([], dtype=object) In [27]: a.shape Out[27]: (0,) One could argue that the first array should have shape (3,) Chuck |
From: Travis O. <oli...@ie...> - 2006-09-05 18:15:22
|
Matthew Brett wrote:
> Hi,
>
>> This is a result of PyArray_FromAny changing when object arrays are
>> explicitly requested (which they are in this case --- although behind
>> the scenes).
>
> Hmm - I think I am hitting a related bug/feature/surprising change in
> behavior, which is showing up rather obscurely in a failure of the
> scipy.io matlab loading tests:
>
> http://projects.scipy.org/scipy/scipy/ticket/258
>
> Here's the change I wasn't expecting, present with current SVN:
>
> a = arange(2)
> b = arange(1)
> c = array([a, b], dtype=object)
> c
> ->
> array([[0, 1],
>        [0, 0]], dtype=object)
>
> On a previous version of numpy (1.02b.dev2975) I get the answer I was expecting:
>
> array([[0], [0 1]], dtype=object)
>

This should now be fixed. The code was inappropriately not checking for dimensions when object arrays were being constructed. Now, it raises the appropriate error and then interprets it correctly using the extra object creation code.

Users of scipy 0.5.1 will only have to upgrade NumPy to get the fix (the SciPy install won't have to be re-built).

-Travis
From: Travis O. <oli...@ee...> - 2006-09-07 22:15:45
|
Charles R Harris wrote:
>
> So is this intentional?
>
> In [24]: a = array([[],[],[]], dtype=object)
>
> In [25]: a.shape
> Out[25]: (3, 0)
>
> In [26]: a = array([], dtype=object)
>
> In [27]: a.shape
> Out[27]: (0,)
>
> One could argue that the first array should have shape (3,)
>

Yes, it's intentional because it's the old behavior of Numeric. And it follows the rule that object arrays don't do anything special unless the old technique of using [] as 'dimension delimiters' breaks down.

>
> And this doesn't look quite right:
>
> In [38]: a = array([[1],[2],[3]], dtype=object)
>
> In [39]: a.shape
> Out[39]: (3, 1)
>
> In [40]: a = array([[1],[2,3],[4,5]], dtype=object)
>
> In [41]: a.shape
> Out[41]: (3,)
>

Again, same reason as before. The first example works fine to construct a rectangular array of object arrays of dimension 2. The second only does if we limit the number of dimensions to 1.

The rule is that array needs nested lists with the same number of dimensions unless you have object arrays. Then, the dimensionality will be determined by finding the largest number of dimensions possible for consistency of shape.

-Travis
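The three cases quoted in this message can be replayed side by side. This is a small check against a present-day NumPy, not the 2006 code under discussion; the variable names are chosen for illustration only.

```python
import numpy as np

# Consistent nesting: the inner lists are read as a second dimension.
rectangular = np.array([[1], [2], [3]], dtype=object)

# Inconsistent inner lengths: nesting stops being usable as dimensions,
# so the result is 1-d and each element is a Python list.
ragged = np.array([[1], [2, 3], [4, 5]], dtype=object)

# Empty inner lists are still consistent with each other (length 0).
empty_rows = np.array([[], [], []], dtype=object)

print(rectangular.shape)  # (3, 1), as in Out[39]
print(ragged.shape)       # (3,), as in Out[41]
print(empty_rows.shape)   # (3, 0), as in Out[25]
print(type(ragged[1]))    # the inner list survives as a list object
```

The shapes match the rule stated above: the constructor takes as many dimensions as it can keep consistent, and only then falls back to storing the remaining sequences as objects.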
From: Charles R H. <cha...@gm...> - 2006-09-06 21:08:13
|
On 9/5/06, Travis Oliphant <oli...@ie...> wrote:
>
> Matthew Brett wrote:
> > Hi,
> >
> >> This is a result of PyArray_FromAny changing when object arrays are
> >> explicitly requested (which they are in this case --- although behind
> >> the scenes).
> >
> > Hmm - I think I am hitting a related bug/feature/surprising change in
> > behavior, which is showing up rather obscurely in a failure of the
> > scipy.io matlab loading tests:
> >
> > http://projects.scipy.org/scipy/scipy/ticket/258
> >
> > Here's the change I wasn't expecting, present with current SVN:
> >
> > a = arange(2)
> > b = arange(1)
> > c = array([a, b], dtype=object)
> > c
> > ->
> > array([[0, 1],
> >        [0, 0]], dtype=object)
> >
> > On a previous version of numpy (1.02b.dev2975) I get the answer I was
> > expecting:
> >
> > array([[0], [0 1]], dtype=object)
> >
>
> This should now be fixed. The code was inappropriately not checking for
> dimensions when object arrays were being constructed. Now, it raises
> the appropriate error and then interprets it correctly using the extra
> object creation code.
>
> Users of scipy 0.5.1 will only have to upgrade NumPy to get the fix (the
> SciPy install won't have to be re-built).
>
> -Travis

Where is array at this point? I would like to review the documented behaviour and make modifications to the document string if required. What about Robert's idea of a separate constructor for object arrays? Is it something we could introduce on top of the current array constructor? I realize that if we restrict the current array constructor there might be compatibility problems with Numeric code, but introducing something like oarray as a shorthand for object arrays might encourage its use. Robert also said that Numarray dealt with object arrays as a separate issue and I wonder what they did that we should think about.

Chuck
From: Travis O. <oli...@ie...> - 2006-09-06 23:39:41
|
Charles R Harris wrote:
>
> Where is array at this point?

Basically it supports the old Numeric behavior wherein object arrays are treated as before *except* when an error would previously have occurred, at which point the "new behavior" kicks in. Anything that violates that is a bug needing to be fixed.

This leaves the new object-array constructor used less often. It could be exported explicitly into an oarray constructor, but I'm not sure about the advantages of that approach. There are benefits to having object arrays constructed in the same way as other arrays. It turns out many people actually like that feature of Numeric, which is the reason I didn't go the route of numarray which pulled object arrays out.

At this point, however, object arrays can even be part of records and so need to be an integral part of the data-type description. Pulling that out is not going to happen. A more intelligent object-array constructor, however, may be a useful tool.

-Travis
From: A. M. A. <per...@gm...> - 2006-09-07 23:49:10
|
Maybe I should stay out of this, but it seems like constructing object arrays is complicated and involves a certain amount of guesswork on the part of Numeric. For example, if you do array([a,b,c]).shape, the answer is normally (3,) unless a, b and c happen to all be lists of the same length, at which point your array could have a much more complicated shape... but as the person who wrote "array([a,b,c])" it's tempting to assume that the result has shape (3,), only to discover subtle bugs much later.

If we were writing an array-creation function from scratch, would there be any reason to include object-array creation in the same function as uniform array creation? It seems like a bad idea to me.

If not, the problem is just compatibility with Numeric. Why not simply write a wrapper function in python that does Numeric-style guesswork, and put it in the compatibility modules? How much code will actually break?

A. M. Archibald
From: Charles R H. <cha...@gm...> - 2006-09-07 01:18:39
|
On 9/6/06, Travis Oliphant <oli...@ie...> wrote:
>
> Charles R Harris wrote:
> >
> > Where is array at this point?
> Basically it supports the old Numeric behavior wherein object array's
> are treated as before *except* for when an error would have occurred
> previously when the "new behavior" kicks in. Anything that violates
> that is a bug needing to be fixed.
>
> This leaves the new object-array constructor used less often. It could
> be exported explicitly into an oarray constructor, but I'm not sure
> about the advantages of that approach. There are benefits to having
> object arrays constructed in the same way as other arrays. It turns out
> many people actually like that feature of Numeric, which is the reason I
> didn't go the route of numarray which pulled object arrays out.
>
> At this point, however, object arrays can even be part of records and so
> need to be an integral part of the data-type description. Pulling that
> out is not going to happen. A more intelligent object-array
> constructor, however, may be a useful tool.

OK. I do have a couple of questions. Let me insert the docs for array and asarray:

    """array(object, dtype=None, copy=1,order=None, subok=0,ndmin=0)

    Return an array from object with the specified date-type.

    Inputs:
      object - an array, any object exposing the array interface, any
               object whose __array__ method returns an array, or any
               (nested) sequence.
      dtype  - The desired data-type for the array. If not given, then
               the type will be determined as the minimum type required
               to hold the objects in the sequence. This argument can only
               be used to 'upcast' the array. For downcasting, use the
               .astype(t) method.
      copy   - If true, then force a copy. Otherwise a copy will only occur
               if __array__ returns a copy, obj is a nested sequence, or
               a copy is needed to satisfy any of the other requirements
      order  - Specify the order of the array. If order is 'C', then the
               array will be in C-contiguous order (last-index varies the
               fastest). If order is 'FORTRAN', then the returned array
               will be in Fortran-contiguous order (first-index varies the
               fastest). If order is None, then the returned array may
               be in either C-, or Fortran-contiguous order or even
               discontiguous.
      subok  - If True, then sub-classes will be passed-through, otherwise
               the returned array will be forced to be a base-class array
      ndmin  - Specifies the minimum number of dimensions that the resulting
               array should have. 1's will be pre-pended to the shape as
               needed to meet this requirement.

    """)

    asarray(a, dtype=None, order=None)
        Returns a as an array.

        Unlike array(), no copy is performed if a is already an array.
        Subclasses are converted to base class ndarray.

1) Is it true that array doesn't always return a copy except by default? asarray says it contrasts with array in this regard. Maybe copy=0 should be deprecated.

2) Is asarray basically array with copy=0?

3) Is asanyarray basically array with copy=0 and subok=1?

4) Is there some sort of precedence table for conversions? To me it looks like the most deeply nested lists are converted to arrays first, numeric if they contain all numeric types, object otherwise. I assume the algorithm then ascends up through the hierarchy like traversing a binary tree in postorder?

5) All nesting must be to the same depth and the deepest nested items must have the same length.

6) How is the difference between lists and "lists" determined, i.e.,

    In [3]: array([list([1,2,3]),list([1,2])], dtype = object)
    Out[3]: array([[1, 2, 3], [1, 2]], dtype=object)

    In [8]: array([array([1,2,3]),array([1,2])], dtype = object)
    Out[8]: array([[1 2 3], [1 2]], dtype=object)

    In [9]: array([1,2,3],[1,2]], dtype = object)
    ------------------------------------------------------------
       File "<ipython console>", line 1
         array([1,2,3],[1,2]], dtype = object)
                             ^
    SyntaxError: invalid syntax

Is the difference that list(...) and array(...) are passed as functions (lazy evaluation), but a list is just a list?

Sorry to be asking all these questions, but I would like to try making the documentation be a bit of a reference. I am sure I will have more questions ;)

> -Travis

Chuck
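Questions 1 to 3 can be probed with a small experiment. The sketch below is an assumption-laden illustration: it uses a present-day NumPy rather than the 2006 code, it deliberately avoids the `copy=0` spelling from the docstring, and `TaggedArray` is a hypothetical subclass defined only for this test.

```python
import numpy as np

class TaggedArray(np.ndarray):
    """Trivial ndarray subclass used only for this illustration."""

base = np.arange(3)
tagged = base.view(TaggedArray)

copied = np.array(base)         # array() with default arguments: a new array
same = np.asarray(base)         # already an ndarray: the input itself comes back
downcast = np.asarray(tagged)   # subclass converted to the base ndarray class
passed = np.asanyarray(tagged)  # subclass passed through unchanged

print(copied is base)                    # False
print(same is base)                      # True
print(type(downcast) is np.ndarray)      # True
print(passed is tagged)                  # True
print(np.shares_memory(downcast, base))  # True: converted without copying data
```

This covers only inputs that are already arrays; for nested sequences a copy is unavoidable in every one of these functions, since the data has to be gathered into a new buffer.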
From: Travis O. <oli...@ie...> - 2006-09-07 07:03:07
|
Charles R Harris wrote:
> OK. I do have a couple of questions. Let me insert the docs for array
> and asarray :
>
> """array(object, dtype=None, copy=1,order=None, subok=0,ndmin=0)
>
> Return an array from object with the specified date-type.
>
> 1) Is it true that array doesn't always return a copy except by
> default? asarray says it contrasts with array in this regard. Maybe
> copy=0 should be deprecated.

array is the main creation function. It is a loose wrapper around PyArray_fromAny. copy=0 means don't copy unless you have to.

> 2) Is asarray is basically array with copy=0?

Yes.

>
> 3) Is asanyarray basically array with copy=0 and subok=1?

Yes.

>
> 4) Is there some sort of precedence table for conversions? To me it
> looks like the most deeply nested lists are converted to arrays first,
> numeric if they contain all numeric types, object otherwise. I assume
> the algorithm then ascends up through the hierarchy like traversing a
> binary tree in postorder?

I'm not sure I understand what you mean. The discover-depth and discover-dimensions algorithm figures out what the shape should be, and then recursive PySequence_GetItem and PySequence_SetItem calls are used to copy the information over to the ndarray from the nested sequence.

>
> 5) All nesting must be to the same depth and the deepest nested items
> must have the same length.

Yes, there are routines discover_depth and discover_dimensions that are the actual algorithm used. These are adapted from Numeric.

>
> 6) How is the difference between lists and "lists" determined, i.e.,
>
> In [3]: array([list([1,2,3]),list([1,2])], dtype = object)
> Out[3]: array([[1, 2, 3], [1, 2]], dtype=object)
>
> In [8]: array([array([1,2,3]),array([1,2])], dtype = object)
> Out[8]: array([[1 2 3], [1 2]], dtype=object)
>
>
> In [9]: array([1,2,3],[1,2]], dtype = object)
> ------------------------------------------------------------
>   File "<ipython console>", line 1
>     array([1,2,3],[1,2]], dtype = object)
>                        ^
> SyntaxError: invalid syntax

I think this is just due to a missing [ in In [9]. There is no semantic difference between list([1,2,3]) and [1,2,3] (NumPy will see those things as exactly the same).

>
> Is the difference that list(...) and array(...) are passed as
> functions (lazy evaluation), but a list is just a list?

There is nothing like "lazy evaluation" going on. array([1,2,3]) is evaluated returning an object and array([1,2]) is evaluated returning an object and then the two are put into another object array.

Equivalent code:

a = array([1,2,3])
b = array([1,2])
c = array([a,b],dtype=object)

Thanks for all your help with documentation. It is very much appreciated.

-Travis |
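The answers to questions 1 to 3 can be checked with a short script. This is only a sketch against a present-day NumPy rather than the 2006 code base under discussion; the `Tagged` subclass is a hypothetical placeholder invented for the illustration, and the last three lines repeat the "equivalent code" from the reply.

```python
import numpy as np

class Tagged(np.ndarray):
    """Hypothetical do-nothing subclass, used only to show subclass handling."""

base = np.arange(3)
sub = base.view(Tagged)

# array() copies by default, so the result owns separate data.
copied = np.array(base)

# asarray() hands an existing ndarray back untouched ...
same = np.asarray(base)
# ... but turns a subclass instance into a plain ndarray view of the same data.
plain = np.asarray(sub)

# asanyarray() passes subclass instances through unchanged.
passed = np.asanyarray(sub)

print(copied is base, same is base, type(plain).__name__, passed is sub)

# The "equivalent code": two arrays of different length in one object array.
a = np.array([1, 2, 3])
b = np.array([1, 2])
c = np.array([a, b], dtype=object)
print(c.shape)
```

The subclass case is the only place where asarray and asanyarray differ, which is why the sketch needs a subclass at all.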
From: Charles R H. <cha...@gm...> - 2006-09-07 01:52:45
|
On 9/6/06, Charles R Harris <cha...@gm...> wrote: > > > > On 9/6/06, Travis Oliphant <oli...@ie...> wrote: > > > > Charles R Harris wrote: > > > > > > Where is array at this point? > > Basically it supports the old Numeric behavior wherein object array's > > are treated as before *except* for when an error would have occurred > > previously when the "new behavior" kicks in. Anything that violates > > that is a bug needing to be fixed. > > > > This leaves the new object-array constructor used less often. It could > > be exported explicitly into an oarray constructor, but I'm not sure > > about the advantages of that approach. There are benefits to having > > object arrays constructed in the same way as other arrays. It turns out > > many people actually like that feature of Numeric, which is the reason I > > didn't go the route of numarray which pulled object arrays out. > > > > At this point, however, object arrays can even be part of records and so > > need to be an integral part of the data-type description. Pulling that > > out is not going to happen. A more intelligent object-array > > constructor, however, may be a useful tool. > > > OK. I do have a couple of questions. Let me insert the docs for array and > asarray : > > """array(object, dtype=None, copy=1,order=None, subok=0,ndmin=0) > > Return an array from object with the specified date-type. > > Inputs: > object - an array, any object exposing the array interface, any > object whose __array__ method returns an array, or any > (nested) sequence. > dtype - The desired data-type for the array. If not given, then > the type will be determined as the minimum type required > to hold the objects in the sequence. This argument can > only > be used to 'upcast' the array. For downcasting, use the > .astype(t) method. > copy - If true, then force a copy. 
Otherwise a copy will only > occur > if __array__ returns a copy, obj is a nested sequence, or > a copy is needed to satisfy any of the other requirements > order - Specify the order of the array. If order is 'C', then the > array will be in C-contiguous order (last-index varies the > fastest). If order is 'FORTRAN', then the returned array > will be in Fortran-contiguous order (first-index varies > the > fastest). If order is None, then the returned array may > be in either C-, or Fortran-contiguous order or even > discontiguous. > subok - If True, then sub-classes will be passed-through, otherwise > the returned array will be forced to be a base-class array > ndmin - Specifies the minimum number of dimensions that the > resulting > array should have. 1's will be pre-pended to the shape as > needed to meet this requirement. > > """) > > asarray(a, dtype=None, order=None) > Returns a as an array. > > Unlike array(), no copy is performed if a is already an array. > Subclasses > are converted to base class ndarray. > > 1) Is it true that array doesn't always return a copy except by default? > asarray says it contrasts with array in this regard. Maybe copy=0 should be > deprecated. > > 2) Is asarray is basically array with copy=0? > > 3) Is asanyarray basically array with copy=0 and subok=1? > > 4) Is there some sort of precedence table for conversions? To me it looks > like the most deeply nested lists are converted to arrays first, numeric if > they contain all numeric types, object otherwise. I assume the algorithm > then ascends up through the hierarchy like traversing a binary tree in > postorder? > > 5) All nesting must be to the same depth and the deepest nested items must > have the same length. 
>
> 6) How is the difference between lists and "lists" determined, i.e.,
>
> In [3]: array([list([1,2,3]),list([1,2])], dtype = object)
> Out[3]: array([[1, 2, 3], [1, 2]], dtype=object)
>
> In [8]: array([array([1,2,3]),array([1,2])], dtype = object)
> Out[8]: array([[1 2 3], [1 2]], dtype=object)
>
>
> In [9]: array([1,2,3],[1,2]], dtype = object)
> ------------------------------------------------------------
>   File "<ipython console>", line 1
>     array([1,2,3],[1,2]], dtype = object)
>                        ^
> SyntaxError: invalid syntax
>
> Is the difference that list(...) and array(...) are passed as functions
> (lazy evaluation), but a list is just a list?
>
> Sorry to be asking all these questions, but I would like to try making the
> documentation be a bit of a reference. I am sure I will have more questions
> ;)
>
> -Travis
>
>

And, voila, ragged arrays:

In [9]: a = array([array([1,2,3]),array([1,2])], dtype = object)

In [10]: a*2
Out[10]: array([[2 4 6], [2 4]], dtype=object)

In [11]: a + a
Out[11]: array([[2 4 6], [2 4]], dtype=object)

Chuck |
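The ragged-array session can be reproduced as a standalone script. This is a sketch assuming a current NumPy. It fills a pre-allocated object array slot by slot instead of calling array() on the nested input, so the outcome does not depend on how the constructor guesses dimensions.

```python
import numpy as np

# Build the ragged container explicitly: a 1-d object array with two slots.
ragged = np.empty(2, dtype=object)
ragged[0] = np.array([1, 2, 3])
ragged[1] = np.array([1, 2])

# Arithmetic on an object array is dispatched element by element, so each
# inner array is doubled (or added to itself) independently of the other.
doubled = ragged * 2
summed = ragged + ragged

for row in doubled:
    print(row)
```

Each slot keeps its own length, so operations that need a common shape across slots are not available on such an array.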
From: A. M. A. <per...@gm...> - 2006-09-07 02:18:40
|
On 06/09/06, Charles R Harris <cha...@gm...> wrote:
> On 9/6/06, Charles R Harris <cha...@gm...> wrote:
>
> > order - Specify the order of the array. If order is 'C', then the
> >         array will be in C-contiguous order (last-index varies the
> >         fastest). If order is 'FORTRAN', then the returned array
> >         will be in Fortran-contiguous order (first-index varies the
> >         fastest). If order is None, then the returned array may
> >         be in either C-, or Fortran-contiguous order or even
> >         discontiguous.

This one's a bit complicated. If array() is passed a list of lists, there are two different orders that are relevant: the output order of the array, and the order used to interpret the input. I suppose that if L is a list of lists, array(L)[2,3]==L[2][3], that is, in some sense the arrays are always logically C-ordered even if the underlying representation is different. Does it make sense to specify this somewhere in the docstring? At least it would be good to make it clear that the order parameter affects only the underlying storage format, and not the indexing of the array.

A. M. Archibald |
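The distinction between storage order and indexing can be seen in a few lines. This is a sketch assuming a current NumPy, where the Fortran layout is requested with order='F' rather than the 'FORTRAN' spelling in the quoted docstring.

```python
import numpy as np

L = [[1, 2, 3], [4, 5, 6]]

c_arr = np.array(L, order='C')
f_arr = np.array(L, order='F')

# Indexing follows the nesting of the input list in both cases.
print(c_arr[1, 2] == L[1][2], f_arr[1, 2] == L[1][2])

# Only the memory layout differs between the two arrays.
print(c_arr.flags['C_CONTIGUOUS'], f_arr.flags['F_CONTIGUOUS'])
print(c_arr.strides, f_arr.strides)
```

The strides differ while every element lookup agrees, which is the point about order affecting only the underlying storage.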
From: Charles R H. <cha...@gm...> - 2006-09-07 19:29:53
|
On 9/7/06, Charles R Harris <cha...@gm...> wrote: > > > > On 9/7/06, Travis Oliphant <oli...@ie...> wrote: > > > > Charles R Harris wrote: > > > On 9/6/06, *Charles R Harris* <cha...@gm... > > > <mailto:cha...@gm... >> wrote: > > > > > > > > > > > > On 9/6/06, *Travis Oliphant* < oli...@ie... > > > <mailto: oli...@ie...>> wrote: > > > > > > Charles R Harris wrote: > > > > > > > > Where is array at this point? > > > Basically it supports the old Numeric behavior wherein object > > > array's > > > are treated as before *except* for when an error would have > > > occurred > > > previously when the "new behavior" kicks in. Anything that > > > violates > > > that is a bug needing to be fixed. > > > > > > This leaves the new object-array constructor used less > > > often. It could > > > be exported explicitly into an oarray constructor, but I'm not > > > > > sure > > > about the advantages of that approach. There are benefits to > > > having > > > object arrays constructed in the same way as other arrays. It > > > turns out > > > many people actually like that feature of Numeric, which is > > > the reason I > > > didn't go the route of numarray which pulled object arrays > > out. > > > > > > At this point, however, object arrays can even be part of > > > records and so > > > need to be an integral part of the data-type description. > > > Pulling that > > > out is not going to happen. A more intelligent object-array > > > constructor, however, may be a useful tool. > > > > > > > > > OK. I do have a couple of questions. Let me insert the docs for > > > array and asarray : > > > > > > """array(object, dtype=None, copy=1,order=None, > > subok=0,ndmin=0) > > > > > > Return an array from object with the specified date-type. > > > > > > Inputs: > > > object - an array, any object exposing the array interface, > > any > > > object whose __array__ method returns an array, or > > any > > > (nested) sequence. > > > dtype - The desired data-type for the array. 
If not given, > > > then > > > the type will be determined as the minimum type > > > required > > > to hold the objects in the sequence. This > > > argument can only > > > be used to 'upcast' the array. For downcasting, > > > use the > > > .astype(t) method. > > > copy - If true, then force a copy. Otherwise a copy will > > > only occur > > > if __array__ returns a copy, obj is a nested > > > sequence, or > > > a copy is needed to satisfy any of the other > > > requirements > > > order - Specify the order of the array. If order is 'C', > > > then the > > > array will be in C-contiguous order (last-index > > > varies the > > > fastest). If order is 'FORTRAN', then the > > > returned array > > > will be in Fortran-contiguous order (first-index > > > varies the > > > fastest). If order is None, then the returned > > > array may > > > be in either C-, or Fortran-contiguous order or > > even > > > discontiguous. > > > subok - If True, then sub-classes will be passed-through, > > > otherwise > > > the returned array will be forced to be a > > > base-class array > > > ndmin - Specifies the minimum number of dimensions that the > > > resulting > > > array should have. 1's will be pre-pended to the > > > shape as > > > needed to meet this requirement. > > > > > > """) > > > > > > asarray(a, dtype=None, order=None) > > > Returns a as an array. > > > > > > Unlike array(), no copy is performed if a is already an array. > > > Subclasses > > > are converted to base class ndarray. > > > > > > 1) Is it true that array doesn't always return a copy except by > > > default? asarray says it contrasts with array in this regard. > > > Maybe copy=0 should be deprecated. > > > > > > 2) Is asarray is basically array with copy=0? > > > > > > 3) Is asanyarray basically array with copy=0 and subok=1? > > > > > > 4) Is there some sort of precedence table for conversions? 
To me > > > it looks like the most deeply nested lists are converted to arrays > > > first, numeric if they contain all numeric types, object > > > otherwise. I assume the algorithm then ascends up through the > > > hierarchy like traversing a binary tree in postorder? > > > > > > 5) All nesting must be to the same depth and the deepest nested > > > items must have the same length. > > > > > > 6) How is the difference between lists and "lists" determined, i.e > > ., > > > > > > In [3]: array([list([1,2,3]),list([1,2])], dtype = object) > > > Out[3]: array([[1, 2, 3], [1, 2]], dtype=object) > > > > > > In [8]: array([array([1,2,3]),array([1,2])], dtype = object) > > > Out[8]: array([[1 2 3], [1 2]], dtype=object) > > > > > > > > > In [9]: array([1,2,3],[1,2]], dtype = object) > > > ------------------------------------------------------------ > > > File "<ipython console>", line 1 > > > array([1,2,3],[1,2]], dtype = object) > > > ^ > > > SyntaxError: invalid syntax > > > > > > Is the difference that list(...) and array(...) are passed as > > > functions (lazy evaluation), but a list is just a list? > > > > > > Sorry to be asking all these questions, but I would like to try > > > making the documentation be a bit of a reference. I am sure I will > > > have more questions ;) > > > > > > -Travis > > > > > > > > > And, voila, ragged arrays: > > > > > > In [9]: a = array([array([1,2,3]),array([1,2])], dtype = object) > > > > > > In [10]: a*2 > > > Out[10]: array([[2 4 6], [2 4]], dtype=object) > > > > > > In [11]: a + a > > > Out[11]: array([[2 4 6], [2 4]], dtype=object) > > > > Now I remember that this was my original motivation for futzing with the > > > > object-array constructor in the first place. So, now you get there only > > after an attempt to make a "rectangular" array first. > > > > -Travis > > > So is this intentional? 
>
> In [24]: a = array([[],[],[]], dtype=object)
>
> In [25]: a.shape
> Out[25]: (3, 0)
>
> In [26]: a = array([], dtype=object)
>
> In [27]: a.shape
> Out[27]: (0,)
>
> One could argue that the first array should have shape (3,)
>

And this doesn't look quite right:

In [38]: a = array([[1],[2],[3]], dtype=object)

In [39]: a.shape
Out[39]: (3, 1)

In [40]: a = array([[1],[2,3],[4,5]], dtype=object)

In [41]: a.shape
Out[41]: (3,)

Chuck |
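The four cases above can be collected into one script for comparison. This is a sketch assuming a current NumPy, which I would expect to report the same shapes as the sessions in this message when dtype=object is given explicitly; the `cases` dictionary and its labels are just scaffolding for the illustration.

```python
import numpy as np

cases = {
    "three empty lists": [[], [], []],
    "empty list": [],
    "equal-length rows": [[1], [2], [3]],
    "unequal-length rows": [[1], [2, 3], [4, 5]],
}

# Ask the constructor for an object array in every case and record the shape.
shapes = {name: np.array(seq, dtype=object).shape for name, seq in cases.items()}

for name, shape in shapes.items():
    print(f"{name}: {shape}")
```

The only input that ends up holding Python lists as elements is the one with unequal row lengths; the others are treated as ordinary rectangular input.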
From: Charles R H. <cha...@gm...> - 2006-09-07 22:48:54
|
On 9/7/06, Travis Oliphant <oli...@ee...> wrote:
>
> Charles R Harris wrote:
> >
> >
> > So is this intentional?
> >
> > In [24]: a = array([[],[],[]], dtype=object)
> >
> > In [25]: a.shape
> > Out[25]: (3, 0)
> >
> > In [26]: a = array([], dtype=object)
> >
> > In [27]: a.shape
> > Out[27]: (0,)
> >
> > One could argue that the first array should have shape (3,)
> >
> Yes, it's intentional because it's the old behavior of Numeric. And it
> follows the rule that object arrays don't do anything special unless the
> old technique of using [] as 'dimension delimiters' breaks down.
>
> >
> > And this doesn't look quite right:
> >
> > In [38]: a = array([[1],[2],[3]], dtype=object)
> >
> > In [39]: a.shape
> > Out[39]: (3, 1)
> >
> > In [40]: a = array([[1],[2,3],[4,5]], dtype=object)
> >
> > In [41]: a.shape
> > Out[41]: (3,)
> >
>
> Again, same reason as before. The first example works fine to construct
> a rectangular array of object arrays of dimension 2. The second only
> does if we limit the number of dimensions to 1.
>
> The rule is that array needs nested lists with the same number of
> dimensions unless you have object arrays. Then, the dimensionality will
> be determined by finding the largest number of dimensions possible for
> consistency of shape.

So there is a 'None' trick:

In [93]: a = array([[[2]], None], dtype=object)

In [94]: a[0]
Out[94]: [[2]]

I wonder if it wouldn't be useful to have a 'depth' keyword. Thus depth=None is current behavior, but array([], depth=0) would produce a zero dimensional array containing an empty list. Although I notice from playing with dictionaries that a zero dimensional array containing a dictionary isn't very useful. array([[],[]], depth=1) would produce a one dimensional array containing two empty lists, etc. I can see it is difficult to get something truly general with the current syntax without a little bit of extra information.
Another question, what property must an object possess to be a container type argument in array? There are sequence type objects, and array type objects. Are there more or is everything else treated as an object? Chuck |
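To make the 'depth' idea concrete, here is a minimal sketch of how such a constructor might behave. The function `object_array` and its `depth` argument are hypothetical and not part of NumPy; the sketch assumes depth >= 1 and that the nesting is rectangular down to the requested depth, reading the shape from the first element at each level.

```python
import numpy as np

def object_array(seq, depth=1):
    """Hypothetical helper, not a NumPy function: treat exactly `depth`
    levels of nesting in `seq` as array dimensions and store whatever
    lies below them as opaque Python objects."""
    if depth < 1:
        raise ValueError("this sketch only handles depth >= 1")
    # Read the shape off the first element at each level.
    shape = []
    level = seq
    for _ in range(depth):
        shape.append(len(level))
        level = level[0] if len(level) else ()
    out = np.empty(shape, dtype=object)
    # Fill slot by slot so NumPy never tries to unpack the stored objects.
    for idx in np.ndindex(*shape):
        item = seq
        for i in idx:
            item = item[i]
        out[idx] = item
    return out

one_d = object_array([[], []], depth=1)                       # two empty lists
two_d = object_array([[[1], [2, 3]], [[4], [5]]], depth=2)    # 2x2 grid of lists
print(one_d.shape, two_d.shape)
```

Because the caller states the depth, the result no longer depends on whether the innermost sequences happen to have equal lengths.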
From: Matthew B. <mat...@gm...> - 2006-09-14 00:13:54
|
Hi,

> For example, if you do array([a,b,c]).shape(), the answer is normally
> (3,) unless a b and c happen to all be lists of the same length, at
> which point your array could have a much more complicated shape... but
> as the person who wrote "array([a,b,c])" it's tempting to assume that
> the result has shape (3,), only to discover subtle bugs much later.

Very much agree with this.

> If we were writing an array-creation function from scratch, would
> there be any reason to include object-array creation in the same
> function as uniform array creation? It seems like a bad idea to me.
>
> If not, the problem is just compatibility with Numeric. Why not simply
> write a wrapper function in python that does Numeric-style guesswork,
> and put it in the compatibility modules? How much code will actually
> break?

Can I encourage any more comments? This suggestion seems very sensible to me, and I guess this is our very last chance to change this. The current behavior does seem to violate least surprise - at least to my eye.

Best,

Matthew |
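The surprise described in the quoted text is easy to reproduce. This is a sketch assuming a current NumPy; `shape_of` is a throwaway helper written for the illustration, and dtype=object is spelled out because recent releases refuse ragged input without it.

```python
import numpy as np

def shape_of(a, b, c):
    # Same call in both cases; only the lengths of the inputs change.
    return np.array([a, b, c], dtype=object).shape

ragged_shape = shape_of([1], [2, 3], [4, 5, 6])    # lengths differ
uniform_shape = shape_of([1, 2], [3, 4], [5, 6])   # lengths happen to match

print(ragged_shape, uniform_shape)
```

The call site is identical in both cases, so code that relies on getting three elements back can break only when the data happens to line up.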
From: Charles R H. <cha...@gm...> - 2006-09-14 01:07:42
|
On 9/13/06, Matthew Brett <mat...@gm...> wrote:
>
> Hi,
>
> > For example, if you do array([a,b,c]).shape(), the answer is normally
> > (3,) unless a b and c happen to all be lists of the same length, at
> > which point your array could have a much more complicated shape... but
> > as the person who wrote "array([a,b,c])" it's tempting to assume that
> > the result has shape (3,), only to discover subtle bugs much later.
>
> Very much agree with this.
>
> > If we were writing an array-creation function from scratch, would
> > there be any reason to include object-array creation in the same
> > function as uniform array creation? It seems like a bad idea to me.
> >
> > If not, the problem is just compatibility with Numeric. Why not simply
> > write a wrapper function in python that does Numeric-style guesswork,
> > and put it in the compatibility modules? How much code will actually
> > break?
>
> Can I encourage any more comments? This suggestion seems very
> sensible to me, and I guess this is our very last chance to change
> this. The current behavior does seem to violate least surprise - at
> least to my eye.

I've been thinking about how to write a new constructor for objects. Because array has been at the base of numpy for many years I think it is too late to change it now, but perhaps a new and more predictable constructor for objects may eventually displace it. The main problem in constructing arrays of objects is that more information needs to be supplied, because the user's intention can't be reliably deduced from the current syntax. That said, I have no idea how widespread the use of object arrays is and so don't know how much it really matters. I don't use them much myself.

Chuck |
From: Christopher B. <Chr...@no...> - 2006-09-14 16:29:15
|
Charles R Harris wrote:
>> > Why not simply
>> > write a wrapper function in python that does Numeric-style guesswork,
>> > and put it in the compatibility modules?

>> Can I encourage any more comments?

+1

> The main problem in constructing arrays
> of objects is more information needs to be supplied because the user's
> intention can't be reliably deduced from the current syntax.

I wrote about this a bit earlier in this conversation, and as I thought about it, I'm not sure it's possible -- you could specify a rank, or a shape, but in general, there wouldn't be a unique way to translate a given hierarchy of sequences into a particular shape: imagine four levels of nested lists, asked to turn into a rank-3 array.

This is why it may be best to simply recommend that people create an empty array of the shape they need, then put the objects into it - it's the only way to construct what you need reliably.

However, an object array constructor that takes a rank as an argument might well work for most cases, as long as there is a clearly documented and consistent way to handle extra levels of sequences: perhaps specify that any extra levels of nesting always go to the last dimension (or the first). That being said, it's still dangerous -- what levels of nesting are allowed would depend on which sequences *happen* to be the same size.

Also the code would be a pain to write!

I wonder how often people need to use object arrays when they don't know when writing the code what shape they need?

This is making me think that maybe all we really need is a little syntactic sugar for creating empty object arrays:

numpy.ObjectArray(shape)

Not much different than:

numpy.empty(shape, dtype=numpy.object)

but a little cleaner and more obvious to new users that are primarily interested in object arrays -- analogous to ones() and zeros()

> That said, I
> have no idea how widespread the use of object arrays is and so don't know
> how much it really matters.

If we ever get nd-arrays into the standard lib (or want to see wider use of them in any case), I think that object arrays are critical. Right now, people think they don't have a use for numpy if they aren't doing serious number crunching -- it's seen mostly as a way to speed up computations on lots of numbers. However, I think nd-arrays have LOTS of other applications, for anything where the data fits well into a "rectangular" data structure. n-d slicing is a wonderful thing! As numpy gets wider use -- object arrays will be a very big draw.

-Chris

--
Christopher Barker, Ph.D.
Oceanographer

NOAA/OR&R/HAZMAT         (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception

Chr...@no... |
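The "create an empty array of the shape you need, then fill it" recipe from this message looks roughly like the following. This is a sketch assuming a current NumPy, where the dtype is written with the builtin `object` because the `numpy.object` alias has since been removed; `object_array_of` is a hypothetical stand-in for the proposed ObjectArray(shape), not an existing NumPy function.

```python
import numpy as np

def object_array_of(shape):
    """Hypothetical stand-in for the ObjectArray(shape) idea above."""
    # np.empty with dtype=object starts every slot out as None.
    return np.empty(shape, dtype=object)

grid = object_array_of((2, 2))
grid[0, 0] = [1, 2, 3]           # a list stays a list
grid[0, 1] = {"key": "value"}    # any Python object fits in a slot
grid[1, 0] = (4, 5)

# n-d slicing then works on the container, whatever the slots hold.
first_row = grid[0, :]
print(grid.shape, first_row.shape)
```

Since the shape is fixed before anything is stored, no guessing about nesting depth is involved.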