Thread: [Numpy-discussion] Problem with concatenate and object arrays

[Numpy-discussion] Problem with concatenate and object arrays

From: Fernando P. <Fer...@co...> - 2006-09-03 19:14:31

Hi all,

I'm wondering if the following difference in behavior of object arrays should 
be considered a bug.  Let a and b be:

In [21]: a = [0,1]

In [22]: b = [ None, None]

If we concatenate a with an empty list, it works:

In [23]: numpy.concatenate(([],a))
Out[23]: array([0, 1])

But not so for b:

In [24]: numpy.concatenate(([],b))
---------------------------------------------------------------------------
exceptions.ValueError                                Traceback (most recent call last)

/home/fperez/<ipython console>

ValueError: 0-d arrays can't be concatenated


This behavior changed recently (it used to work with r2788).  I realize 
it's probably part of the reworking of object arrays that has been 
discussed on the list, whose details I have to admit I haven't 
followed.  But this behavior strikes me as a bit inconsistent, since 
concatenation with a non-empty object array works fine:

In [26]: numpy.concatenate(([None],b))
Out[26]: array([None, None, None], dtype=object)

This is biting us in some code which keeps object arrays, because when 
operations of the kind

N.concatenate((some_list_of_objects[:nn],other_object_array))

are taken and nn happens to be 0, the code just explodes.  In our case, the 
variable nn is a runtime computed quantity that comes from a numerical 
algorithm, for which 0 is a perfectly reasonable value.

Are we just misusing things and is there a reasonable alternative, or should 
this be considered a numpy bug?  The r2788 behavior was certainly a lot less 
surprising as far as our code was concerned.

I realize that one alternative is to wrap everything into arrays:

N.concatenate((N.asarray(some_list_of_objects[:nn]),other_object_array))

Is this the only solution moving forward, or could the previous behavior be 
restored without breaking other areas of the new code/design?
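
A minimal sketch of the explicit workaround described above, written against a present-day NumPy rather than the 2006 SVN revisions discussed in this thread. The helper name `concat_objects` is hypothetical and not part of NumPy. It requests `dtype=object` for both pieces before concatenating, so an empty head slice (nn == 0) is treated like any other.

```python
import numpy as np

def concat_objects(head, tail):
    """Concatenate two 1-D sequences as object arrays (hypothetical helper)."""
    # Ask for dtype=object explicitly so an empty list becomes an empty
    # object array instead of relying on type inference.
    head_arr = np.asarray(head, dtype=object)
    tail_arr = np.asarray(tail, dtype=object)
    return np.concatenate((head_arr, tail_arr))

some_list_of_objects = [None, "x", 3]
other_object_array = np.asarray([None, None], dtype=object)

for nn in (0, 2):
    result = concat_objects(some_list_of_objects[:nn], other_object_array)
    print(nn, result.shape, result.dtype)
```

This assumes the elements are not themselves equal-length sequences; in that case `np.asarray(..., dtype=object)` would turn them into extra dimensions.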

Thanks for any input,

f

Re: [Numpy-discussion] Problem with concatenate and object arrays

From: Charles R H. <cha...@gm...> - 2006-09-03 20:43:44

On 9/3/06, Fernando Perez <Fer...@co...> wrote:
>
> Hi all,
>
> I'm wondering if the following difference in behavior of object arrays
> should
> be considered a bug.  Let a and b be:
>
> In [21]: a = [0,1]
>
> In [22]: b = [ None, None]
>
> If we concatenate a with an empty list, it works:
>
> In [23]: numpy.concatenate(([],a))
> Out[23]: array([0, 1])
>
> But not so for b:
>
> In [24]: numpy.concatenate(([],b))
>
> ---------------------------------------------------------------------------
> exceptions.ValueError                                Traceback (most
> recent
> call last)
>
> /home/fperez/<ipython console>
>
> ValueError: 0-d arrays can't be concatenated


I think it's probably a bug:

>>> concatenate((array([]),b))
array([None, None], dtype=object)

Chuck

Re: [Numpy-discussion] Problem with concatenate and object arrays

From: Robert K. <rob...@gm...> - 2006-09-03 21:54:17

Charles R Harris wrote:
> On 9/3/06, *Fernando Perez* <Fer...@co... 
> <mailto:Fer...@co...>> wrote:
> 
>     Hi all,
> 
>     I'm wondering if the following difference in behavior of object
>     arrays should
>     be considered a bug.  Let a and b be:
> 
>     In [21]: a = [0,1]
> 
>     In [22]: b = [ None, None]
> 
>     If we concatenate a with an empty list, it works:
> 
>     In [23]: numpy.concatenate(([],a))
>     Out[23]: array([0, 1])
> 
>     But not so for b:
> 
>     In [24]: numpy.concatenate(([],b))
>     ---------------------------------------------------------------------------
>     exceptions.ValueError                                 Traceback
>     (most recent
>     call last)
> 
>     /home/fperez/<ipython console>
> 
>     ValueError: 0-d arrays can't be concatenated
> 
> 
> I think it's probably a bug:
> 
>  >>> concatenate((array([]),b))
> array([None, None], dtype=object)

Well, if you can fix it without breaking anything else, then it's a bug.

However, I would suggest that a rule of thumb for using object arrays is to 
always be explicit. Never rely on automatic conversion from Python containers to 
object arrays. Since Python containers are also objects, it is usually ambiguous 
what the user meant.
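
One way to follow this rule of thumb can be sketched as below. The function `object_array_1d` is a hypothetical helper, not a NumPy API: it allocates a 1-D object array and fills it element by element, so containers are stored as single objects. The printed shapes are what a recent NumPy produces; the 2006 snapshots discussed here may have behaved differently.

```python
import numpy as np

def object_array_1d(items):
    """Return a 1-D object array holding each item as-is (hypothetical helper)."""
    items = list(items)
    out = np.empty(len(items), dtype=object)
    # Assign one element at a time so nested lists or tuples are stored as
    # single objects rather than being unpacked into extra dimensions.
    for i, item in enumerate(items):
        out[i] = item
    return out

pairs = [[1, 2], [3, 4]]
implicit = np.array(pairs, dtype=object)   # containers become dimensions
explicit = object_array_1d(pairs)          # containers stay as elements

print(implicit.shape)  # (2, 2)
print(explicit.shape)  # (2,)
```

The design choice is to trade convenience for predictability: the result always has shape `(len(items),)`, whatever the items are.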

I kind of liked numarray's choice to move the object array into a separate 
constructor. I think that gave them some flexibility to choose different syntax 
and semantics from the generic array() constructor. Since constructing object 
arrays is so different from constructing numeric arrays, I think that difference 
is warranted.

-- 
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless enigma
  that is made terrible by our own mad attempt to interpret it as though it had
  an underlying truth."
   -- Umberto Eco

Re: [Numpy-discussion] Problem with concatenate and object arrays

From: Fernando P. <fpe...@gm...> - 2006-09-04 19:18:51

On 9/3/06, Robert Kern <rob...@gm...> wrote:

> > I think it's probably a bug:
> >
> >  >>> concatenate((array([]),b))
> > array([None, None], dtype=object)
>
> Well, if you can fix it without breaking anything else, then it's a bug.
>
> However, I would suggest that a rule of thumb for using object arrays is to
> always be explicit. Never rely on automatic conversion from Python containers to
> object arrays. Since Python containers are also objects, it is usually ambiguous
> what the user meant.
>
> I kind of liked numarray's choice to move the object array into a separate
> constructor. I think that gave them some flexibility to choose different syntax
> and semantics from the generic array() constructor. Since constructing object
> arrays is so different from constructing numeric arrays, I think that difference
> is warranted.

This is something that should probably be sorted out before 1.0 is
out.  IMHO, the current behavior is a bit too full of subtle pitfalls
to be a good long-term solution, and I think that predictability
trumps convenience for a good API.  I know that N.array() is already
very complex; perhaps the idea of moving all object array construction
into a separate function would be a long-term win.

I think that object arrays are actually very important for numpy: they
provide the bridge between pure numerical, Fortran-like computing and
the richer world of Python datatypes and complex objects.  But it's
important to acknowledge this bridge character: they connect into a
world where the basic assumptions of homogeneity of numpy arrays don't
apply anymore.  I'd be +1 on forcing this acknowledgement by having a
separate N.oarray constructor, accessible via the dtype flag to
N.array as well, but /without/ N.array trying to invoke it
automatically by guessing the contents of what it was fed.

The downside of this approach is that much of the code that
'magically' just works with N.array(foo) today, would now break.  I'm
becoming almost of the opinion that the code is broken already, it
just hasn't failed yet :)

Over time, I've become more and more paranoid of what constructors
(and factory-type functions that build objects) do with their inputs,
how strongly they validate them, and how explicit they require their
users to be about their intent.  While I'm a big fan of Python's
duck-typing, I've also learned (the hard way) that in larger
codebases, the place to be very strict about input validation is
object constructors.  It's easy to let garbage seep into an object by
not validating a constructor input, and to later (often MUCH later)
have your code bizarrely explode with an unrecognizable exception,
because that little piece of garbage was fed to some third-party code
which was expecting something else.  If I've understood history
correctly, some of the motivations behind Enthought's Traits are of a
similar nature.

For now I've dealt with our private problem that spurred my original
posting.  But I think this issue is worth clarifying for numpy, before
1.0 paints us into a backwards-compatibility corner with a fairly
fundamental datatype and constructor.

Regards,

f

Re: [Numpy-discussion] Problem with concatenate and object arrays

From: Travis O. <oli...@ie...> - 2006-09-04 22:33:53

Fernando Perez wrote:
> Hi all,
>
> I'm wondering if the following difference in behavior of object arrays should 
> be considered a bug.  Let a and b be:
>
> In [21]: a = [0,1]
>
> In [22]: b = [ None, None]
>
> If we concatenate a with an empty list, it works:
>
> In [23]: numpy.concatenate(([],a))
> Out[23]: array([0, 1])
>
> But not so for b:
>
> In [24]: numpy.concatenate(([],b))
> ---------------------------------------------------------------------------
> exceptions.ValueError                                Traceback (most recent 
> call last)
>
> /home/fperez/<ipython console>
>
> ValueError: 0-d arrays can't be concatenated
>   

This is a result of PyArray_FromAny changing when object arrays are 
explicitly requested (which they are in this case --- although behind 
the scenes). 

I decided to revert to the previous behavior and only use the 
Object_FromNestedLists code when an error occurs and the user explicitly 
requested an object array.  

The downside is that you cannot place empty lists (or tuples) as 
objects in an object-array construct, as you could before.   Given the 
trouble people had with the "feature," it seems wise to use it only 
when previous code would have raised an error.

-Travis

Re: [Numpy-discussion] Problem with concatenate and object arrays

From: Matthew B. <mat...@gm...> - 2006-09-05 13:35:02

Hi,

> This is a result of PyArray_FromAny changing when object arrays are
> explicitly requested (which they are in this case --- although behind
> the scenes).

Hmm - I think I am hitting a related bug/feature/surprising change in
behavior, which is showing up rather obscurely in a failure of the
scipy.io matlab loading tests:

http://projects.scipy.org/scipy/scipy/ticket/258

Here's the change I wasn't expecting, present with current SVN:

a = arange(2)
b = arange(1)
c = array([a, b], dtype=object)
c
->
array([[0, 1],
       [0, 0]], dtype=object)

On a previous version of numpy (1.02b.dev2975) I get the answer I was expecting:

array([[0], [0 1]], dtype=object)

Best,

Matthew

Re: [Numpy-discussion] Problem with concatenate and object arrays

From: Travis O. <oli...@ie...> - 2006-09-05 16:44:26

Matthew Brett wrote:
> Hi,
>
>   
>> This is a result of PyArray_FromAny changing when object arrays are
>> explicitly requested (which they are in this case --- although behind
>> the scenes).
>>     
>
> Hmm - I think I am hitting a related bug/feature/surprising change in
> behavior, which is showing up rather obscurely in a failure of the
> scipy.io matlab loading tests:
>
> http://projects.scipy.org/scipy/scipy/ticket/258
>
> Here's the change I wasn't expecting, present with current SVN:
>
> a = arange(2)
> b = arange(1)
> c = array([a, b], dtype=object)
> c
> ->
> array([[0, 1],
>        [0, 0]], dtype=object)
>
> On a previous version of numpy (1.02b.dev2975) I get the answer I was expecting:
>
> array([[0], [0 1]], dtype=object)
>   

Grrr..    Object arrays are very hard to get right.  I have no idea why 
this is happening, but I'll look into it.   I think it's the bug that 
led me to put in the special-case object-array handling in the first 
place.   Now that the special-case object-array handling is only done on 
an error condition, I need to fix this properly and raise an 
inconsistent-shape error.  It will probably help with the TypeError 
messages that are currently raised in this situation with other types 
as well.

-Travis

Re: [Numpy-discussion] Problem with concatenate and object arrays

From: Travis O. <oli...@ie...> - 2006-09-07 06:54:46

Charles R Harris wrote:
> On 9/6/06, *Charles R Harris* <cha...@gm... 
> <mailto:cha...@gm...>> wrote:
>
>
>
>     On 9/6/06, *Travis Oliphant* < oli...@ie...
>     <mailto:oli...@ie...>> wrote:
>
>         Charles R Harris wrote:
>         >
>         > Where is array at this point?
>         Basically it supports the old Numeric behavior wherein object
>         array's
>         are treated as before *except* for when an error would have
>         occurred
>         previously when the "new behavior" kicks in.  Anything that
>         violates
>         that is a bug needing to be fixed.
>
>         This leaves the new object-array constructor used less
>         often.  It could
>         be exported explicitly into an oarray constructor, but I'm not
>         sure
>         about the advantages of that approach.   There are benefits to
>         having
>         object arrays constructed in the same way as other arrays.  It
>         turns out
>         many people actually like that feature of Numeric, which is
>         the reason I
>         didn't go the route of numarray which pulled object arrays out.
>
>         At this point, however, object arrays can even be part of
>         records and so
>         need to be an integral part of the data-type description.  
>         Pulling that
>         out is not going to happen.  A more intelligent object-array
>         constructor, however, may be a useful tool. 
>
>
>     OK. I do have a couple of questions. Let me insert the docs for
>     array and asarray :
>
>         """array(object, dtype=None, copy=1,order=None, subok=0,ndmin=0)
>
>         Return an array from object with the specified data-type.
>
>         Inputs:
>           object - an array, any object exposing the array interface, any
>                     object whose __array__ method returns an array, or any
>                     (nested) sequence.
>           dtype  - The desired data-type for the array.  If not given,
>     then
>                     the type will be determined as the minimum type
>     required
>                     to hold the objects in the sequence.  This
>     argument can only
>                     be used to 'upcast' the array.  For downcasting,
>     use the
>                     .astype(t) method.
>           copy   - If true, then force a copy.  Otherwise a copy will
>     only occur
>                     if __array__ returns a copy, obj is a nested
>     sequence, or
>                     a copy is needed to satisfy any of the other
>     requirements
>           order  - Specify the order of the array.  If order is 'C',
>     then the
>                     array will be in C-contiguous order (last-index
>     varies the
>                     fastest).  If order is 'FORTRAN', then the
>     returned array
>                     will be in Fortran-contiguous order (first-index
>     varies the
>                     fastest).  If order is None, then the returned
>     array may
>                     be in either C-, or Fortran-contiguous order or even
>                     discontiguous.
>           subok  - If True, then sub-classes will be passed-through,
>     otherwise
>                     the returned array will be forced to be a
>     base-class array
>           ndmin  - Specifies the minimum number of dimensions that the
>     resulting
>                     array should have.  1's will be pre-pended to the
>     shape as
>                     needed to meet this requirement.
>
>         """)
>
>     asarray(a, dtype=None, order=None)
>         Returns a as an array.
>
>         Unlike array(), no copy is performed if a is already an array.
>     Subclasses
>         are converted to base class ndarray.
>
>     1) Is it true that array doesn't always return a copy except by
>     default? asarray says it contrasts with array in this regard.
>     Maybe copy=0 should be deprecated.
>
>     2) Is asarray basically array with copy=0?
>
>     3) Is asanyarray basically array with copy=0 and subok=1?
>
>     4) Is there some sort of precedence table for conversions? To me
>     it looks like the most deeply nested lists are converted to arrays
>     first, numeric if they contain all numeric types, object
>     otherwise. I assume the algorithm then ascends up through the
>     hierarchy like traversing a binary tree in postorder?
>
>     5) All nesting must be to the same depth and the deepest nested
>     items must have the same length.
>
>     6) How is the difference between lists and "lists" determined, i.e.,
>
>     In [3]: array([list([1,2,3]),list([1,2])], dtype = object)
>     Out[3]: array([[1, 2, 3], [1, 2]], dtype=object)
>
>     In [8]: array([array([1,2,3]),array([1,2])], dtype = object)
>     Out[8]: array([[1 2 3], [1 2]], dtype=object)
>
>
>     In [9]: array([1,2,3],[1,2]], dtype = object)
>     ------------------------------------------------------------
>        File "<ipython console>", line 1
>          array([1,2,3],[1,2]], dtype = object)
>                             ^
>     SyntaxError: invalid syntax
>
>     Is the difference that list(...) and array(...) are passed as
>     functions (lazy evaluation), but a list is just a list?
>
>     Sorry to be asking all these questions, but I would like to try
>     making the documentation be a bit of a reference. I am sure I will
>     have more questions ;)
>
>         -Travis
>
>
> And, voila, ragged arrays:
>
> In [9]: a = array([array([1,2,3]),array([1,2])], dtype = object)
>
> In [10]: a*2
> Out[10]: array([[2 4 6], [2 4]], dtype=object)
>
> In [11]: a + a
> Out[11]: array([[2 4 6], [2 4]], dtype=object)

Now I remember that this was my original motivation for futzing with the 
object-array constructor in the first place.  So, now you get there only 
after an attempt to make a "rectangular" array first.
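
The ragged-array arithmetic quoted above can be reproduced with an explicitly filled object array. This is a sketch against a recent NumPy, not the 2006 code under discussion; it allocates the container first so that no shape guessing is involved.

```python
import numpy as np

# Build the container first, then fill it, so the outer shape is fixed at (2,).
a = np.empty(2, dtype=object)
a[0] = np.array([1, 2, 3])
a[1] = np.array([1, 2])

doubled = a * 2    # each element array is multiplied by 2
summed = a + a     # each element array is added to itself

print(a.shape)
print([row.tolist() for row in doubled])
print([row.tolist() for row in summed])
```

The arithmetic is applied element by element to the stored objects, which is why rows of different lengths pose no problem.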

-Travis

Re: [Numpy-discussion] Problem with concatenate and object arrays

From: Charles R H. <cha...@gm...> - 2006-09-07 19:22:04

On 9/7/06, Travis Oliphant <oli...@ie...> wrote:
>
> Now I remember that this was my original motivation for futzing with the
> object-array constructor in the first place.  So, now you get there only
> after an attempt to make a "rectangular" array first.
>
> -Travis


So is this intentional?

In [24]: a = array([[],[],[]], dtype=object)

In [25]: a.shape
Out[25]: (3, 0)

In [26]: a = array([], dtype=object)

In [27]: a.shape
Out[27]: (0,)

One could argue that the first array should have shape (3,)

Chuck

Re: [Numpy-discussion] Problem with concatenate and object arrays

From: Travis O. <oli...@ie...> - 2006-09-05 18:15:22

Matthew Brett wrote:
> Hi,
>
>   
>> This is a result of PyArray_FromAny changing when object arrays are
>> explicitly requested (which they are in this case --- although behind
>> the scenes).
>>     
>
> Hmm - I think I am hitting a related bug/feature/surprising change in
> behavior, which is showing up rather obscurely in a failure of the
> scipy.io matlab loading tests:
>
> http://projects.scipy.org/scipy/scipy/ticket/258
>
> Here's the change I wasn't expecting, present with current SVN:
>
> a = arange(2)
> b = arange(1)
> c = array([a, b], dtype=object)
> c
> ->
> array([[0, 1],
>        [0, 0]], dtype=object)
>
> On a previous version of numpy (1.02b.dev2975) I get the answer I was expecting:
>
> array([[0], [0 1]], dtype=object)
>   

This should now be fixed.  The code was inappropriately not checking for 
dimensions when object arrays were being constructed.  Now, it raises 
the appropriate error and then interprets it correctly using the extra 
object creation code.  

Users of scipy 0.5.1 will only have to upgrade NumPy to get the fix (the 
SciPy install won't have to be re-built).

-Travis

Re: [Numpy-discussion] Problem with concatenate and object arrays

From: Travis O. <oli...@ee...> - 2006-09-07 22:15:45

Charles R Harris wrote:

>
>     So is this intentional?
>
>     In [24]: a = array([[],[],[]], dtype=object)
>
>     In [25]: a.shape
>     Out[25]: (3, 0)
>
>     In [26]: a = array([], dtype=object)
>
>     In [27]: a.shape
>     Out[27]: (0,)
>      
>     One could argue that the first array should have shape (3,)
>
Yes, it's intentional because it's the old behavior of Numeric.  And it 
follows the rule that object arrays don't do anything special unless the 
old technique of using [] as 'dimension delimiters' breaks down.

>
> And this doesn't look quite right:
>
> In [38]: a = array([[1],[2],[3]], dtype=object)
>
> In [39]: a.shape
> Out[39]: (3, 1)
>
> In [40]: a = array([[1],[2,3],[4,5]], dtype=object)
>
> In [41]: a.shape
> Out[41]: (3,)
>  

Again, same reason as before.  In the first example the nested lists 
have a consistent rectangular shape, so the result is a 2-dimensional 
object array.  In the second, the shapes are consistent only if the 
number of dimensions is limited to 1.

The rule is that array needs nested lists with the same number of 
dimensions unless you have object arrays.  Then, the dimensionality will 
be determined by finding the largest number of dimensions possible for 
consistency of shape.
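
The rule can be checked directly with the examples from this exchange. The shapes in the comments are those reported in the thread, and a recent NumPy gives the same results when `dtype=object` is passed explicitly; this is an illustration of the observed behavior, not a description of the internal algorithm.

```python
import numpy as np

rectangular = np.array([[1], [2], [3]], dtype=object)
ragged = np.array([[1], [2, 3], [4, 5]], dtype=object)
empty_rows = np.array([[], [], []], dtype=object)

print(rectangular.shape)  # (3, 1): nesting is consistent to depth 2
print(ragged.shape)       # (3,): only the outer level is consistent
print(empty_rows.shape)   # (3, 0): empty lists still count as a dimension
print(ragged[1])          # the inner list is stored as a single object
```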

-Travis

Re: [Numpy-discussion] Problem with concatenate and object arrays

From: Charles R H. <cha...@gm...> - 2006-09-06 21:08:13

On 9/5/06, Travis Oliphant <oli...@ie...> wrote:
>
> Matthew Brett wrote:
> > Hi,
> >
> >
> >> This is a result of PyArray_FromAny changing when object arrays are
> >> explicitly requested (which they are in this case --- although behind
> >> the scenes).
> >>
> >
> > Hmm - I think I am hitting a related bug/feature/surprising change in
> > behavior, which is showing up rather obscurely in a failure of the
> > scipy.io matlab loading tests:
> >
> > http://projects.scipy.org/scipy/scipy/ticket/258
> >
> > Here's the change I wasn't expecting, present with current SVN:
> >
> > a = arange(2)
> > b = arange(1)
> > c = array([a, b], dtype=object)
> > c
> > ->
> > array([[0, 1],
> >        [0, 0]], dtype=object)
> >
> > On a previous version of numpy (1.02b.dev2975) I get the answer I was
> expecting:
> >
> > array([[0], [0 1]], dtype=object)
> >
>
> This should now be fixed.  The code was inappropriately not checking for
> dimensions when object arrays were being constructed.  Now, it raises
> the appropriate error and then interprets it correctly using the extra
> object creation code.
>
> Users of scipy 0.5.1 will only have to upgrade NumPy to get the fix (the
> SciPy install won't have to be re-built).
>
> -Travis

Where is array at this point? I would like to review the documented
behaviour and make modifications to the document string if required. What
about Robert's idea of a separate constructor for object arrays? Is it
something we could introduce on top of the current array constructor? I
realize that if we restrict the current array constructor there might be
compatibility problems with Numeric code, but introducing something like
oarray as a shorthand for object arrays might encourage its use. Robert
also said that Numarray dealt with object arrays as a separate issue and I
wonder what they did that we should think about.

Chuck

Re: [Numpy-discussion] Problem with concatenate and object arrays

From: Travis O. <oli...@ie...> - 2006-09-06 23:39:41

Charles R Harris wrote:
>
> Where is array at this point?
Basically it supports the old Numeric behavior, wherein object arrays 
are treated as before *except* when an error would previously have 
occurred; only then does the "new behavior" kick in.  Anything that 
violates that is a bug needing to be fixed.

This leaves the new object-array constructor used less often.  It could 
be exported explicitly into an oarray constructor, but I'm not sure 
about the advantages of that approach.   There are benefits to having 
object arrays constructed in the same way as other arrays.  It turns out 
many people actually like that feature of Numeric, which is the reason I 
didn't go the route of numarray which pulled object arrays out.

At this point, however, object arrays can even be part of records and so 
need to be an integral part of the data-type description.   Pulling that 
out is not going to happen.  A more intelligent object-array 
constructor, however, may be a useful tool.

-Travis

Re: [Numpy-discussion] Problem with concatenate and object arrays

From: A. M. A. <per...@gm...> - 2006-09-07 23:49:10

Maybe I should stay out of this, but it seems like constructing object
arrays is complicated and involves a certain amount of guesswork on
the part of Numeric.

For example, if you do array([a,b,c]).shape, the answer is normally
(3,) unless a, b and c happen to all be lists of the same length, at
which point your array could have a much more complicated shape... but
as the person who wrote "array([a,b,c])" it's tempting to assume that
the result has shape (3,), only to discover subtle bugs much later.
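
A small sketch of that pitfall, together with one possible guard. The function `build_vector` is hypothetical and only illustrates failing fast when the inferred shape is not the one the author had in mind; the shapes shown are those of a recent NumPy.

```python
import numpy as np

def build_vector(a, b, c):
    """Build a length-3 object array and fail fast on any other shape."""
    arr = np.array([a, b, c], dtype=object)
    if arr.shape != (3,):
        raise ValueError("expected shape (3,), got %r" % (arr.shape,))
    return arr

# Same expression, different shapes depending on what the names hold.
print(np.array(["x", None, 1.5], dtype=object).shape)
print(np.array([[1, 2], [3, 4], [5, 6]], dtype=object).shape)

print(build_vector("x", None, 1.5).shape)
try:
    build_vector([1, 2], [3, 4], [5, 6])
except ValueError as exc:
    print("rejected:", exc)
```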

If we were writing an array-creation function from scratch, would
there be any reason to include object-array creation in the same
function as uniform array creation? It seems like a bad idea to me.

If not, the problem is just compatibility with Numeric. Why not simply
write a wrapper function in python that does Numeric-style guesswork,
and put it in the compatibility modules? How much code will actually
break?

A. M. Archibald

Re: [Numpy-discussion] Problem with concatenate and object arrays

From: Charles R H. <cha...@gm...> - 2006-09-07 01:18:39

On 9/6/06, Travis Oliphant <oli...@ie...> wrote:
>
> Charles R Harris wrote:
> >
> > Where is array at this point?
> Basically it supports the old Numeric behavior wherein object array's
> are treated as before *except* for when an error would have occurred
> previously when the "new behavior" kicks in.  Anything that violates
> that is a bug needing to be fixed.
>
> This leaves the new object-array constructor used less often.  It could
> be exported explicitly into an oarray constructor, but I'm not sure
> about the advantages of that approach.   There are benefits to having
> object arrays constructed in the same way as other arrays.  It turns out
> many people actually like that feature of Numeric, which is the reason I
> didn't go the route of numarray which pulled object arrays out.
>
> At this point, however, object arrays can even be part of records and so
> need to be an integral part of the data-type description.   Pulling that
> out is not going to happen.  A more intelligent object-array
> constructor, however, may be a useful tool.


OK. I do have a couple of questions. Let me insert the docs for array and
asarray :

    """array(object, dtype=None, copy=1,order=None, subok=0,ndmin=0)

    Return an array from object with the specified data-type.

    Inputs:
      object - an array, any object exposing the array interface, any
                object whose __array__ method returns an array, or any
                (nested) sequence.
      dtype  - The desired data-type for the array.  If not given, then
                the type will be determined as the minimum type required
                to hold the objects in the sequence.  This argument can only
                be used to 'upcast' the array.  For downcasting, use the
                .astype(t) method.
      copy   - If true, then force a copy.  Otherwise a copy will only occur
                if __array__ returns a copy, obj is a nested sequence, or
                a copy is needed to satisfy any of the other requirements
      order  - Specify the order of the array.  If order is 'C', then the
                array will be in C-contiguous order (last-index varies the
                fastest).  If order is 'FORTRAN', then the returned array
                will be in Fortran-contiguous order (first-index varies the
                fastest).  If order is None, then the returned array may
                be in either C-, or Fortran-contiguous order or even
                discontiguous.
      subok  - If True, then sub-classes will be passed-through, otherwise
                the returned array will be forced to be a base-class array
      ndmin  - Specifies the minimum number of dimensions that the resulting
                array should have.  1's will be pre-pended to the shape as
                needed to meet this requirement.

    """)

asarray(a, dtype=None, order=None)
    Returns a as an array.

    Unlike array(), no copy is performed if a is already an array.
    Subclasses are converted to base class ndarray.

1) Is it true that array only returns a copy by default, rather than always?
asarray says it contrasts with array in this regard. Maybe copy=0 should be
deprecated.

2) Is asarray basically array with copy=0?

3) Is asanyarray basically array with copy=0 and subok=1?

4) Is there some sort of precedence table for conversions? To me it looks
like the most deeply nested lists are converted to arrays first, numeric if
they contain all numeric types, object otherwise. I assume the algorithm
then ascends up through the hierarchy like traversing a binary tree in
postorder?

5) All nesting must be to the same depth and the deepest nested items must
have the same length.

6) How is the difference between lists and "lists" determined, i.e.,

In [3]: array([list([1,2,3]),list([1,2])], dtype = object)
Out[3]: array([[1, 2, 3], [1, 2]], dtype=object)

In [8]: array([array([1,2,3]),array([1,2])], dtype = object)
Out[8]: array([[1 2 3], [1 2]], dtype=object)


In [9]: array([1,2,3],[1,2]], dtype = object)
------------------------------------------------------------
   File "<ipython console>", line 1
     array([1,2,3],[1,2]], dtype = object)
                        ^
SyntaxError: invalid syntax

Is the difference that list(...) and array(...) are passed as functions
(lazy evaluation), but a list is just a list?

Sorry to be asking all these questions, but I would like to try making the
documentation be a bit of a reference. I am sure I will have more questions
;)

Chuck

Re: [Numpy-discussion] Problem with concatenate and object arrays

From: Travis O. <oli...@ie...> - 2006-09-07 07:03:07

Charles R Harris wrote:
> OK. I do have a couple of questions. Let me insert the docs for array 
> and asarray :
>
>     """array(object, dtype=None, copy=1,order=None, subok=0,ndmin=0)
>
>     Return an array from object with the specified data-type.
>
> 1) Is it true that array doesn't always return a copy except by 
> default? asarray says it contrasts with array in this regard. Maybe 
> copy=0 should be deprecated.
array is the main creation function.   It is a loose wrapper around 
PyArray_FromAny.   copy=0 means don't copy unless you have to.

> 2) Is asarray basically array with copy=0?
Yes.
>
> 3) Is asanyarray basically array with copy=0 and subok=1?
Yes.
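
The answers to 1) through 3) can be checked with a short sketch. It assumes a current NumPy imported as np; MyArray is a made-up subclass used only for illustration:

```python
import numpy as np

class MyArray(np.ndarray):
    """Trivial ndarray subclass, used only for this illustration."""

base = np.arange(3)
sub = base.view(MyArray)

copied = np.array(base)      # default behavior: a new array is made
same = np.asarray(base)      # already an ndarray: returned as-is
as_base = np.asarray(sub)    # subclass is converted to base ndarray
as_any = np.asanyarray(sub)  # subclass is passed through untouched

print(copied is base)         # False
print(same is base)           # True
print(type(as_base).__name__) # ndarray
print(as_any is sub)          # True
```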
>
> 4) Is there some sort of precedence table for conversions? To me it 
> looks like the most deeply nested lists are converted to arrays first, 
> numeric if they contain all numeric types, object otherwise. I assume 
> the algorithm then ascends up through the hierarchy like traversing a 
> binary tree in postorder?
I'm not sure I understand what you mean.  The discover-depth and 
discover-dimensions algorithm figures out what the shape should be and 
then PySequence_GetItem and PySequence_SetItem are used recursively to copy 
the information over to the ndarray from the nested sequence.

>
> 5) All nesting must be to the same depth and the deepest nested items 
> must have the same length.
Yes, there are routines discover_depth and discover_dimensions that are 
the actual algorithm used.  These are adapted from Numeric.
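
As a rough illustration of the idea, here is a pure-Python sketch. It is not the actual discover_depth / discover_dimensions code; discover_shape is a hypothetical name, and it only handles lists and tuples. It keeps the longest run of leading dimensions on which all sub-sequences agree:

```python
def discover_shape(obj):
    """Hypothetical sketch: longest shape on which the nesting is rectangular."""
    if not isinstance(obj, (list, tuple)):
        return ()                      # a leaf contributes no dimensions
    if len(obj) == 0:
        return (0,)
    sub_shapes = [discover_shape(item) for item in obj]
    # Keep only the leading dimensions shared by every sub-sequence.
    common = sub_shapes[0]
    for shape in sub_shapes[1:]:
        k = 0
        while k < min(len(common), len(shape)) and common[k] == shape[k]:
            k += 1
        common = common[:k]
    return (len(obj),) + common

print(discover_shape([[1], [2], [3]]))        # (3, 1)
print(discover_shape([[1], [2, 3], [4, 5]]))  # (3,)
print(discover_shape([[], [], []]))           # (3, 0)
```

The three shapes printed are the ones reported for the corresponding object arrays elsewhere in this thread.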
>
> 6) How is the difference between lists and "lists" determined, i.e.,
>
> In [3]: array([list([1,2,3]),list([1,2])], dtype = object)
> Out[3]: array([[1, 2, 3], [1, 2]], dtype=object)
>
> In [8]: array([array([1,2,3]),array([1,2])], dtype = object)
> Out[8]: array([[1 2 3], [1 2]], dtype=object)
>
>
> In [9]: array([1,2,3],[1,2]], dtype = object)
> ------------------------------------------------------------
>    File "<ipython console>", line 1
>      array([1,2,3],[1,2]], dtype = object)
>                         ^
> SyntaxError: invalid syntax

I think this is just due to a missing [ in In [9].   There is no 
semantic difference between list([1,2,3]) and [1,2,3] (NumPy will see 
those things as exactly the same).
>
> Is the difference that list(...) and array(...) are passed as 
> functions (lazy evaluation), but a list is just a list?
There is nothing like "lazy evaluation" going on.  array([1,2,3]) is 
evaluated returning an object and array([1,2]) is evaluated returning an 
object and then the two are put into another object array.  Equivalent code:

a = array([1,2,3])
b = array([1,2])
c = array([a,b],dtype=object)
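
A small sketch of the same point, assuming a current NumPy imported as np: the arguments are ordinary objects by the time the outer call sees them, and list([1,2,3]) is simply a list.

```python
import numpy as np

# list(...) is evaluated eagerly and yields a plain list.
plain = list([1, 2, 3])

# The inner arrays are built first, then stored as elements.
a = np.array([1, 2, 3])
b = np.array([1, 2])
c = np.array([a, b], dtype=object)

print(type(plain) is list, plain == [1, 2, 3])  # True True
print(c.shape)                                  # (2,)
print(type(c[0]).__name__)                      # ndarray
```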


Thanks for all your help with documentation.  It is very much appreciated.

-Travis

Re: [Numpy-discussion] Problem with concatenate and object arrays

From: Charles R H. <cha...@gm...> - 2006-09-07 01:52:45

And, voila, ragged arrays:

In [9]: a = array([array([1,2,3]),array([1,2])], dtype = object)

In [10]: a*2
Out[10]: array([[2 4 6], [2 4]], dtype=object)

In [11]: a + a
Out[11]: array([[2 4 6], [2 4]], dtype=object)
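
For reference, a sketch of why this works, assuming a current NumPy imported as np: arithmetic on an object array is handed to each element in turn, so the result depends on the element type. The list-based array below is an added contrast, not something from the session above.

```python
import numpy as np

# Object array whose elements are ndarrays of different lengths.
a = np.array([np.array([1, 2, 3]), np.array([1, 2])], dtype=object)
doubled = a * 2   # each inner array is multiplied by 2
summed = a + a    # each inner array is added to itself

# Same data stored as Python lists: "* 2" now means list repetition.
lists = np.empty(2, dtype=object)
lists[0] = [1, 2, 3]
lists[1] = [1, 2]
repeated = lists * 2

print(doubled[0], doubled[1])  # [2 4 6] [2 4]
print(repeated[0])             # [1, 2, 3, 1, 2, 3]
```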

Chuck

Re: [Numpy-discussion] Problem with concatenate and object arrays

From: A. M. A. <per...@gm...> - 2006-09-07 02:18:40

On 06/09/06, Charles R Harris <cha...@gm...> wrote:
> On 9/6/06, Charles R Harris <cha...@gm...> wrote:
>
> >       order  - Specify the order of the array.  If order is 'C', then the
> >                 array will be in C-contiguous order (last-index varies the
> >                 fastest).  If order is 'FORTRAN', then the returned array
> >                 will be in Fortran-contiguous order (first-index varies
> the
> >                 fastest).  If order is None, then the returned array may
> >                 be in either C-, or Fortran-contiguous order or even
> >                 discontiguous.

This one's a bit complicated. If array() is passed a list of lists,
there are two different orders that are relevant - the output order of
the array, and the order used to interpret the input. I suppose that
if L is a list of lists, array(L)[2,3]==L[2][3], that is, in some
sense the arrays are always logically C-ordered even if the underlying
representation is different. Does it make sense to specify this
somewhere in the docstring? At least it would be good to make it clear
that the order parameter affects only the underlying storage format,
and not the indexing of the array.
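
A quick sketch of that point, assuming a current NumPy imported as np (L is just sample data): the order argument changes the memory layout, while indexing still follows the nesting of the input.

```python
import numpy as np

# 3 rows, 4 columns; the value encodes its own (row, column) position.
L = [[10 * i + j for j in range(4)] for i in range(3)]

c_arr = np.array(L, order='C')
f_arr = np.array(L, order='F')

# Indexing is identical regardless of the storage order.
print(c_arr[2, 3], f_arr[2, 3], L[2][3])  # 23 23 23

# Only the underlying layout differs.
print(c_arr.flags['C_CONTIGUOUS'], f_arr.flags['F_CONTIGUOUS'])  # True True
print(c_arr.strides != f_arr.strides)                            # True
```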

A. M. Archibald

Re: [Numpy-discussion] Problem with concatenate and object arrays

From: Charles R H. <cha...@gm...> - 2006-09-07 19:29:53

On 9/7/06, Charles R Harris <cha...@gm...> wrote:
>
>
>
> On 9/7/06, Travis Oliphant <oli...@ie...> wrote:
> >
> > Charles R Harris wrote:
> > > On 9/6/06, *Charles R Harris* <cha...@gm...
> > > <mailto:cha...@gm... >> wrote:
> > >
> > > And, voila, ragged arrays:
> > >
> > > In [9]: a = array([array([1,2,3]),array([1,2])], dtype = object)
> > >
> > > In [10]: a*2
> > > Out[10]: array([[2 4 6], [2 4]], dtype=object)
> > >
> > > In [11]: a + a
> > > Out[11]: array([[2 4 6], [2 4]], dtype=object)
> >
> > Now I remember that this was my original motivation for futzing with the
> > object-array constructor in the first place.  So, now you get there only
> > after an attempt to make a "rectangular" array first.
> >
> > -Travis
>
>
> So is this intentional?
>
> In [24]: a = array([[],[],[]], dtype=object)
>
> In [25]: a.shape
> Out[25]: (3, 0)
>
> In [26]: a = array([], dtype=object)
>
> In [27]: a.shape
> Out[27]: (0,)
>
> One could argue that the first array should have shape (3,)
>

And this doesn't look quite right:

In [38]: a = array([[1],[2],[3]], dtype=object)

In [39]: a.shape
Out[39]: (3, 1)

In [40]: a = array([[1],[2,3],[4,5]], dtype=object)

In [41]: a.shape
Out[41]: (3,)

Chuck

Re: [Numpy-discussion] Problem with concatenate and object arrays

From: Charles R H. <cha...@gm...> - 2006-09-07 22:48:54

On 9/7/06, Travis Oliphant <oli...@ee...> wrote:
>
> Charles R Harris wrote:
>
> >
> >     So is this intentional?
> >
> >     In [24]: a = array([[],[],[]], dtype=object)
> >
> >     In [25]: a.shape
> >     Out[25]: (3, 0)
> >
> >     In [26]: a = array([], dtype=object)
> >
> >     In [27]: a.shape
> >     Out[27]: (0,)
> >
> >     One could argue that the first array should have shape (3,)
> >
> Yes, it's intentional because it's the old behavior of Numeric.  And it
> follows the rule that object arrays don't do anything special unless the
> old technique of using [] as 'dimension delimiters' breaks down.
>
> >
> > And this doesn't look quite right:
> >
> > In [38]: a = array([[1],[2],[3]], dtype=object)
> >
> > In [39]: a.shape
> > Out[39]: (3, 1)
> >
> > In [40]: a = array([[1],[2,3],[4,5]], dtype=object)
> >
> > In [41]: a.shape
> > Out[41]: (3,)
> >
>
> Again, same reason as before.  The first example works fine to construct
> a rectangular array of object arrays of dimension 2.  The second only
> does if we limit the number of dimensions to 1.
>
> The rule is that array needs nested lists with the same number of
> dimensions unless you have object arrays.  Then, the dimensionality will
> be determined by finding the largest number of dimensions possible for
> consistency of shape.
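
The cases discussed so far can be reproduced with a short sketch, assuming a current NumPy imported as np (the variable names are illustrative):

```python
import numpy as np

rect = np.array([[1], [2], [3]], dtype=object)         # consistent to 2 dims
ragged = np.array([[1], [2, 3], [4, 5]], dtype=object) # consistent to 1 dim
empties = np.array([[], [], []], dtype=object)         # three empty rows
mixed = np.array([[[2]], None], dtype=object)          # a list next to None

print(rect.shape)     # (3, 1)
print(ragged.shape)   # (3,)
print(empties.shape)  # (3, 0)
print(mixed.shape)    # (2,)
```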

So there is a 'None' trick:

In [93]: a = array([[[2]], None], dtype=object)

In [94]: a[0]
Out[94]: [[2]]

I wonder if it wouldn't be useful to have a 'depth' keyword. Thus depth=None
is current behavior, but

array([], depth=0)

would produce a zero dimensional array containing an empty list. Although I
notice from playing with dictionaries that a zero dimensional array
containing a dictionary isn't very useful.

array([[],[]], depth=1)

would produce a one dimensional array containing two empty lists, etc. I can
see it is difficult to get something truly general with the current syntax
without a little bit of extra information.
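
To make the proposal concrete, here is a sketch of what such a keyword could do, written as a stand-alone function. Nothing like this exists in numpy: object_array and its depth argument are hypothetical names, the sketch only handles depth >= 1, and it assumes the outer levels are rectangular without checking.

```python
import numpy as np

def object_array(seq, depth=1):
    """Hypothetical helper: use only the outer `depth` levels of nesting
    as array dimensions and store whatever lies below as objects."""
    if depth < 1:
        raise ValueError("this sketch only handles depth >= 1")
    # Read the shape off the first element at each outer level.
    shape = []
    level = seq
    for _ in range(depth):
        shape.append(len(level))
        if len(level) == 0:
            break
        level = level[0]
    shape += [0] * (depth - len(shape))  # pad if a level was empty
    out = np.empty(shape, dtype=object)
    for index in np.ndindex(*shape):
        item = seq
        for i in index:
            item = item[i]
        out[index] = item  # full integer index: stored as a single object
    return out

two_lists = object_array([[], []], depth=1)
print(two_lists.shape)  # (2,)
print(two_lists[0])     # []
```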

Another question, what property must an object possess to be a container
type argument in array? There are sequence type objects, and array type
objects. Are there more or is everything else treated as an object?

Chuck

Re: [Numpy-discussion] Problem with concatenate and object arrays

From: Matthew B. <mat...@gm...> - 2006-09-14 00:13:54

Hi,

> For example, if you do array([a,b,c]).shape, the answer is normally
> (3,) unless a, b and c happen to all be lists of the same length, at
> which point your array could have a much more complicated shape... but
> as the person who wrote "array([a,b,c])" it's tempting to assume that
> the result has shape (3,), only to discover subtle bugs much later.

Very much agree with this.

> If we were writing an array-creation function from scratch, would
> there be any reason to include object-array creation in the same
> function as uniform array creation? It seems like a bad idea to me.
>
> If not, the problem is just compatibility with Numeric. Why not simply
> write a wrapper function in python that does Numeric-style guesswork,
> and put it in the compatibility modules? How much code will actually
> break?

Can I encourage any more comments?  This suggestion seems very
sensible to me, and I guess this is our very last chance to change
this. The current behavior does seem to violate least surprise - at
least to my eye.

Best,

Matthew

Re: [Numpy-discussion] Problem with concatenate and object arrays

From: Charles R H. <cha...@gm...> - 2006-09-14 01:07:42

On 9/13/06, Matthew Brett <mat...@gm...> wrote:
>
> Hi,
>
> > For example, if you do array([a,b,c]).shape, the answer is normally
> > (3,) unless a, b and c happen to all be lists of the same length, at
> > which point your array could have a much more complicated shape... but
> > as the person who wrote "array([a,b,c])" it's tempting to assume that
> > the result has shape (3,), only to discover subtle bugs much later.
>
> Very much agree with this.
>
> > If we were writing an array-creation function from scratch, would
> > there be any reason to include object-array creation in the same
> > function as uniform array creation? It seems like a bad idea to me.
> >
> > If not, the problem is just compatibility with Numeric. Why not simply
> > write a wrapper function in python that does Numeric-style guesswork,
> > and put it in the compatibility modules? How much code will actually
> > break?
>
> Can I encourage any more comments?  This suggestion seems very
> sensible to me, and I guess this is our very last chance to change
> this. The current behavior does seem to violate least surprise - at
> least to my eye.

I've been thinking about how to write a new constructor for objects. Because
array has been at the base of numpy for many years I think it is too late to
change it now, but perhaps a new and more predictable constructor for
objects may eventually displace it. The main problem in constructing arrays
of objects is that more information needs to be supplied because the user's
intention can't be reliably deduced from the current syntax. That said, I
have no idea how widespread the use of object arrays is and so don't know
how much it really matters. I don't use them much myself.

Chuck

Re: [Numpy-discussion] Problem with concatenate and object arrays

From: Christopher B. <Chr...@no...> - 2006-09-14 16:29:15

Charles R Harris wrote:
>> > Why not simply
>> > write a wrapper function in python that does Numeric-style guesswork,
>> > and put it in the compatibility modules? 

>> Can I encourage any more comments? 

+1

> The main problem in constructing arrays
> of objects is more information needs to be supplied because the user's
> intention can't be reliably deduced from the current syntax.

I wrote about this a bit earlier in this conversation, and as I thought 
about it, I'm not sure it's possible -- you could specify a rank, or a 
shape, but in general, there wouldn't be a unique way to translate a 
given hierarchy of sequences into a particular shape: imagine four 
levels of nested lists, asked to turn into a rank-3 array.

This is why it may be best to simply recommend that people create an 
empty array of the shape they need, then put the objects into it - it's 
the only way to construct what you need reliably.
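
A minimal sketch of that recommendation, assuming a current NumPy imported as np (the stored objects are arbitrary examples). The shape is fixed up front, so nothing has to be guessed from the nesting of the elements:

```python
import numpy as np

# Fix the shape first; an empty object array starts out filled with None.
grid = np.empty((2, 2), dtype=object)

# Then store arbitrary objects element by element.
grid[0, 0] = [1, 2]
grid[0, 1] = [3, 4]      # same length as above, still a single element
grid[1, 0] = {'key': 1}

print(grid.shape)  # (2, 2)
print(grid[0, 1])  # [3, 4]
print(grid[1, 1])  # None
```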

However, an object array constructor that takes a rank as an argument 
might well work for most cases, as long as there is a clearly documented 
and consistent way to handle extra levels of sequences: perhaps specify 
that any extra levels of nesting always go to the last dimension (or the 
first). That being said, it's still dangerous -- what levels of nesting 
are allowed would depend on which sequences *happen* to be the same 
size. Also the code would be a pain to write!

I wonder how often people need to use object arrays when they don't 
know when writing the code what shape they need?

This is making me think that maybe all we really need is a little 
syntactic sugar for creating empty object arrays:

numpy.ObjectArray(shape)

Not much different than:

numpy.empty(shape, dtype=object)

but a little cleaner and more obvious to new users that are primarily 
interested in object arrays -- analogous to ones() and zeros()

> That said, I
> have no idea how widespread the use of object arrays is and so don't know
> how much it really matters. 

If we ever get nd-arrays into the standard lib (or want to see wider use 
of them in any case), I think that object arrays are critical. Right 
now, people think they don't have a use for numpy if they aren't doing 
serious number crunching -- it's seen mostly as a way to speed up 
computations on lots of numbers. However, I think nd-arrays have LOTS of 
other applications, for anything where the data fits well into a 
"rectangular" data structure. n-d slicing is a wonderful thing! As numpy 
gets wider use -- object arrays will be a very big draw.

-Chris

-- 
Christopher Barker, Ph.D.
Oceanographer

NOAA/OR&R/HAZMAT         (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception

Chr...@no...