From: Perry G. <pe...@st...> - 2002-06-06 20:29:15
[I thought I replied yesterday, but somehow that apparently vanished.]

<Konrad Hinsen writes>:

> "Perry Greenfield" <pe...@st...> writes:
>
> > Numarray has different coercion rules so that this doesn't
> > happen. Thus one doesn't need c[1,1] to give a rank-0 array.
>
> What are those coercion rules?

For binary operations between a Python scalar and an array, no coercion
is performed on the array type if the scalar is of the same kind as the
array (though not necessarily the same size or precision). For example
(assuming ints happen to be 32-bit in this case):

    Python int (Int32)     * Int16 array   --> Int16 array
    Python float (Float64) * Float32 array --> Float32 array

But if the Python scalar is of a higher kind, e.g. a Python float scalar
with an Int array, then the array is coerced to the corresponding type
of the Python scalar:

    Python float (Float64)     * Int16 array   --> Float64 array
    Python complex (Complex64) * Float32 array --> Complex64 array

When two arrays are involved, numarray has basically the same coercion
rules as Numeric, with some extra twists such as:

    UInt16 array * Int16 array --> Int32 array

since neither input type is a proper subset of the other. (Since Numeric
didn't have unsigned types until Travis recently changed that, this
wasn't an issue for Numeric.)

> > (if that isn't too hard to implement). Of course you get into
> > backward compatibility issues. But really, to get it right, some
> > incompatibility is necessary if you want to eliminate this
> > particular wart.
>
> For a big change such as Numarray, I'd accept some incompatibilities.
> For just a new version of NumPy, no. There is a lot of code out there
> that uses NumPy, and I am sure that a good part of it relies on the
> current coercion rules. Moreover, there is no simple way to detect
> code that depends on coercion rules, so adapting existing code would
> be an enormous amount of work.

Certainly. I didn't mean to minimize that. But the current coercion
rules have produced a demand for solutions to the problem of upcasting,
and I consider those solutions (savespace and rank-0 arrays) to be less
than ideal. If people really are troubled by these warts, I'm arguing
that the real solution is to change the coercion behavior. (Yes, this
would be easiest to deal with if Python had all these types itself, but
I think that will never happen, nor should it.)

Perry
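For concreteness, the kind-comparison rule Perry describes can be
written out in a few lines of Python. This is an illustrative sketch
only: the type table, the kind ordering, and the function name are
assumptions made for the example, not numarray's actual internals.

    # Illustrative sketch of the scalar/array coercion rule above.
    KIND = {'Int16': 'int', 'Int32': 'int', 'UInt16': 'uint',
            'Float32': 'float', 'Float64': 'float', 'Complex64': 'complex'}
    ORDER = {'int': 0, 'uint': 0, 'float': 1, 'complex': 2}

    def scalar_array_result(scalar_type, array_type):
        """Result type of (Python scalar) op (array) under the rule above."""
        if ORDER[KIND[scalar_type]] <= ORDER[KIND[array_type]]:
            return array_type   # same (or lower) kind: the array type wins
        return scalar_type      # higher kind: the array is coerced up

    assert scalar_array_result('Int32', 'Int16') == 'Int16'
    assert scalar_array_result('Float64', 'Float32') == 'Float32'
    assert scalar_array_result('Float64', 'Int16') == 'Float64'
    assert scalar_array_result('Complex64', 'Float32') == 'Complex64'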
From: Perry G. <pe...@st...> - 2002-06-07 16:42:15
> > For binary operations between a Python scalar and array, there is
> > no coercion performed on the array type if the scalar is of the
> > same kind as the array (but not same size or precision). For example
> > (assuming ints happen to be 32 bit in this case)
>
> That solves one problem and creates another... Two, in fact. One is
> the inconsistency problem: Python type coercion always promotes
> "smaller" to "bigger" types, and it would be good to make no
> exceptions to this rule.
>
> Besides, there are still situations in which types, ranks, and
> indexing operations depend on each other in a strange way. With
>
>     a = array([1., 2.], Float)
>     b = array([3., 4.], Float32)
>
> the result of
>
>     a*b
>
> is of type Float, whereas
>
>     a[0]*b
>
> is of type Float32 - if and only if a has rank 1.

All this is true. It really comes down to which poison you prefer.
Neither choice is perfect. Changing the coercion rules produces the
inconsistencies you mention; not changing them leaves the existing
inconsistencies recently discussed (and still doesn't remove the
difficulty of dealing with scalars in expressions without awkward
constructs). We think the inconsistencies you point out are easier to
live with than the existing behavior. It would be nice to have a
solution with none of these problems, but that doesn't appear to be
possible.

Perry
From: Konrad H. <hi...@cn...> - 2002-06-07 20:48:50
> It would be nice to have a solution that had none of these
> problems, but that doesn't appear to be possible.

I still believe that the best solution is to define scalar data types
corresponding to all array element types. As far as I can see, this
doesn't have any of the disadvantages of the other solutions that have
been proposed until now.

Konrad.
--
-------------------------------------------------------------------------------
Konrad Hinsen                            | E-Mail: hi...@cn...
Centre de Biophysique Moleculaire (CNRS) | Tel.: +33-2.38.25.56.24
Rue Charles Sadron                       | Fax:  +33-2.38.63.15.17
45071 Orleans Cedex 2                    | Deutsch/Esperanto/English/
France                                   | Nederlands/Francais
-------------------------------------------------------------------------------
From: Perry G. <pe...@st...> - 2002-06-07 21:41:34
<Konrad Hinsen writes>:

> I still believe that the best solution is to define scalar data types
> corresponding to all array element types. As far as I can see, this
> doesn't have any of the disadvantages of the other solutions that
> have been proposed until now.

If x were a Float32 array, how would the following not be promoted to a
Float64 array?

    y = x + 1.

If you are proposing something like

    y = x + Float32(1.)

it would work, but it sure leads to some awkward expressions.

Perry
From: Konrad H. <hi...@cn...> - 2002-06-08 07:59:54
> If you are proposing something like
>
>     y = x + Float32(1.)
>
> it would work, but it sure leads to some awkward expressions.

Yes, that's what I am proposing. It's no worse than what we have now,
and if writing Float32 a hundred times is too much effort, an
abbreviation like

    f = Float32

helps a lot.

Anyway, following the Python credo "explicit is better than implicit",
I'd rather write explicit type conversions than have automagical ones
surprise me.

Finally, we can always lobby for inclusion of the new scalar types into
the core interpreter, with a corresponding syntax for literals, but it
would sure help if we could show that the system works and suffers only
from the lack of literals.

Konrad.
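To make the proposal concrete, here is a toy pure-Python sketch of what
such a scalar type might look like. Everything in it is an illustrative
assumption: Numeric's Float32 was a typecode, not a callable scalar
type, and no such class existed at the time.

    import struct

    class Float32:
        """Toy 32-bit float scalar: a Python float rounded through
        IEEE single precision (illustrative only)."""
        def __init__(self, value):
            self.value = struct.unpack('f', struct.pack('f', float(value)))[0]
        def __float__(self):
            return self.value
        def __add__(self, other):
            # same-kind arithmetic stays Float32: no silent upcast
            return Float32(self.value + float(other))
        __radd__ = __add__
        def __repr__(self):
            return 'Float32(%r)' % self.value

    f = Float32        # Konrad's suggested abbreviation
    y = f(1.) + 0.2    # result keeps single precision
    print(y)           # Float32(1.2000000476837158) on IEEE platforms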
From: Travis O. <oli...@ie...> - 2002-06-09 01:55:10
I did not receive any major objections, so I have released a new
Numeric (21.3) incorporating bug fixes. I also tagged the CVS tree with
VERSION_21_3, and then incorporated the unsigned integers and unsigned
shorts into the CVS version of Numeric, for inclusion in a tentatively
named version 22.0.

I've only uploaded a platform-independent tar file for 21.3. Any
binaries need to be updated.

If you are interested in testing the new additions, please let me know
of any bugs you find.

Thanks,

-Travis O.
From: eric j. <er...@en...> - 2002-06-10 00:18:30
> > If you are proposing something like
> >
> >     y = x + Float32(1.)
> >
> > it would work, but it sure leads to some awkward expressions.
>
> Yes, that's what I am proposing. It's no worse than what we have now,
> and if writing Float32 a hundred times is too much effort, an
> abbreviation like f = Float32 helps a lot.
>
> Anyway, following the Python credo "explicit is better than implicit",
> I'd rather write explicit type conversions than have automagical ones
> surprise me.

How about making indexing (not slicing) arrays *always* return a 0-d
array with copy instead of "view" semantics? This is nearly equivalent
to creating a new scalar type, but without requiring major changes. I
think it is probably even more useful for writing generic code, because
the returned value will retain array behavior. Also, in the earlier
example

    a = array([1., 2.], Float)
    b = array([3., 4.], Float32)

    a[0]*b

would now return a Float array, as Konrad desires, because a[0] is a
Float array. Using copy semantics would fix the unexpected behavior
reported by Larry that kicked off this discussion. Slices are a
different animal from indexing and would (and definitely should)
continue to have view semantics.

I further believe that all Numeric functions (sum, product, etc.)
should return arrays all the time, instead of implicitly converting
them to Python scalars in special cases such as reductions of 1-d
arrays. I think the only reason for the silent conversion is that
Python lists only allow integer values for use in indexing, so that:

    >>> a = [1,2,3,4]
    >>> a[array(0)]
    Traceback (most recent call last):
      File "<stdin>", line 1, in ?
    TypeError: sequence index must be integer

Numeric arrays don't have this problem:

    >>> a = array([1,2,3,4])
    >>> a[array(0)]
    1

I don't think this alone is a strong enough reason for the conversion.
Getting rid of special cases is more important, because it makes
behavior predictable to the novice (and expert), and it is easier to
write generic functions and be sure they will not break a year from now
when one of the special cases occurs.

Are there other reasons why scalars are returned?

On coercion rules:

As for adding an array to a scalar value,

    x = array([3., 4.], Float32)
    y = x + 1.

should y be a Float or a Float32? I like numarray's coercion rules
better (Float32). I have run into this upcasting too many times to
count. "Explicit" and "implicit" aren't obvious to me here: the user
explicitly cast x to be Float32, but because of the limited numeric
types in Python, the result is upcast to a double. Here's another
example:

    >>> from Numeric import *
    >>> a = array((1,2,3,4), UnsignedInt8)
    >>> left_shift(a,3)
    array([ 8, 16, 24, 32],'i')

I had to stare at this for a while when I first saw it before I
realized that the integer value 3 upcast the result to type 'i'. So I
think this is confusing and rarely the desired behavior. The fact that
this is inconsistent with Python's "always upcast" rule is minor for
me. The array math operations are necessarily a different animal from
scalar operations because of the extra types supported. Defining these
operations in the way that is most convenient for working with array
data seems OK.

On the other hand, I don't think a jump from 21 to 22 is a big enough
jump for such a change. Numeric progresses pretty fast, and users don't
expect such a major shift in behavior. I do think, though, that the
computational speed issue is going to result in numarray and Numeric
existing side by side for a long time.

Perhaps we should create an "interim" Numeric version (maybe starting
at 30) that tries to be compatible with the upcoming numarray in its
coercion rules, etc.? Advanced features such as indexing arrays with
arrays, memory-mapped arrays, floating point exception behavior, etc.
won't be there, but it should help people transition their code to work
with numarray, and also offer a speedy alternative. A second choice
would be to make SciPy's Numeric implementation the intermediate step.
It already produces NaNs during div-by-zero exceptions according to
numarray's rules. The coercion modifications could also be
incorporated.

> Finally, we can always lobby for inclusion of the new scalar types
> into the core interpreter, with a corresponding syntax for literals,
> but it would sure help if we could show that the system works and
> suffers only from the lack of literals.

There was a seriously considered debate about unifying Python's numeric
model into a single type, to get rid of the integer-float distinction,
at last year's Python conference and in the ensuing months. While it
didn't (and won't) happen, I'd be real surprised if the general
community would welcome us suggesting stirring yet another type into
the brew. Can't we make 0-d arrays work as an alternative?

eric
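A rough sketch of what Eric's "always return arrays" position implies:
a 0-d result that keeps array behavior but still converts where Python
insists on a plain number. The ZeroD class below is hypothetical, not
Numeric's actual rank-0 implementation.

    class ZeroD:
        """Hypothetical 0-d array result, e.g. from sum() on a 1-d array."""
        def __init__(self, value, typecode):
            self.value, self.typecode = value, typecode
        def __float__(self):             # float-expecting code accepts it
            return float(self.value)
        def __int__(self):               # explicit int() for indexing use
            return int(self.value)
        def __add__(self, other):
            return ZeroD(self.value + other, self.typecode)
        def __repr__(self):
            return "array(%r, '%s')" % (self.value, self.typecode)

    s = ZeroD(10.0, 'f')   # imagine: s = sum(array([3., 7.], Float32))
    print(s + 1.0)         # still array-like: array(11.0, 'f')
    print(float(s))        # 10.0 -- usable where a Python float is needed
    a = [1, 2, 3, 4]
    print(a[int(s) % 4])   # explicit int() needed for list indexing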
From: Konrad H. <hi...@cn...> - 2002-06-10 17:12:40
"eric jones" <er...@en...> writes:

> How about making indexing (not slicing) arrays *always* return a 0-D
> array with copy instead of "view" semantics? This is nearly equivalent
> to creating a new scalar type, but without requiring major changes. I
...

I think this was discussed as well a long time ago. For pure Python
code, this would be a very good solution. But

> I think the only reason for the silent conversion is that Python lists
> only allow integer values for use in indexing so that:

there are some more cases where the type matters. If you call C
routines that do argument parsing via PyArg_ParseTuple and expect a
float argument, a rank-0 float array will raise a TypeError. All the
functions from the math module work like that, and of course many in
various extension modules.

In the ideal world, there would not be any distinction between scalars
and rank-0 arrays. But I don't think we'll get there soon.

> On coercion rules:
>
> As for adding the array to a scalar value,
>
>     x = array([3., 4.], Float32)
>     y = x + 1.
>
> Should y be a Float or a Float32? I like numarray's coercion rules
> better (Float32). I have run into this upcasting to many times to

Statistically they probably give the desired result in more cases. But
they are in contradiction to Python principles, and consistency counts
a lot on my value scale.

I propose an experiment: ask a few Python programmers who are not using
NumPy what type they would expect for the result. I bet that not a
single one would answer "Float32".

> On the other hand, I don't think a jump from 21 to 22 is enough of a
> jump to make such a change. Numeric progresses pretty fast, and users

I don't think any increase in version number is enough for incompatible
changes. For many users, NumPy is just a building block; they install
it because some other package(s) require it. If a new version breaks
those other packages, they won't be happy. The authors of those
packages won't be happy either, as they will get the angry letters.

As an author of such packages, I am speaking from experience. I have
even considered making my own NumPy distribution under a different
name, just to be safe from changes in NumPy that break my code (in the
past it was mostly the installation code that was broken, when
arrayobject.h changed its location).

In my opinion, anything that is not compatible with Numeric should not
be called Numeric.

Konrad.
From: Paul F D. <pa...@pf...> - 2002-06-10 18:19:37
We have certainly beaten this topic to death in the past. It keeps
coming up because there is no good way around it.

Two points about the x + 1.0 issue:

1. How often this occurs is really a function of what you are doing.
For those using Numeric Python as a kind of MATLAB clone, typing
interactively, the size issue is of less importance and easy expression
is of more importance. For those writing scripts to batch process, or
writing steered applications, the size issue is more important and easy
expression less important. I'm using words like less and more here
because both issues matter to everyone at some time; it is just a
question of the relative frequency of concern.

2. Part of what I had in mind with the kinds module proposal, PEP 0242,
was dealing with the literal issue. There had been some proposals to
make literals decimal numbers or rationals, and that got me thinking
about how to defend myself if they did it, and also about the fact that
Python doesn't have Fortran's "kind" concept, which you can use to gain
a more platform-independent calculation. From the PEP, this example:

    In module myprecision.py:

        import kinds
        tinyint = kinds.int_kind(1)
        single = kinds.float_kind(6, 90)
        double = kinds.float_kind(15, 300)
        csingle = kinds.complex_kind(6, 90)

    In the rest of my code:

        from myprecision import tinyint, single, double, csingle
        n = tinyint(3)
        x = double(1.e20)
        z = 1.2
        # builtin float gets you the default float kind, properties unknown
        w = x * float(x)
        # but in the following case we know w has kind "double".
        w = x * double(z)

        u = csingle(x + z * 1.0j)
        u2 = csingle(x+z, 1.0)

    Note how that entire code can then be changed to a higher
    precision by changing the arguments in myprecision.py.

Comment: note that you aren't promised that single != double; but you
are promised that double(1.e20) will hold a number with 15 decimal
digits of precision and a range up to 10**300, or that the float_kind
call will fail.
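Since the PEP 242 kinds module was never added to the standard library,
the example above won't run as-is. The following is a minimal, purely
illustrative stand-in covering just the calls the example uses; the
real proposal is considerably richer.

    import sys

    # Minimal stand-in for PEP 242's 'kinds' module (illustrative only).
    def int_kind(digits):
        return int                  # Python ints already cover small kinds

    def float_kind(precision, exprange):
        # Promise 'precision' decimal digits and range up to 10**exprange,
        # or fail; the platform double is the only kind offered here.
        if (precision <= sys.float_info.dig
                and exprange <= sys.float_info.max_10_exp):
            return float
        raise ValueError('no float kind with %d digits, range 10**%d'
                         % (precision, exprange))

    def complex_kind(precision, exprange):
        float_kind(precision, exprange)   # reuse the same feasibility check
        return complex

    double = float_kind(15, 300)          # OK on IEEE-double platforms
    x = double(1.e20)
    print(x)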
From: eric j. <er...@en...> - 2002-06-10 20:10:37
> 2. Part of what I had in mind with the kinds module proposal PEP 0242
> was dealing with the literal issue. [...]
> Note how that entire code can then be changed to a higher
> precision by changing the arguments in myprecision.py.

I think this is a nice feature, but it's actually heading in the
opposite direction of where I'd like to see things go for the general
use of Numeric. Part of Python's appeal for me is that I don't have to
specify types everywhere. I don't want to write explicit casts
throughout equations because it munges up their readability. Of course
the casting sometimes can't be helped, but Numeric's current behavior
really forces this explicit casting for array types besides double,
int, and double complex. I like numarray's fix for this problem. Also,
as Perry noted, Numeric is unlikely to be used as an everyday command
line tool (like MATLAB) if the verbose casting is required.

I'm interested to learn what other drawbacks y'all found with always
returning arrays (0-d for scalars) from Numeric functions. Konrad
mentioned the tuple parsing issue in some extension libraries that
expect floats, but it sounds like Travis thinks this is no longer an
issue. Are there others?

eric
From: Paul F D. <pa...@pf...> - 2002-06-10 23:05:14
> Konrad mentioned the tuple parsing issue in some extension libraries
> that expect floats, but it sounds like Travis thinks this is no
> longer an issue. Are there others?
>
> eric

Lots of code tries to distinguish cases using isinstance, and these
tests will fail if given an array instance when they are testing for a
float.
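The failure mode Paul is describing, in a hypothetical snippet (the
function is illustrative, not from any real package):

    def scale(x, factor):
        # brittle dispatch: a rank-0 array fails this test even though
        # float(x) would have worked fine
        if isinstance(x, float):
            return x * factor
        raise TypeError('expected a float, got %s' % type(x).__name__)

    print(scale(2.0, 3.0))       # 6.0
    # scale(rank0_array, 3.0)    # would raise TypeError under this test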
From: Travis O. <oli...@ie...> - 2002-06-10 18:12:51
On Mon, 2002-06-10 at 11:08, Konrad Hinsen wrote:

> "eric jones" <er...@en...> writes:
>
> > I think the only reason for the silent conversion is that Python lists
> > only allow integer values for use in indexing so that:
>
> There are some more cases where the type matters. If you call C
> routines that do argument parsing via PyArg_ParseTuple and expect a
> float argument, a rank-0 float array will raise a TypeError. All the
> functions from the math module work like that, and of course many in
> various extension modules.

Actually, the code in PyArg_ParseTuple asks the object it gets whether
it knows how to be a float. 0-d arrays have known how to be Python
floats for some time. So I do not think this error occurs as you've
described. Could you demonstrate it?

In fact, most of the code in Python itself which needs scalars allows
arbitrary objects, provided the object defines functions which return a
Python scalar. The only exception I've seen is the list indexing code
(probably for optimization purposes). There could be more places, but I
have not found them or heard of them.

Originally, Numeric arrays did not define the appropriate functions for
0-d arrays to act like scalars in the right places. For quite a while
now, they have. I'm quite supportive of never returning Python scalars
from Numeric array operations unless specifically requested (e.g. via a
toscalar method).

> > On coercion rules:
> >
> > As for adding the array to a scalar value,
> >
> >     x = array([3., 4.], Float32)
> >     y = x + 1.
> >
> > Should y be a Float or a Float32? I like numarray's coercion rules
> > better (Float32). I have run into this upcasting to many times to
>
> Statistically they probably give the desired result in more cases. But
> they are in contradiction to Python principles, and consistency counts
> a lot on my value scale.
>
> I propose an experiment: ask a few Python programmers who are not
> using NumPy what type they would expect for the result. I bet that not
> a single one would answer "Float32".

I'm not sure I agree with that at all. On what reasoning is that
presumption based? If I encounter a Python object that I'm unfamiliar
with, I don't presume to know how it will define multiplication.
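Travis's point can be demonstrated with a toy object standing in for a
rank-0 array (illustrative, not Numeric's actual type): anything that
defines __float__ satisfies the float conversion PyArg_ParseTuple and
the math module use, while list indexing remains the stubborn special
case.

    import math

    class ZeroDFloat:
        """Toy stand-in for a rank-0 float array."""
        def __init__(self, v):
            self.v = v
        def __float__(self):
            return float(self.v)

    x = ZeroDFloat(2.0)
    print(math.sqrt(x))          # 1.414... -- sqrt happily calls __float__
    try:
        [10, 20, 30][ZeroDFloat(1.0)]    # list indexing refuses to convert
    except TypeError as err:
        print('TypeError:', err)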
From: Konrad H. <hi...@cn...> - 2002-06-11 12:56:26
Travis Oliphant <oli...@ie...> writes:

> Actually, the code in PyArg_ParseTuple asks the object it gets if it
> knows how to be a float. 0-d arrays for some time have known how to be
> Python floats. So, I do not think this error occurs as you've
> described. Could you demonstrate this error?

No, it seems gone indeed. I remember a lengthy battle due to this
problem, but that was a long time ago.

> The only exception to this that I've seen is the list indexing code
> (probably for optimization purposes). There could be more places, but
> I have not found them or heard of them.

Even for indexing, I don't see the point. If you test for the int type
first and attempt conversion only for non-ints, that shouldn't slow
down normal usage at all.

> have now. I'm quite supportive of never returning Python scalars from
> Numeric array operations unless specifically requested (e.g. the
> toscalar method).

I suppose this would be easy to implement, right? Then why not do it in
a test release and find out empirically how much code it breaks?

> presumption based? If I encounter a Python object that I'm unfamiliar
> with, I don't presume to know how it will define multiplication.

But if that object pretends to be a number type, a sequence type, a
mapping type, etc., I do make assumptions about its behaviour.

Konrad.
From: Perry G. <pe...@st...> - 2002-06-10 19:06:33
<Paul Dubois writes>:

> We have certainly beaten this topic to death in the past. It keeps
> coming up because there is no good way around it.

Ain't that the truth.

> Two points about the x + 1.0 issue:
>
> 1. How often this occurs is really a function of what you are doing.
> For those using Numeric Python as a kind of MATLAB clone, who are
> typing interactively, the size issue is of less importance and the
> easy expression is of more importance. To those writing scripts to
> batch process or writing steered applications, the size issue is more
> important and the easy expression less important.

Many in the astronomical community use IDL (instead of MATLAB), and for
them size is an issue even for interactive use: they often manipulate
very large arrays interactively. Furthermore, many are astronomers who
don't generally see themselves as programmers and who, when they write
programs (perhaps not great programs), don't want to be bothered by
such details even in a script (or they may want to read a
"professional" program and not have to deal with such things). But you
are right that there is no solution without some problems.

Every array language deals with this in a somewhat different way, I
suspect. In IDL, literals are generally of smaller types (ints were, or
at least used to be -- I haven't used it myself in a while -- 2 bytes,
and floats single precision), and there are ways of writing literals
with higher precision (e.g., 2L, 2.0d-2). Since it was a language
specifically intended for numeric processing, supporting many scalar
types made sense.

Perry
From: Perry G. <pe...@st...> - 2002-06-10 20:07:01
<Eric Jones writes>:

> I further believe that all Numeric functions (sum, product, etc.)
> should return arrays all the time instead of implicitly converting
> them to Python scalars in special cases such as reductions of 1d
> arrays. [...]
>
> Are there other reasons why scalars are returned?

Well, sure. It isn't just indexing lists directly; it would be anywhere
in Python that you would use a number. In some contexts the right thing
may happen (where the function knows to try to obtain a simple number
from an object), but then again it may not (if calling a function where
the number is used directly to index or slice).

Here is another case where good arguments can be made for both sides.
It really isn't an issue of functionality (one can write methods or
functions to do what is needed); it's what the convenient syntax does.
For example, if we really want a Python scalar but rank-0 arrays are
always returned, then something like this may be required:

    >>> x = arange(10)
    >>> a = range(10)
    >>> a[scalar(x[2])]   # instead of a[x[2]]

Whereas if simple indexing returns a Python scalar and consistency is
desired in always having arrays returned, one may have to do something
like this:

    >>> y = x.indexAsArray(2)   # instead of y = x[2]

or perhaps

    >>> y = x[ArrayAlwaysAsResultIndexObject(2)]
    # :-) with a better name, of course

One context or the other is going to be inconvenienced, but not
prevented from doing what is needed.

As long as Python scalars are the 'biggest' type of their kind, we
strongly lean towards single elements being converted into Python
scalars. It's our feeling that there are more surprises and gotchas,
particularly for more casual users, on this side than in the
uncertainty of an index returning an array or scalar. People writing
code that expects to deal with uncertain dimensionality (the only place
this occurs) should be the ones to go the extra distance with more
awkward syntax.

Perry
From: eric j. <er...@en...> - 2002-06-10 21:26:45
> <Eric Jones writes>:
> > I further believe that all Numeric functions (sum, product, etc.)
> > should return arrays all the time instead of implicitly converting
> > them to Python scalars in special cases such as reductions of 1d
> > arrays. [...]
> > Are there other reasons why scalars are returned?
>
> Well, sure. It isn't just indexing lists directly, it would be
> anywhere in Python that you would use a number.

Travis seemed to indicate that Python would convert 0-d arrays to
Python types correctly for most (all?) cases. Python indexing is a
little unique because it explicitly requires integers. It's not just
0-d arrays that fail as indexes -- Python floats won't work either.

As for passing arrays to functions expecting numbers: is it that much
different from passing an integer into a function that does floating
point operations? Python handles that casting automatically. It seems
like it should do the same for 0-d arrays if they know how to "look
like" Python types.

> In some contexts, the right thing may happen (where the function knows
> to try to obtain a simple number from an object), but then again, it
> may not (if calling a function where the number is used directly to
> index or slice).
>
> Here is another case where good arguments can be made for both sides.
> It really isn't an issue of functionality (one can write methods or
> functions to do what is needed), it's what the convenient syntax does.
> For example, if we really want a Python scalar but rank-0 arrays are
> always returned then something like this may be required:
>
>     >>> x = arange(10)
>     >>> a = range(10)
>     >>> a[scalar(x[2])]   # instead of a[x[2]]

Yes, this would be required for using them as array indexes. Or
actually:

    >>> a[int(x[2])]

> Whereas if simple indexing returns a Python scalar and consistency
> is desired in always having arrays returned one may have to do
> something like this
>
>     >>> y = x.indexAsArray(2)   # instead of y = x[2]
>
> or perhaps
>
>     >>> y = x[ArrayAlwaysAsResultIndexObject(2)]
>     # :-) with better name, of course
>
> One context or the other is going to be inconvenienced, but not
> prevented from doing what is needed.

Right.

> As long as Python scalars are the 'biggest' type of their kind, we
> strongly lean towards single elements being converted into Python
> scalars. [...] People writing code that expects to deal with
> uncertain dimensionality (the only place that this occurs) should be
> the ones to go the extra distance in more awkward syntax.

Well, I guess I'd like to figure out exactly what breaks before ruling
it out, because consistently returning the same type from
functions/indexing is beneficial. It becomes even more beneficial with
the exception behavior used by SciPy and numarray.

The two breakage cases I'm aware of are (1) indexing, and (2) functions
that explicitly check for arguments of IntType, DoubleType, or
ComplexType. When searching the standard library for these guys, they
only turn up in copy, pickle, xmlrpclib, and the types module -- all in
innocuous ways. Searching for 'float' (which is equal to FloatType)
doesn't turn up any code that breaks this either. A search of my
site-packages had IntType tests used quite a bit -- primarily in SciPy.
Some of these would go away with this change, and many were harmless. I
saw a few that would need fixing (several in special.py), but the fix
was trivial.

eric
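The kind of check Eric's grep would have flagged, and the trivial fix,
in a hypothetical example (not actual special.py code): test for
"convertible to an int" rather than exact type identity.

    def set_order(n):
        # brittle pattern that grepping for IntType finds:
        #     if type(n) is type(0): ...
        # it rejects 0-d arrays (and floats that are exact integers).
        # trivial fix: ask for conversion instead of type identity
        try:
            return int(n)
        except (TypeError, ValueError):
            raise TypeError('order must be convertible to an integer')

    print(set_order(3))      # 3
    print(set_order(3.0))    # 3 -- and a 0-d integer array would work too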
From: Perry G. <pe...@st...> - 2002-06-11 18:06:20
<Eric Jones wrote>:

> Travis seemed to indicate that Python would convert 0-d arrays to
> Python types correctly for most (all?) cases. Python indexing is a
> little unique because it explicitly requires integers. It's not just
> 0-d arrays that fail as indexes -- Python floats won't work either.

That's right; the primary breakage would be downstream use as indices.
That appeared to be the case with the find() method of strings, for
example.

> Yes, this would be required for using them as array indexes. Or
> actually:
>
>     >>> a[int(x[2])]

Yes, this would be sufficient for use as indices or slices. I'm not
sure if there is any specific code that checks for float but doesn't
invoke automatic conversion. I suspect that floats are much less of a
problem this way, though will one necessarily know whether to use
int(), float(), or scalar()? If one is writing a generic function that
could accept int or float arrays, then generating an int may presume
too much about what the result will be used for. (Though I don't have a
particular example to give; I'll think about whether any exist.) If the
only type that could possibly cause problems is int, then int() should
be all that would be necessary, but it is still awkward.

Perry
From: eric j. <er...@en...> - 2002-06-11 18:37:32
> From: Perry Greenfield [mailto:pe...@st...]
>
> Yes, this would be sufficient for use as indices or slices. [...] If
> the only type that could possibly cause problems is int, then int()
> should be all that would be necessary, but still awkward.

If numarray becomes a first-class citizen in the Python world as is
hoped, maybe even this issue can be rectified. List/tuple indexing
might be changed to accept single-element integer arrays. I suspect
this has major implications, though -- probably a question for
python-dev.

eric
From: Alexander S. <a.s...@gm...> - 2002-06-11 23:02:50
"eric jones" <er...@en...> writes:

> I think the consistency with Python is less of an issue than it seems.
> I wasn't aware that add.reduce(x) would generate the same results as
> the Python version of reduce(add,x) until Perry pointed it out to me.
> There are some inconsistencies between Python the language and Numeric
> because of the needs of the Numeric community. For instance, slices
> create views instead of copies as in Python. This was a correct break
> with consistency in a very utilized area of Python because of
> efficiency.

Ahh, a loaded example ;) I always thought that Numeric's view-slicing
is a fairly problematic deviation from standard Python behavior, and
I'm not entirely sure why it needs to be done that way.

Couldn't one have both consistency *and* efficiency by implementing a
copy-on-demand scheme (which is what Matlab does, if I'm not entirely
mistaken: a real copy gets created only if either the original or the
'copy' is modified)? The current behavior seems problematic not just
because it breaks consistency and hence user expectations; it also
breaks code that is written with more pythonic sequences in mind (in a
potentially hard-to-track-down manner) and is, IMHO, generally
undesirable and error-prone, for pretty much the same reasons that
dynamic scope and global variables are generally undesirable and
error-prone -- one can unwittingly create intricate interactions
between remote parts of a program that can be very difficult to track
down.

Obviously there *are* cases where one really wants a (partial) view of
an existing array. It would seem to me, however, that these cases are
exceedingly rare. (In all my Numeric code I'm aware of only one
instance where I actually want the aliasing behavior, so that I can
manipulate a large array by manipulating its views and vice versa.)
Thus, rather than being the default behavior, I'd rather see those
cases accommodated by a special syntax that makes it explicit that an
alias is desired and that care must be taken when modifying either the
original or the view (e.g. one possible syntax would be
``aliased_vector = m.view[:,1]``). Again, I think the current behavior
is somewhat analogous to having variables declared in global (or
dynamic) scope by default, which is not only error-prone but also masks
those cases where global (or dynamic) scope *is* actually desired and
necessary.

It might be that the problems associated with a copy-on-demand scheme
outweigh the error-proneness and the interface breakage that the
deviation from standard Python slicing behavior causes, but otherwise
copying on slicing would be a backwards incompatibility in numarray I'd
rather like to see (especially since one could easily add a view
attribute to Numeric, for forwards compatibility). I would also suspect
that this would make it *a lot* easier to get numarray (or parts of it)
into the core, but this is just a guess.

> I don't see choosing axis=-1 as a break with Python -- multi-dimensional
> arrays are inherently different and used differently than lists of lists
> in Python. Further, reduce() is a "corner" of the Python language that
> has been superceded by list comprehensions. Choosing an alternative

Guido might nowadays think that adding reduce was a mistake, so in that
sense it might be a "corner" of the Python language (although some
people, including me, still rather like using reduce), but I can't see
how you can generally replace reduce with anything but a loop. Could
you give an example?

alex

--
Alexander Schmolck     Postgraduate Research Student
                       Department of Computer Science
                       University of Exeter
A.S...@gm...           http://www.dcs.ex.ac.uk/people/aschmolc/
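Alex's copy-on-demand idea, in a deliberately tiny pure-Python sketch:
slicing aliases the underlying buffer, and a real copy is made only on
the first write that happens while the buffer is shared. The class and
its bookkeeping (1-d only, no striding, strong references kept in an
owners list) are illustrative assumptions, not a design for numarray.

    class COWArray:
        """Toy 1-d copy-on-demand array: slices alias the buffer and the
        data is copied lazily on the first write while it is shared."""
        def __init__(self, data, buf=None, start=0, stop=None):
            if buf is None:
                buf = {'data': list(data), 'owners': []}
            self._buf, self._start = buf, start
            self._stop = len(buf['data']) if stop is None else stop
            buf['owners'].append(self)

        def __getitem__(self, i):
            if isinstance(i, slice):   # no copy here, just new bounds
                a, b, _ = i.indices(self._stop - self._start)
                return COWArray(None, self._buf,
                                self._start + a, self._start + b)
            return self._buf['data'][self._start + i]

        def __setitem__(self, i, value):
            if len(self._buf['owners']) > 1:   # buffer is shared:
                self._detach()                 # copy before writing
            self._buf['data'][self._start + i] = value

        def _detach(self):
            self._buf['owners'].remove(self)
            data = self._buf['data'][self._start:self._stop]
            self._buf = {'data': data, 'owners': [self]}
            self._start, self._stop = 0, len(data)

        def tolist(self):
            return self._buf['data'][self._start:self._stop]

    a = COWArray([1, 2, 3, 4])
    b = a[1:3]          # aliases a's buffer; nothing is copied yet
    b[0] = 99           # first write: b detaches, copying only its slice
    print(a.tolist())   # [1, 2, 3, 4] -- unchanged, list-like semantics
    print(b.tolist())   # [99, 3]

A real implementation would of course have to handle n-d striding, weak
references to avoid keeping dead slices alive, and writes coming from C
extensions, which is where the real cost of the scheme would lie.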
From: eric j. <er...@en...> - 2002-06-12 05:27:55
> "eric jones" <er...@en...> writes:
>
> > I think the consistency with Python is less of an issue than it
> > seems. [...] For instance, slices create views instead of copies as
> > in Python. This was a correct break with consistency in a very
> > utilized area of Python because of efficiency.
>
> Ahh, a loaded example ;) I always thought that Numeric's view-slicing
> is a fairly problematic deviation from standard Python behavior and
> I'm not entirely sure why it needs to be done that way.
>
> Couldn't one have both consistency *and* efficiency by implementing a
> copy-on-demand scheme (which is what matlab does, if I'm not entirely
> mistaken; a real copy gets only created if either the original or the
> 'copy' is modified)?

Well, slices creating copies is definitely a bad idea (which is what I
have heard proposed before) -- finite difference calculations (and
others) would be very slow with that approach. Your copy-on-demand
suggestion might work, though. Its implementation would be more
complex, but I don't think it would require cooperation from the Python
core(?). It could be handled in the ufunc code. It would also require
extension modules to make copies before they modified any values.

Copy-on-demand doesn't really fit with Python's "assignments are
references" approach to things, though, does it? Using foo = bar in
Python and then changing an element of foo will also change bar. So I
guess there would have to be a distinction made here, which adds a
little more complexity.

Personally, I like being able to pass views around because it allows
for efficient implementations. The option to pass arrays into an
extension function and edit them in place is very nice. Copy-on-demand
might allow for equal efficiency -- I'm not sure.

I haven't found the current behavior very problematic in practice and
haven't seen it as a major stumbling block to new users. I'm happy with
the status quo on this. But if copy-on-demand is truly efficient and
didn't make extension writing a nightmare, I wouldn't complain about
the change either. I have a feeling the implementers of numarray would,
though. :-) And talk about having to modify legacy code...

> It might be that the problems associated with a copy-on-demand scheme
> outweigh the error-proneness, the interface breakage that the
> deviation from standard python slicing behavior causes, but otherwise
> copying on slicing would be an backwards incompatibility in numarray
> I'd rather like to see (especially since one could easily add a view
> attribute to Numeric, for forwards-compatibility). I would also
> suspect that this would make it *a lot* easier to get numarray (or
> parts of it) into the core, but this is just a guess.

I think the two things Guido wants for inclusion of numarray are a
consensus from our community on what we want, and (more importantly) a
comprehensible code base. :-) If Numeric satisfied this 2nd condition,
it might already be slated for inclusion... The 1st is never easy with
such varied opinions -- I've about concluded that Konrad and I are
anti-particles :-) -- but I hope it will happen.

> > I don't see choosing axis=-1 as a break with Python --
> > multi-dimensional arrays are inherently different and used
> > differently than lists of lists in Python. Further, reduce() is a
> > "corner" of the Python language that has been superceded by list
> > comprehensions. Choosing an alternative
>
> Guido might nowadays think that adding reduce was a mistake, so in
> that sense it might be a "corner" of the python language (although
> some people, including me, still rather like using reduce), but I
> can't see how you can generally replace reduce with anything but a
> loop. Could you give an example?

You're right: you can't do it without a loop. List comprehensions only
supersede filter and map, since they always return a list. I think
reduce is here to stay. And, like you, I would actually be disappointed
to see it go (I like lambda too...).

The point is that I wouldn't choose the definition of sum() or
product() based on the behavior of Python's reduce operator. Hmmm. So I
guess that is key -- it's really these *function* interfaces that I
disagree with.

So, how about add.reduce() keeping axis=0 to match the behavior of
Python, while sum() and friends default to axis=-1 to match the rest of
the library functions? It does break consistency across the library, so
I think it is sub-optimal. However, the distinction is reasonably clear
and much less likely to cause confusion. It also allows FFT and future
modules (wavelets or whatever) to operate across the fastest axis by
default while conforming to an intuitive standard. take() and friends
would also become axis=-1 for consistency with all other functions.

Would this be a reasonable compromise?

eric
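Eric's efficiency worry, made concrete: a finite difference touches two
big overlapping slices, which are O(1) views in Numeric but would each
become an O(n) allocation under eager copy-on-slice (copy-on-demand
avoids the copies as long as neither slice is written to). The sketch
below shows only the shape of the idiom, using plain Python lists,
which, like Python generally, copy on slicing.

    def first_difference(x):
        # Numeric idiom: x[1:] - x[:-1]. With view semantics the two
        # slices cost O(1); with eager copy-on-slice each one would
        # allocate and copy n-1 elements before the subtraction runs.
        return [b - a for a, b in zip(x[:-1], x[1:])]

    print(first_difference([1.0, 4.0, 9.0, 16.0]))   # [3.0, 5.0, 7.0]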
From: Konrad H. <hi...@cn...> - 2002-06-12 08:54:16
"eric jones" <er...@en...> writes:

> others) would be very slow with that approach. Your copy-on-demand
> suggestion might work, though. Its implementation would be more
> complex, but I don't think it would require cooperation from the
> Python core(?).

It wouldn't, and I am not sure the implementation would be much more
complex, but then I haven't tried. Having both copy-on-demand and views
is difficult, both conceptually and implementation-wise, but with
copy-on-demand, views become less important.

> Copy-on-demand doesn't really fit with Python's "assignments are
> references" approach to things, though, does it? Using foo = bar in
> Python and then changing an element of foo will also change bar.

That would still be true with copy-on-demand arrays, as foo and bar
would be the same object. Semantically, copy-on-demand would be
equivalent to copying when slicing, which is exactly Python's behaviour
for lists.

> So, how about add.reduce() keeping axis=0 to match the behavior of
> Python, while sum() and friends default to axis=-1 to match the rest
> of the library functions?

That sounds like the most arbitrary inconsistency of all -- add.reduce
and sum are synonyms for me.

Konrad.
From: Alexander S. <a.s...@gm...> - 2002-06-12 15:43:45
|
"eric jones" <er...@en...> writes: > > Couldn't one have both consistency *and* efficiency by implementing a > > copy-on-demand scheme (which is what matlab does, if I'm not entirely > > mistaken; a real copy gets only created if either the original or the > > 'copy' > > is modified)? > > Well, slices creating copies is definitely a bad idea (which is what I > have heard proposed before) -- finite difference calculations (and > others) would be very slow with this approach. Your copy-on-demand > suggestion might work though. Its implementation would be more complex, > but I don't think it would require cooperation from the Python core.? > It could be handled in the ufunc code. It would also require extension > modules to make copies before they modified any values. > > Copy-on-demand doesn't really fit with python's 'assignments are > references" approach to things though does it? Using foo = bar in > Python and then changing an element of foo will also change bar. So, I My suggestion wouldn't conflict with any standard python behavior -- indeed the main motivation would be to have numarray conform to standard python behavior -- ``foo = bar`` and ``foo = bar[20:30]`` would behave exactly as for other sequences in python. The first one creates an alias to bar and in the second one the indexing operation creates a copy of part of the sequence which is then aliased to foo. Sequences are atomic in python, in the sense that indexing them creates a new object, which I think is not in contradiction to python's nice and consistent 'assignments are references' behavior. > guess there would have to be a distinction made here. This adds a > little more complexity. > > Personally, I like being able to pass views around because it allows for > efficient implementations. The option to pass arrays into extension > function and edit them in-place is very nice. Copy-on-demand might > allow for equal efficiency -- I'm not sure. I don't know how much of a performance drawback copy-on-demand would have when compared to views one -- I'd suspect it would be not significant, the fact that the runtime behavior becomes a bit more difficult to predict might be more of a drawback (but then I haven't heard matlab users complain and one could always force an eager copy). Another reason why I think a copy-on-demand scheme for slicing operations might be attractive is that I'd suspect one could gain significant benefits from doing other operations in a lazy fashion (plus optionally caching some results), too (transposing seems to cause in principle unnecessary copies at least in some cases at the moment). > > I haven't found the current behavior very problematic in practice and > haven't seen that it as a major stumbling block to new users. I'm happy From my experience not even all people who use Numeric quite a lot are *aware* that the slicing behavior differs from python sequences. You might be right that in practice aliasing doesn't cause too many problems (as long as one sticks to arrays -- it certainly makes it harder to write code that operates on slices of generic sequence types) -- I'd really be interested to know whether there are cases where people have spent a long time to track down a bug caused by the view behavior. > with status quo on this. But, if copy-on-demand is truly efficient and > didn't make extension writing a nightmare, I wouldn't complain about the > change either. I have a feeling the implementers of numarray would > though. :-) And talk about having to modify legacy code... 
Since the vast majority of slicing operations are currently not done to
create views that are dependently modified, the backward incompatibility
might not affect that much code. You are right, though, that if Perry and
the other numarray implementors don't think that copy-on-demand would be
worth the bother, then it's unlikely to happen.

> > forwards-compatibility). I would also suspect that this would make it
> > *a lot* easier to get numarray (or parts of it) into the core, but
> > this is just a guess.
>
> I think the two things Guido wants for inclusion of numarray are a
> consensus from our community on what we want, and (more importantly) a
> comprehensible code base. :-) If Numeric satisfied this 2nd condition,
> it might already be slated for inclusion... The 1st is never easy with
> such varied opinions -- I've about concluded that Konrad and I are
> anti-particles :-) -- but I hope it will happen.

As I said, I can only guess about the politics involved, but I would think
that before a significant piece of code such as numarray is incorporated
into the core, a relevant PEP will be discussed in the newsgroup, and that
many people will feel more comfortable about incorporating something into
core python that doesn't deviate significantly from standard behavior
(i.e. doesn't view-slice), especially if it mainly caters to a rather
specialized audience. But Guido obviously has the last word on those
issues, and if he doesn't have a problem either way, then as long as the
community is undivided it shouldn't be an obstacle for inclusion.

I agree that division of the community might pose the most significant
problems -- MA for example *does* create copies on indexing, if I'm not
mistaken, and the (desirable) transition process from Numeric to numarray
also poses not insignificant difficulties and risks, especially since
there now are quite a few important projects (not least of them scipy)
that are built on top of Numeric and will have to be incorporated in the
transition if numarray is to take over. Everything seems in a bit of a
limbo right now. I'm currently working on a (fully-featured) matrix class
that I'd like to work with both Numeric and numarray (and also scipy where
available) more or less transparently for the user, which turns out to be
much more difficult than I would have thought.

alex

--
Alexander Schmolck     Postgraduate Research Student
                       Department of Computer Science
                       University of Exeter
A.S...@gm...           http://www.dcs.ex.ac.uk/people/aschmolc/
|
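The behaviour alex wants to change, for contrast with the list example
above (this is standard Numeric view-slicing):

>>> from Numeric import array
>>> bar = array([0, 1, 2, 3])
>>> foo = bar[1:3]          # a Numeric slice is a view, not a copy
>>> foo[0] = 99
>>> bar                     # the original changes through the view
array([ 0, 99,  2,  3])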
From: Rick W. <rl...@st...> - 2002-06-12 16:26:28
|
Here is what I see as the fundamental problem with implementing slicing in
numarray using copy-on-demand instead of views.

Copy-on-demand requires the maintenance of a global list of all the active
views associated with a particular array buffer. Here is a simple example:

>>> a = zeros((5000,5000))
>>> b = a[49:51,50]
>>> c = a[51:53,50]
>>> a[50,50] = 1

The assignment to a[50,50] must trigger a copy of the array b; otherwise b
also changes. On the other hand, array c does not need to be copied since
its view does not include element 50,50. You could instead copy the array
a -- but that means copying a 100 Mbyte array while leaving the original
around (since b and c are still using it) -- not a good idea!

The bookkeeping can get pretty messy (if you care about memory usage,
which we definitely do). Consider this case:

>>> a = zeros((5000,5000))
>>> b = a[0:-10,0:-10]
>>> c = a[49:51,50]
>>> del a
>>> b[50,50] = 1

Now what happens? Either we can copy the array for b (which means two
copies of the huge (5000,5000) array exist, one used by c and the new
version used by b), or we can be clever and copy c instead.

Even keeping track of the views associated with a buffer doesn't solve the
problem of an array that is passed to a C extension and is modified in
place. It would seem that passing an array into a C extension would always
require all the associated views to be turned into copies. Otherwise we
can't guarantee that views won't be modified.

This kind of state information with side effects leads to a system that is
hard to develop, hard to debug, and really messes up the behavior of the
program (IMHO). It is *highly* desirable to avoid it if possible.

This is not to deny that copy-on-demand (with explicit views available on
request) would have some desirable advantages for the behavior of the
system. But we've worried these issues to death, and in the end were
convinced that slices == views provided the best compromise between the
desired behavior and a clean implementation.

Rick
------------------------------------------------------------------
Richard L. White    rl...@st...    http://sundog.stsci.edu/rick/
Space Telescope Science Institute
Baltimore, MD
|
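To see where the state Rick describes creeps in, here is a deliberately
naive one-dimensional sketch in plain Python. The Buffer and View classes
are hypothetical toys, not numarray code: every write has to consult a
registry of live views and copy the overlapping ones out before it lands.

import weakref

class Buffer:
    """Toy stand-in for a shared array buffer that tracks its views."""
    def __init__(self, data):
        self.data = list(data)
        self.views = []                    # weak references to live views

    def register(self, view):
        self.views.append(weakref.ref(view))

    def notify_write(self, index, writer):
        # Copy-on-demand: any *other* live view covering the written
        # index must get a private copy before the write happens.
        survivors = []
        for ref in self.views:
            v = ref()
            if v is None:
                continue                   # view was garbage collected
            if v is not writer and v.covers(index):
                v.detach()                 # give it its own buffer
            else:
                survivors.append(ref)
        self.views = survivors

class View:
    """A window [start, stop) onto a shared Buffer."""
    def __init__(self, buf, start, stop):
        self.buf, self.start, self.stop = buf, start, stop
        buf.register(self)

    def covers(self, index):
        return self.start <= index < self.stop

    def detach(self):
        # Copy out my slice into a private buffer.
        self.buf = Buffer(self.buf.data[self.start:self.stop])
        self.start, self.stop = 0, len(self.buf.data)

    def __getitem__(self, i):
        return self.buf.data[self.start + i]

    def __setitem__(self, i, value):
        self.buf.notify_write(self.start + i, self)
        self.buf.data[self.start + i] = value

buf = Buffer(range(10))
a = View(buf, 0, 10)                       # the "whole array"
b = View(buf, 4, 6)                        # a small slice of it
a[5] = 99                                  # forces b to be copied out first
print(b[1])                                # still 5: b kept its old value

Even this toy has to decide who pays for the copy; Rick's ``del a``
example is exactly the case where neither choice is obviously right.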
From: Perry G. <pe...@st...> - 2002-06-12 16:44:40
|
<Rick White writes>:

> This kind of state information with side effects leads to a system that
> is hard to develop, hard to debug, and really messes up the behavior of
> the program (IMHO). It is *highly* desirable to avoid it if possible.

Rick beat me to the punch. The requirement for copy-on-demand definitely
leads to a far more complex implementation with much more potential for
misunderstood memory usage. You could do one small thing and suddenly
force a spate of copies (perhaps cascading). There is no way we would take
on a redesign of Numeric with this requirement with the resources we have
available.

> This is not to deny that copy-on-demand (with explicit views available
> on request) would have some desirable advantages for the behavior of
> the system. But we've worried these issues to death, and in the end
> were convinced that slices == views provided the best compromise
> between the desired behavior and a clean implementation.

Rick's explanation doesn't really address the other position, which is
that slices should force immediate copies. This isn't a difficult
implementation issue by itself. But it does raise some related
implementation questions. Supposing one does feel that views are a feature
one wants even though they are not the default, it turns out that it isn't
all that simple to obtain views without sacrificing ordinary slicing
syntax. It is simple to obtain copies of view slices, though.

Slicing views may not be important to everyone. It is important to us (and
others), and we do see a number of situations where forcing copies to
operate on array subsets would be a serious performance problem. We did
discuss this issue with Guido, and he did not indicate that having
different behavior on slicing with arrays would be a show stopper for
acceptance into the Standard Library. We are also aware that there is no
great consensus on this issue (even internally at STScI :-).

Perry Greenfield
|
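The easy direction Perry mentions -- an immediate copy of a view slice --
is just the array constructor, which copies its argument by default in
standard Numeric:

>>> from Numeric import array
>>> a = array([1, 2, 3, 4])
>>> b = array(a[1:3])       # an explicit copy of a view slice
>>> b[0] = 99
>>> a                       # unaffected
array([1, 2, 3, 4])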
From: Alexander S. <a.s...@gm...> - 2002-06-12 22:50:25
|
"Perry Greenfield" <pe...@st...> writes: > <Rick White writes> : > > > This kind of state information with side effects leads to a system that > > is hard to develop, hard to debug, and really messes up the behavior of > > the program (IMHO). It is *highly* desirable to avoid it if possible. > > > Rick beat me to the punch. The requirement for copy-on-demand > definitely leads to a far more complex implementation with > much more potential for misunderstood memory usage. You could > do one small thing and suddenly force a spate of copies (perhaps > cascading). There is no way we would taken on a redesign of Yes, but I would suspect that cases were a little innocuous a[0] = 3 triggers excessive processing should be rather unusual (matlab or octave users will know). > Numeric with this requirement with the resources we have available. Fair enough -- if implementing copy-on-demand is too much work then we'll have to live without it (especially if view-slicing doesn't stand in the way of a future inclusion into the python core). I guess the best reason to bite the bullet and carry around state information would be if there were significant other cases where one also would want to optimize operations under the hood. If there isn't much else in this direction then the effort involved might not be justified. One thing that bugs me in Numeric (and that might already have been solved in numarray) is that e.g. ``ravel`` (and I think also ``transpose``) creates unnecessary copies, whereas ``.flat`` doesn't, but won't work in all cases (viz. when the array is non-contiguous), so I can either have ugly or inefficient code. > > > This is not to deny that copy-on-demand (with explicit views available > > on request) would have some desirable advantages for the behavior of > > the system. But we've worried these issues to death, and in the end > > were convinced that slices == views provided the best compromise > > between the desired behavior and a clean implementation. > > > Rick's explanation doesn't really address the other position which > is slices should force immediate copies. This isn't a difficult > implementation issue by itself. But it does raise some related > implementation questions. Supposing one does feel that views are > a feature one wants even though they are not the default, it turns > out that it isn't all that simple to obtain views without sacrificing > ordinary slicing syntax to obtain a view. It is simple to obtain > copies of view slices though. I'm not sure I understand the above. What is the problem with ``a.view[1:3]`` (or``a.view()[1:3])? > > Slicing views may not be important to everyone. It is important > to us (and others) and we do see a number of situations where > forcing copies to operate on array subsets would be a serious > performance problem. We did discuss this issue with Guido and Sure, no one denies that even if with copy-on-demand (explicitly) aliased views would still be useful. > he did not indicate that having different behavior on slicing > with arrays would be a show stopper for acceptance into the > Standard Library. We are also aware that there is no great > consensus on this issue (even internally at STScI :-). > Yep, I just saw Paul Barrett's post :) > Perry Greenfield > > alex -- Alexander Schmolck Postgraduate Research Student Department of Computer Science University of Exeter A.S...@gm... http://www.dcs.ex.ac.uk/people/aschmolc/ |