Thread: [Numpy-discussion] Selecting columns of a matrix

[Numpy-discussion] Selecting columns of a matrix

From: Keith G. <kwg...@gm...> - 2006-06-21 03:04:27

I have a matrix M and a vector (n by 1 matrix) V. I want to form a new
matrix that contains the columns of M for which V > 0.

One way to do that in Octave is M(:, find(V > 0)). How is it done in numpy?

Re: [Numpy-discussion] Selecting columns of a matrix

From: Bill B. <wb...@gm...> - 2006-06-21 03:33:46

I think that one's on the NumPy for Matlab users, no?
http://www.scipy.org/NumPy_for_Matlab_Users

>>> import numpy as num
>>> a = num.arange(10).reshape(2,5)
>>> a
array([[0, 1, 2, 3, 4],
       [5, 6, 7, 8, 9]])
>>> v = num.random.rand(5)
>>> v
array([ 0.10934855,  0.55719644,  0.7044047 ,  0.19250088,  0.94636972])
>>> num.where(v>0.5)
(array([1, 2, 4]),)
>>> a[:,num.where(v>0.5)]
array([[[1, 2, 4]],

       [[6, 7, 9]]])

Seems it grows an extra set of brackets for some reason.  Squeeze will get
rid of them.

>>> a[:,num.where(v>0.5)].squeeze()
array([[1, 2, 4],
       [6, 7, 9]])

Not sure why the squeeze is needed.  Maybe there's a better way.

--bb


On 6/21/06, Keith Goodman <kwg...@gm...> wrote:
>
> I have a matrix M and a vector (n by 1 matrix) V. I want to form a new
> matrix that contains the columns of M for which V > 0.
>
> One way to do that in Octave is M(:, find(V > 0)). How is it done in
> numpy?
>
>
>

Re: [Numpy-discussion] Selecting columns of a matrix

From: Keith G. <kwg...@gm...> - 2006-06-21 03:49:30

On 6/20/06, Bill Baxter <wb...@gm...> wrote:
> I think that one's on the NumPy for Matlab users, no?
>
> http://www.scipy.org/NumPy_for_Matlab_Users
>
> >>> import numpy as num
>  >>> a = num.arange (10).reshape(2,5)
> >>> a
> array([[0, 1, 2, 3, 4],
>        [5, 6, 7, 8, 9]])
> >>> v = num.rand(5)
> >>> v
> array([ 0.10934855,  0.55719644,  0.7044047 ,  0.19250088,  0.94636972])
>  >>> num.where(v>0.5)
> (array([1, 2, 4]),)
> >>> a[:,num.where(v>0.5)]
> array([[[1, 2, 4]],
>
>        [[6, 7, 9]]])
>
> Seems it grows an extra set of brackets for some reason.  Squeeze will get
> rid of them.
>
> >>> a[:,num.where(v>0.5)].squeeze()
> array([[1, 2, 4],
>        [6, 7, 9]])
>
> Not sure why the squeeze is needed.  Maybe there's a better way.

Thank you.

That works for arrays, but not matrices. So do I need to do

asarray(a)[:, where(asarray(v)>0.5)].squeeze()

?

Re: [Numpy-discussion] Selecting columns of a matrix

From: Erin S. <eri...@gm...> - 2006-06-21 04:10:11

On 6/20/06, Bill Baxter <wb...@gm...> wrote:
> I think that one's on the NumPy for Matlab users, no?
>
> http://www.scipy.org/NumPy_for_Matlab_Users
>
> >>> import numpy as num
>  >>> a = num.arange (10).reshape(2,5)
> >>> a
> array([[0, 1, 2, 3, 4],
>        [5, 6, 7, 8, 9]])
> >>> v = num.rand(5)
> >>> v
> array([ 0.10934855,  0.55719644,  0.7044047 ,  0.19250088,  0.94636972])
>  >>> num.where(v>0.5)
> (array([1, 2, 4]),)
> >>> a[:,num.where(v>0.5)]
> array([[[1, 2, 4]],
>
>        [[6, 7, 9]]])
>
> Seems it grows an extra set of brackets for some reason.  Squeeze will get
> rid of them.
>
> >>> a[:,num.where(v>0.5)].squeeze()
> array([[1, 2, 4],
>        [6, 7, 9]])
>
> Not sure why the squeeze is needed.  Maybe there's a better way.

where returns a tuple of arrays.  This can have unexpected results
so you need to grab what you want explicitly:

>>> (w,) = num.where(v>0.5)
>>> a[:,w]
array([[1, 2, 4],
       [6, 7, 9]])
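
A minimal sketch of the point above, assuming a current NumPy release and fixed values in place of the random v. It shows that where() with a single argument returns a 1-tuple, and that passing the tuple itself as a column index is what adds the extra axis:

```python
import numpy as np

a = np.arange(10).reshape(2, 5)
v = np.array([0.1, 0.6, 0.7, 0.2, 0.9])  # fixed values (assumption), not random

idx = np.where(v > 0.5)   # a 1-tuple holding one index array
extra = a[:, idx]         # the tuple is converted to a (1, 3) index array
(w,) = idx                # unpack the single index array
cols = a[:, w]            # plain 1-D integer index, no extra axis

print(type(idx), len(idx))
print(extra.shape)
print(cols)
```

Unpacking with `(w,) = idx` also fails loudly if where() ever returns more than one array, which indexing with `[0]` would not.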

Re: [Numpy-discussion] Selecting columns of a matrix

From: Bill B. <wb...@gm...> - 2006-06-21 04:48:51

On 6/21/06, Erin Sheldon <eri...@gm...> wrote:
>
> On 6/20/06, Bill Baxter <wb...@gm...> wrote:
> > I think that one's on the NumPy for Matlab users, no?
> >
> > http://www.scipy.org/NumPy_for_Matlab_Users
> >
> > >>> import numpy as num
> >  >>> a = num.arange (10).reshape(2,5)
> > >>> a
> > array([[0, 1, 2, 3, 4],
> >        [5, 6, 7, 8, 9]])
> > >>> v = num.rand(5)
> > >>> v
> > array([ 0.10934855,  0.55719644,  0.7044047 ,  0.19250088,  0.94636972])
> >  >>> num.where(v>0.5)
> > (array([1, 2, 4]),)
> > >>> a[:,num.where(v>0.5)]
> > array([[[1, 2, 4]],
> >
> >        [[6, 7, 9]]])
> >
> > Seems it grows an extra set of brackets for some reason.  Squeeze will
> get
> > rid of them.
> >
> > >>> a[:,num.where(v>0.5)].squeeze()
> > array([[1, 2, 4],
> >        [6, 7, 9]])
> >
> > Not sure why the squeeze is needed.  Maybe there's a better way.
>
> where returns a tuple of arrays.  This can have unexpected results
> so you need to grab what you want explicitly:
>
> >>> (w,) = num.where(v>0.5)
> >>> a[:,w]
> array([[1, 2, 4],
>        [6, 7, 9]])
>

Ah, yeh, that makes sense.  Thanks for the explanation.  So to turn it back
into a one-liner you just need:

>>> a[:,num.where(v>0.5)[0]]
array([[1, 2, 4],
       [6, 7, 9]])

I'll put that up on the Matlab->Numpy page.

--bb

Re: [Numpy-discussion] Selecting columns of a matrix

From: Simon B. <si...@ar...> - 2006-06-21 05:23:55

On Wed, 21 Jun 2006 13:48:48 +0900
"Bill Baxter" <wb...@gm...> wrote:

> 
> >>> a[:,num.where(v>0.5)[0]]
> array([[1, 2, 4],
>        [6, 7, 9]])
> 
> I'll put that up on the Matlab->Numpy page.

oh, yuck. What about this:

>>> a[:,num.nonzero(v>0.5)]
array([[0, 1, 3],
       [5, 6, 8]])
>>> 

Simon.


-- 
Simon Burton, B.Sc.
Licensed PO Box 8066
ANU Canberra 2601
Australia
Ph. 61 02 6249 6940
http://arrowtheory.com

Re: [Numpy-discussion] Selecting columns of a matrix

From: Keith G. <kwg...@gm...> - 2006-06-21 14:14:24

On 6/20/06, Bill Baxter <wb...@gm...> wrote:

> >>> a[:,num.where(v>0.5)[0]]
> array([[1, 2, 4],
>        [6, 7, 9]])
>
> I'll put that up on the Matlab->Numpy page.

That's a great addition to the Matlab to Numpy page.

But it only works if v is a column vector. If v is a row vector, then
where(v.A > 0.5)[0] will return all zeros. So for row vectors it
should be where(v.A > 0.5)[1].

Or, in general, where(v.flatten(1).A > 0.5)[1]
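
The row/column difference can be sketched as follows, assuming plain 2-D arrays stand in for the matrices and a current NumPy release. `select_columns` is a hypothetical helper, not something from the thread; it flattens v first so the orientation no longer matters:

```python
import numpy as np

a = np.arange(10).reshape(2, 5)
vals = np.array([0.1, 0.6, 0.7, 0.2, 0.9])  # fixed values (assumption)
v_col = vals.reshape(5, 1)    # column vector, shape (5, 1)
v_row = vals.reshape(1, 5)    # row vector, shape (1, 5)

# For 2-D input, where() returns (row_indices, column_indices).
print(np.where(v_col > 0.5))  # the row indices carry the information
print(np.where(v_row > 0.5))  # the row indices are all zero here

def select_columns(m, v, threshold=0.5):
    """Hypothetical helper: keep the columns of m where v > threshold."""
    mask = np.asarray(v).ravel() > threshold  # 1-D whatever the orientation
    return np.asarray(m)[:, mask]

print(select_columns(a, v_col))
print(select_columns(a, v_row))
```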

Re: [Numpy-discussion] Selecting columns of a matrix

From: Bill B. <wb...@gm...> - 2006-06-21 07:17:24

On 6/21/06, Simon Burton <si...@ar...> wrote:
>
> On Wed, 21 Jun 2006 13:48:48 +0900
> "Bill Baxter" <wb...@gm...> wrote:
>
> >
> > >>> a[:,num.where(v>0.5)[0]]
> > array([[1, 2, 4],
> >        [6, 7, 9]])
> >
> > I'll put that up on the Matlab->Numpy page.
>
> oh, yuck. What about this:
>
> >>> a[:,num.nonzero(v>0.5)]
> array([[0, 1, 3],
>        [5, 6, 8]])
> >>>

The nonzero() function seems like kind of an anomaly in and of itself.  It
doesn't behave like other index-returning numpy functions, or even like the
method version, v.nonzero(), which returns the typical tuple of arrays.  So
my feeling is ... ew to numpy.nonzero.

--Bill
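
For readers on a current NumPy release (an assumption; the behaviour described above is from the pre-1.0 code), the function and the method now agree: both return a tuple of index arrays. A short sketch, which also uses flatnonzero() as one way to get a bare 1-D index array:

```python
import numpy as np

a = np.arange(10).reshape(2, 5)
v = np.array([0.1, 0.6, 0.7, 0.2, 0.9])  # fixed values (assumption)
mask = v > 0.5

func_result = np.nonzero(mask)   # function form: tuple of arrays
meth_result = mask.nonzero()     # method form: tuple of arrays
flat = np.flatnonzero(mask)      # plain 1-D array of indices

print(func_result, meth_result)
print(a[:, flat])
```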

Re: [Numpy-discussion] Selecting columns of a matrix

From: Alan G I. <ai...@am...> - 2006-06-21 09:13:59

On Wed, 21 Jun 2006, Bill Baxter apparently wrote: 
> ew to numpy.nonzero 

I agree that having the method and function behave so 
differently is awkward; this was discussed before on this 
list.

It does allow Simon's nicer solution, however.

I'm not sure why bool arrays cannot be used as indices.
The "natural" solution to the original problem seemed to be:
M[:,V>0]
but this is not allowed.

Cheers,
Alan Isaac

Re: [Numpy-discussion] Selecting columns of a matrix

From: Johannes L. <a.u...@gm...> - 2006-06-21 13:36:25

Hi,

> I'm not sure why bool arrays cannot be used as indices.
> The "natural" solution to the original problem seemed to be:
> M[:,V>0]
> but this is not allowed.

I started a thread on this earlier this year. Try searching the archive for 
"boolean indexing" (if it ever comes back online).

Travis had some reason for not implementing this, but unfortunately I do not 
remember what it was. The corresponding message might still linger on my home 
PC, which I can access this evening....

Johannes

Re: [Numpy-discussion] Selecting columns of a matrix

From: Travis O. <oli...@ie...> - 2006-06-21 16:50:34

Johannes Loehnert wrote:
> Hi,
>
>   
>> I'm not sure why bool arrays cannot be used as indices.
>> The "natural" solution to the original problem seemed to be:
>> M[:,V>0]
>> but this is not allowed.
>>     
>
> I started a thread on this earlier this year. Try searching the archive for 
> "boolean indexing" (if it comes back online somewhen).
>
> Travis had some reason for not implementing this, but unfortunately I do not 
> remember what it was. The corresponding message might still linger on my home 
>   
> PC, which I can access this evening....
>   

I suspect my reason was just not being sure if it could be explained 
consistently.  But after seeing this come up again, I decided it was 
easy enough to implement.

So, in SVN NumPy, you will be able to do

a[:,V>0]
a[V>0,:] 

The V>0 will be replaced with integer arrays as if nonzero(V>0) had been 
called.
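
A small sketch of the behaviour described above, assuming a current NumPy release and made-up masks. It compares a boolean index on one axis with the explicit nonzero() spelling:

```python
import numpy as np

a = np.arange(12).reshape(3, 4)
row_mask = np.array([False, True, True])         # illustrative mask for rows
col_mask = np.array([True, False, True, True])   # illustrative mask for columns

cols = a[:, col_mask]
rows = a[row_mask, :]

# The same selections written with explicit integer index arrays.
cols_int = a[:, np.nonzero(col_mask)[0]]
rows_int = a[np.nonzero(row_mask)[0], :]

print(cols)
print(rows)
```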


-Travis

Re: [Numpy-discussion] Selecting columns of a matrix

From: Pau G. <pau...@gm...> - 2006-06-21 17:09:54

On 6/21/06, Travis Oliphant <oli...@ie...> wrote:
> Johannes Loehnert wrote:
> > Hi,
> >
> >
> >> I'm not sure why bool arrays cannot be used as indices.
> >> The "natural" solution to the original problem seemed to be:
> >> M[:,V>0]
> >> but this is not allowed.
> >>
> >
> > I started a thread on this earlier this year. Try searching the archive for
> > "boolean indexing" (if it comes back online somewhen).
> >
> > Travis had some reason for not implementing this, but unfortunately I do not
> > remember what it was. The corresponding message might still linger on my home
> >
> > PC, which I can access this evening....
> >
>
> I suspect my reason was just not being sure if it could be explained
> consistently.  But, after seeing this come up again.   I decided it was
> easy enough to implement.
>
> So, in SVN NumPy, you will be able to do
>
> a[:,V>0]
> a[V>0,:]
>
> The V>0 will be replaced with integer arrays as if nonzero(V>0) had been
> called.
>

does it work for a[<boolean>,<boolean>] ?

what about a[ix_( nonzero(<boolean>), nonzero(<boolean>) )] ?

maybe the <boolean> to nonzero(<boolean>) conversion would be more
coherently done by the ix_ function than by the []


pau

Re: [Numpy-discussion] Selecting columns of a matrix

From: Simon B. <si...@ar...> - 2006-06-22 02:20:28

On Wed, 21 Jun 2006 10:50:26 -0600
Travis Oliphant <oli...@ie...> wrote:

> 
> So, in SVN NumPy, you will be able to do
> 
> a[:,V>0]
> a[V>0,:] 
> 
> The V>0 will be replaced with integer arrays as if nonzero(V>0) had been 
> called.

OK.
But just for the record, we should note how to
do the operation that this used to do, e.g.

>>> a=numpy.array([1,2])
>>> a[[numpy.bool_(1)]]
array([2])
>>> 

This could be a way of, say, mapping a large
boolean array onto some other values (1 or 2 in the
above case).

So, with the new implementation, is it possible to cast
the bool array to an integer type without incurring a copy overhead ?
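
One possible answer, sketched for a current NumPy release (an assumption; this is not what the 2006 code did): a boolean array can be reinterpreted as uint8 with view(), which shares memory with the original, and the result works as an integer index into a lookup table. The names `lookup` and `flags` are illustrative only.

```python
import numpy as np

lookup = np.array([1, 2])                    # value for False, value for True
flags = np.array([True, False, True, True])

as_int = flags.view(np.uint8)                # reinterpret the bytes, no copy
print(np.shares_memory(flags, as_int))
mapped = lookup[as_int]                      # ordinary integer indexing
print(mapped)

# Alternative that skips the lookup table.
mapped_where = np.where(flags, 2, 1)
```

The indexing step itself still builds a new result array; only the bool-to-integer conversion is free.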

And finally, is someone keeping track of the performance
of array getitem? It seems that as Travis overloads it more and
more it might then slow down in some cases.

I must admit my vision is blurring and my head is spinning as numpy 
goes through these growing pains. I hope it's over soon. Not
because I have trouble keeping up (although I do) but because it's
my matlab/R/numarray entrenched co-workers who cannot
be exposed to this unstable development (they will run
screaming to the woods).

cheers,

Simon.


Re: [Numpy-discussion] Selecting columns of a matrix

From: Travis O. <oli...@ie...> - 2006-06-22 05:58:55

Simon Burton wrote:
> On Wed, 21 Jun 2006 10:50:26 -0600
> Travis Oliphant <oli...@ie...> wrote:
>
>   
>> So, in SVN NumPy, you will be able to do
>>
>> a[:,V>0]
>> a[V>0,:] 
>>
>> The V>0 will be replaced with integer arrays as if nonzero(V>0) had been 
>> called.
>>     
>
> OK.
> But just for the record, we should note how to
> do the operation that this used to do, eg.
>
>
> >>> a=numpy.array([1,2])
> >>> a[[numpy.bool_(1)]]
> array([2])
>
This behavior hasn't changed...

All that's changed is that what used to raise an error (boolean arrays 
in a tuple) now works in the same way that boolean arrays worked before.
>
> So, with the new implementation, is it possible to cast
> the bool array to an integer type without incurring a copy overhead ?
>   

I'm not sure what you mean.  What copy overhead?   There is still 
copying going on.  The way it's been implemented, the boolean arrays get 
replaced with integer index arrays under the hood so it is really nearly 
identical to replacing the boolean array with nonzero(<boolean>).
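
The copy can be checked directly. A minimal sketch, assuming a current NumPy release and a made-up mask, comparing a boolean index, the equivalent integer index, and a basic slice:

```python
import numpy as np

a = np.arange(10).reshape(2, 5)
mask = np.array([False, True, True, False, True])  # illustrative mask

by_mask = a[:, mask]                   # boolean index
by_index = a[:, np.nonzero(mask)[0]]   # equivalent integer index
by_slice = a[:, 1:3]                   # basic slice, for comparison

print(np.array_equal(by_mask, by_index))
print(np.shares_memory(a, by_mask))    # boolean indexing copies the data
print(np.shares_memory(a, by_slice))   # slicing returns a view
```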
> And finally, is someone keeping track of the performance
> of array getitem ? It seems that as travis overloads it more and
> more it might then slow down in some cases.
>   
Actually, I'm very conscious of the overhead of getitem in code that 
I add.  Just today I found a memory leak in code that had been added 
without my reviewing it carefully; it was also slowing down all accesses 
of arrays > 1d that resulted in array scalars.  I added an optimization 
that should speed that up.

But, it would be great if others could watch the speed changes for basic 
operations.
> I must admit my vision is blurring and head is spining as numpy 
> goes through these growing pains
The 1.0 beta release is coming shortly.   I would like to see the first 
beta by the first of July.   The final 1.0 release won't occur, though, 
until after SciPy 2006.

Thanks for your patience.   We've been doing a lot of house-cleaning 
lately to separate the "old but compatible" interface from the "new."   
This has resulted in some confusion, to be sure.   Please don't hesitate 
to voice your concerns.

-Travis

Re: [Numpy-discussion] Selecting columns of a matrix

From: Travis O. <oli...@ie...> - 2006-06-21 16:09:57

Bill Baxter wrote:
> On 6/21/06, Simon Burton <si...@ar...> wrote:
>
>     On Wed, 21 Jun 2006 13:48:48 +0900
>     "Bill Baxter" <wb...@gm...> wrote:
>
>     >
>     > >>> a[:,num.where(v>0.5)[0]]
>     > array([[1, 2, 4],
>     >        [6, 7, 9]])
>     >
>     > I'll put that up on the Matlab->Numpy page.
>
>     oh, yuck. What about this:
>
>     >>> a[:,num.nonzero(v>0.5)]
>     array([[0, 1, 3],
>            [5, 6, 8]])
>     >>> 
>
>
> The nonzero() function seems like kind of an anomaly in and of 
> itself.    It doesn't behave like other index-returning numpy 
> functions, or even like the method version, v.nonzero(), which returns 
> the typical tuple of array.  So my feeling is ... ew to numpy.nonzero.

How about we add the ability so that

a[:, <boolean>]  gets translated to

a[:, nonzero(<boolean>)] ?

-Travis

Re: [Numpy-discussion] Selecting columns of a matrix

From: Alan G I. <ai...@am...> - 2006-06-21 08:40:55

On Tue, 20 Jun 2006, Keith Goodman apparently wrote:
> I have a matrix M and a vector (n by 1 matrix) V. I want to form a new
> matrix that contains the columns of M for which V > 0.
> One way to do that in Octave is M(:, find(V > 0)). How is it done in numpy?

M.transpose()[V>0]
If you want the columns as columns,
you can transpose again.
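
A short sketch of the transpose approach, assuming a 1-D V with made-up values and a current NumPy release. It also checks the result against a boolean index placed directly on the column axis, which later releases accept:

```python
import numpy as np

M = np.arange(10).reshape(2, 5)
V = np.array([-1.0, 2.0, 0.5, 0.0, 3.0])   # illustrative values

via_transpose = M.T[V > 0].T   # select rows of M.T, then transpose back
direct = M[:, V > 0]           # boolean index on the column axis

print(via_transpose)
```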

hth,
Alan Isaac

Re: [Numpy-discussion] Selecting columns of a matrix

From: Travis O. <oli...@ie...> - 2006-06-21 17:27:21

Pau Gargallo wrote:
> On 6/21/06, Travis Oliphant <oli...@ie...> wrote:
>   
>> Johannes Loehnert wrote:
>>     
>>> Hi,
>>>
>>>
>>>       
>>>> I'm not sure why bool arrays cannot be used as indices.
>>>> The "natural" solution to the original problem seemed to be:
>>>> M[:,V>0]
>>>> but this is not allowed.
>>>>
>>>>         
>>> I started a thread on this earlier this year. Try searching the archive for
>>> "boolean indexing" (if it comes back online somewhen).
>>>
>>> Travis had some reason for not implementing this, but unfortunately I do not
>>> remember what it was. The corresponding message might still linger on my home
>>>
>>> PC, which I can access this evening....
>>>
>>>       
>> I suspect my reason was just not being sure if it could be explained
>> consistently.  But, after seeing this come up again.   I decided it was
>> easy enough to implement.
>>
>> So, in SVN NumPy, you will be able to do
>>
>> a[:,V>0]
>> a[V>0,:]
>>
>> The V>0 will be replaced with integer arrays as if nonzero(V>0) had been
>> called.
>>
>>     
>
> does it work for a[<boolean>,<boolean>] ?
>   
Sure, it will work.  Basically all boolean arrays will be interpreted as 
nonzero(V>0), everywhere.
> what about a[ix_( nonzero(<boolean>), nonzero(<boolean>) )] ?
>
> maybe the <boolean> to nonzero(<boolean>) conversion would be more
> coherently done by the ix_ function than by the []
>
>   
I've just added support for <boolean> inside ix_  so that the nonzero 
will be done automatically as well.  

So

a[ix_(<boolean>,<boolean>)]  will give the cross-product selection.
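
A minimal sketch of the cross-product selection, assuming a current NumPy release; b1 and b2 are illustrative masks. ix_ turns each boolean array into integer indices and reshapes them so they broadcast against each other:

```python
import numpy as np

a = np.arange(12).reshape(3, 4)
b1 = np.array([False, True, True])         # rows 1 and 2
b2 = np.array([True, True, True, False])   # columns 0, 1 and 2

block = a[np.ix_(b1, b2)]
print(block)

rows, cols = np.ix_(b1, b2)
print(rows.shape, cols.shape)   # a column of row indices, a row of column indices
```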


-Travis

Re: [Numpy-discussion] Selecting columns of a matrix

From: Pau G. <pau...@gm...> - 2006-06-21 17:31:50

On 6/21/06, Travis Oliphant <oli...@ie...> wrote:
> Pau Gargallo wrote:
> > On 6/21/06, Travis Oliphant <oli...@ie...> wrote:
> >
> >> Johannes Loehnert wrote:
> >>
> >>> Hi,
> >>>
> >>>
> >>>
> >>>> I'm not sure why bool arrays cannot be used as indices.
> >>>> The "natural" solution to the original problem seemed to be:
> >>>> M[:,V>0]
> >>>> but this is not allowed.
> >>>>
> >>>>
> >>> I started a thread on this earlier this year. Try searching the archive for
> >>> "boolean indexing" (if it comes back online somewhen).
> >>>
> >>> Travis had some reason for not implementing this, but unfortunately I do not
> >>> remember what it was. The corresponding message might still linger on my home
> >>>
> >>> PC, which I can access this evening....
> >>>
> >>>
> >> I suspect my reason was just not being sure if it could be explained
> >> consistently.  But, after seeing this come up again.   I decided it was
> >> easy enough to implement.
> >>
> >> So, in SVN NumPy, you will be able to do
> >>
> >> a[:,V>0]
> >> a[V>0,:]
> >>
> >> The V>0 will be replaced with integer arrays as if nonzero(V>0) had been
> >> called.
> >>
> >>
> >
> > does it work for a[<boolean>,<boolean>] ?
> >
> Sure, it will work.  Basically all boolean arrays will be interpreted as
> nonzero(V>0), everywhere.
> > what about a[ix_( nonzero(<boolean>), nonzero(<boolean>) )] ?
> >
> > maybe the <boolean> to nonzero(<boolean>) conversion would be more
> > coherently done by the ix_ function than by the []
> >
> >
> I've just added support for <boolean> inside ix_  so that the nonzero
> will be done automatically as well.
>
> So
>
> a[ix_(<boolean>,<boolean>)]  will give the cross-product selection.
>


ok so: a[ b1, b2 ] will be different than a[ ix_(b1,b2) ] just like
with integer indices.
Make sense to me.

also, a[b] will be as before (a[where(b)]) ?
maybe a trailing comma could launch the new behaviour?
a[b] -> a[where(b)]
a[b,] -> a[b,...] -> a[nonzero(b)]
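
For comparison, a sketch of how these spellings behave in a current NumPy release (an assumption; the thread concerns pre-1.0 code). With a boolean array of the same shape as a, all of them give the same 1-D result, and the trailing comma makes no difference:

```python
import numpy as np

a = np.arange(12).reshape(3, 4)
b = a > 4                     # boolean array with the same shape as a

picked = a[b]                 # 1-D array of the selected elements
same = a[np.nonzero(b)]       # explicit tuple of index arrays
also_same = a[np.where(b)]    # where(b) with one argument matches nonzero(b)
with_comma = a[b,]            # index tuple (b,), same selection

print(picked)
```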

Thanks,

pau

Re: [Numpy-discussion] Selecting columns of a matrix

From: Pau G. <pau...@gm...> - 2006-06-22 10:26:20

'''
The following mail is a bit long and tedious to read, sorry about
that. Here is the abstract:
   "I would like boolean indexing to work like slices and not like
arrays of indices"
'''


hi,

I'm _really_ sorry to insist, but I have been thinking about it and I
don't feel like replacing <boolean> with nonzero(<boolean>) is what we
want.

For me this is a bad trick, equivalent to replacing slices with arrays of
indices using r_[<slice>]:
- it works only if you do that for a single axis.

Let me explain:
if i have an array,

>>> from numpy import *
>>> a = arange(12).reshape(3,4)

i can slice it:

>>> a[1:3,0:3]
array([[ 4,  5,  6],
       [ 8,  9, 10]])

I can define boolean arrays 'equivalent' to these slices

>>> b1 = array([False,True,True])             # equivalent to 1:3
>>> b2 = array([True,True,True,False])      # equivalent to 0:3

now if I use one of these boolean arrays for indexing, everything works like with slices:

>>> a[b1,:]                     #same as a[1:3,:]
array([[ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])
>>> a[:,b2]                     # same as a[:,0:3]
array([[ 0,  1,  2],
       [ 4,  5,  6],
       [ 8,  9, 10]])

but if I use both at the same time:

>>> a[b1,b2]                 # not equivalent to a[1:3,0:3] but to a[r_[1:3],r_[0:3]]
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
ValueError: shape mismatch: objects cannot be broadcast to a single shape

It doesn't work because nonzero(b1) and nonzero(b2) have different shapes.
If I want the equivalent of a[1:3,0:3], I can do

>>> a[ix_(b1,b2)]
array([[ 4,  5,  6],
       [ 8,  9, 10]])

I cannot see when the current behaviour of a[b1,b2] would be used.
From my (probably naive) point of view, <boolean> should not be
converted to nonzero(<boolean>), but to some kind of slicing object.
In that way boolean indexing could work like slices and not like
arrays of integers, which would be more intuitive to me.

Converting slices to arrays of indices is a trick that only works for one axis:

>>> a[r_[1:3],0:3]           #same as a[1:3,0:3]
array([[ 4,  5,  6],
       [ 8,  9, 10]])
>>> a[1:3,r_[0:3]]            #same as a[1:3,0:3]
array([[ 4,  5,  6],
       [ 8,  9, 10]])
>>> a[r_[1:3],r_[0:3]]       # NOT same as a[1:3,0:3]
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
ValueError: shape mismatch: objects cannot be broadcast to a single shape
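
For what it is worth, here is a sketch of the case where the pairwise behaviour is the wanted one, assuming a current NumPy release and made-up index arrays. When the index arrays have matching shapes they are paired element by element, which picks individual entries; ix_ (or explicit broadcasting) gives the slice-like block instead:

```python
import numpy as np

a = np.arange(12).reshape(3, 4)
rows = np.array([1, 2])   # illustrative row indices
cols = np.array([0, 3])   # illustrative column indices

pairs = a[rows, cols]             # elements (1, 0) and (2, 3)
block = a[np.ix_(rows, cols)]     # every combination of rows and cols
manual = a[rows[:, None], cols]   # the same block via explicit broadcasting

print(pairs)
print(block)
```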


Am I completely wrong?
Maybe the current behaviour (only useful for one axis) is enough?

Sorry for asking things and not giving solutions, and thanks for everything.

pau


PS: I noticed that the following code works
>>> a[a>4,:,:,:,:,1:2:3,...,4:5:6]
array([ 5,  6,  7,  8,  9, 10, 11])