Re: [Numpy-discussion] max argmax combo

SourceForge Headquarters 225 Broadway Suite 1600 San Diego, CA 92101 +1 (858) 422-6466

A Dimarts 19 Setembre 2006 21:41, Bill Baxter va escriure:
> I think he meant do an argsort first, then use fancy indexing to get
> the sorted array.
> For a 1-d array that's just
>
>   ind =3D A.argsort()
>   Asorted =3D A[ind]
>
> That should be O(N lg N + N), aka O(N lg N)

I see. Thanks.  OTOH, maybe your estimations are right, but the effect of t=
he=20
constants in these O(whatever) estimations can be surprisingly high:

In [3]: from timeit import Timer

In [4]: Timer("b=3Dnumpy.argsort(a);c=3Da[b]", "import numpy;=20
a=3Dnumpy.arange(100000,-1,-1)").repeat(3,100)
Out[4]: [1.6653108596801758, 1.670341968536377, 1.6632120609283447]

In [5]: Timer("b=3Dnumpy.argsort(a);c=3Dnumpy.sort(a)", "import numpy;=20
a=3Dnumpy.arange(100000,-1,-1)").repeat(3,100)
Out[5]: [1.6533238887786865, 1.6272940635681152, 1.6253311634063721]

In [6]: Timer("b=3Dnumpy.argsort(a);a.sort();c=3Da", "import numpy;=20
a=3Dnumpy.arange(100000,-1,-1)").repeat(3,100)
Out[6]: [0.95492100715637207, 0.90312504768371582, 0.90426898002624512]

so, it seems that argsorting first and fancy indexing later on is the most=
=20
expensive procedure for relatively large arrays (100000).

Interestingly, figures above seems to indicate that in-place sort is=20
stunningly fast:

In [7]: Timer("a.sort()","import numpy;=20
a=3Dnumpy.arange(100000,-1,-1)").repeat(3,100)
Out[7]: [0.32840394973754883, 0.2746579647064209, 0.2770991325378418]

and much faster indeed than fancy indexing

In [8]: Timer("b[a]","import numpy;=20
a=3Dnumpy.arange(100000,-1,-1);b=3Da.copy()").repeat(3,100)
Out[8]: [0.79876089096069336, 0.74172186851501465, 0.74209499359130859]

i.e. in-place sort seems 2.7x faster than fancy indexing (at least for thes=
e=20
datasets).

Mmmm, with this, I really ponder if a combo that makes the argsort() and=20
sort() in one shot really makes any sense, at least from the point of view =
of=20
speed:

In [10]: Timer("b=3Dnumpy.argsort(a);a.sort();c=3Da","import numpy;=20
a=3Dnumpy.arange(100000,-1,-1)").repeat(3,100)
Out[10]: [0.98506593704223633, 0.89880609512329102, 0.89982390403747559]

In [11]: Timer("b=3Dnumpy.argsort(a)","import numpy;=20
a=3Dnumpy.arange(100000,-1,-1)").repeat(3,100)
Out[11]: [0.92959284782409668, 0.85385990142822266, 0.87773990631103516]

So, it seems that doing an in-place sort() immediately after an argsort()=20
operation is very efficient (cache effects here?), and would avoid the need=
=20
of the combo function (from the point of view of efficiency, I repeat).

Cheers,

=2D-=20
>0,0<   Francesc Altet =A0 =A0 http://www.carabos.com/
V   V   C=E1rabos Coop. V. =A0=A0Enjoy Data
 "-"

Re: [Numpy-discussion] max argmax combo

A package for scientific computing with Python

Re: [Numpy-discussion] max argmax combo