From: Perry G. <pe...@st...> - 2004-05-27 17:47:22
|
Francesc Alted va escriure:
> A Dimecres 26 Maig 2004 21:01, Perry Greenfield va escriure:
> > correct. You'd have to break apart the m1 tuple and
> > index all the components, e.g.,
> >
> >   m11, m12 = m1
> >   x[m11[m2], m12[m2]] = ...
> >
> > This gets clumsier with the more dimensions that must
> > be handled, but you still can do it. It would be most
> > useful if the indexed array is very large, the number
> > of items selected is relatively small, and one
> > doesn't want to incur the memory overhead of all the
> > mask arrays of the admittedly much nicer notational
> > approach that Francesc illustrated.
>
> Well, boolean arrays have the property that they use very little memory
> (only 1 byte / element), and normally perform quite well doing indexing.
> Some timings:
>
> >>> import timeit
> >>> t1 = timeit.Timer("m1=where(x>4);m2=where(x[m1]<7);m11,m12=m1;x[m11[m2],m12[m2]]",
> ...                   "from numarray import arange,where;dim=3;x=arange(dim*dim);x.shape=(dim,dim)")
> >>> t2 = timeit.Timer("x[(x>4) & (x<7)]",
> ...                   "from numarray import arange,where;dim=3;x=arange(dim*dim);x.shape=(dim,dim)")
> >>> t1.repeat(3,1000)
> [3.1320240497589111, 3.1235389709472656, 3.1198310852050781]
> >>> t2.repeat(3,1000)
> [1.1218469142913818, 1.117638111114502, 1.1156759262084961]
>
> i.e. using boolean arrays for indexing is roughly 3 times faster.
>
> For larger arrays this difference is even more noticeable:
>
> >>> t3 = timeit.Timer("m1=where(x>4);m2=where(x[m1]<7);m11,m12=m1;x[m11[m2],m12[m2]]",
> ...                   "from numarray import arange,where;dim=1000;x=arange(dim*dim);x.shape=(dim,dim)")
> >>> t4 = timeit.Timer("x[(x>4) & (x<7)]",
> ...                   "from numarray import arange,where;dim=1000;x=arange(dim*dim);x.shape=(dim,dim)")
> >>> t3.repeat(3,10)
> [3.1818649768829346, 3.20477294921875, 3.190640926361084]
> >>> t4.repeat(3,10)
> [0.42328095436096191, 0.42140507698059082, 0.41979002952575684]
>
> as you see, now the difference is almost an order of magnitude (!).
>
> So, perhaps assuming the small memory overhead, in most cases it is
> better to use boolean selections. However, it would be nice to know the
> ultimate reason why this happens, because the Perry approach seems
> intuitively faster.

Yes, I agree. It was good of you to post these timings. I don't think we
had actually compared the two approaches, though the results don't
surprise me (though I suspect the results may change if the first mask
has a very small percentage of elements; the large timing test has
nearly all elements selected for the first mask).

Perry |
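For readers following along today, the two indexing styles Perry and
Francesc are comparing map directly onto modern NumPy, the successor to
numarray. A minimal sketch (not the original numarray code, and using a
tiny array rather than the timing setup):

```python
import numpy as np

x = np.arange(9).reshape(3, 3)

# Index-array style: np.where returns a tuple of index arrays,
# one per dimension, which must be threaded through by hand.
m1 = np.where(x > 4)
m2 = np.where(x[m1] < 7)
m11, m12 = m1
x_idx = x.copy()
x_idx[m11[m2], m12[m2]] = 0

# Boolean-mask style: one combined mask, applied in a single step.
x_bool = x.copy()
x_bool[(x_bool > 4) & (x_bool < 7)] = 0

assert (x_idx == x_bool).all()  # both zero out elements 5 and 6
```

The boolean form is both the shorter and, per the timings above, usually
the faster of the two; the index-array form still pays off when the
selection is tiny relative to a huge array.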
From: Francesc A. <fa...@py...> - 2004-05-27 07:46:17
|
A Dimecres 26 Maig 2004 21:01, Perry Greenfield va escriure:
> correct. You'd have to break apart the m1 tuple and
> index all the components, e.g.,
>
>   m11, m12 = m1
>   x[m11[m2], m12[m2]] = ...
>
> This gets clumsier with the more dimensions that must
> be handled, but you still can do it. It would be most
> useful if the indexed array is very large, the number
> of items selected is relatively small, and one
> doesn't want to incur the memory overhead of all the
> mask arrays of the admittedly much nicer notational
> approach that Francesc illustrated.

Well, boolean arrays have the property that they use very little memory
(only 1 byte / element), and normally perform quite well doing indexing.
Some timings:

>>> import timeit
>>> t1 = timeit.Timer("m1=where(x>4);m2=where(x[m1]<7);m11,m12=m1;x[m11[m2],m12[m2]]",
...                   "from numarray import arange,where;dim=3;x=arange(dim*dim);x.shape=(dim,dim)")
>>> t2 = timeit.Timer("x[(x>4) & (x<7)]",
...                   "from numarray import arange,where;dim=3;x=arange(dim*dim);x.shape=(dim,dim)")
>>> t1.repeat(3,1000)
[3.1320240497589111, 3.1235389709472656, 3.1198310852050781]
>>> t2.repeat(3,1000)
[1.1218469142913818, 1.117638111114502, 1.1156759262084961]

i.e. using boolean arrays for indexing is roughly 3 times faster.

For larger arrays this difference is even more noticeable:

>>> t3 = timeit.Timer("m1=where(x>4);m2=where(x[m1]<7);m11,m12=m1;x[m11[m2],m12[m2]]",
...                   "from numarray import arange,where;dim=1000;x=arange(dim*dim);x.shape=(dim,dim)")
>>> t4 = timeit.Timer("x[(x>4) & (x<7)]",
...                   "from numarray import arange,where;dim=1000;x=arange(dim*dim);x.shape=(dim,dim)")
>>> t3.repeat(3,10)
[3.1818649768829346, 3.20477294921875, 3.190640926361084]
>>> t4.repeat(3,10)
[0.42328095436096191, 0.42140507698059082, 0.41979002952575684]

as you see, now the difference is almost an order of magnitude (!).

So, perhaps assuming the small memory overhead, in most cases it is
better to use boolean selections. However, it would be nice to know the
ultimate reason why this happens, because the Perry approach seems
intuitively faster.

--
Francesc Alted |
From: Perry G. <pe...@st...> - 2004-05-26 19:02:26
|
> > try
> >
> >   x[m1[0][m2]] = array([10,20])
> >
> > instead. The intent here is to provide x with the net index array
> > by indexing m1 first rather than indexing x first.
> > (note the odd use of m1[0]; this is necessary since where() will
> > return a tuple of index arrays (to allow use in multidimensional
> > cases as indices), so the m1[0] extracts the array from the tuple;
> > since m1 is a tuple, indexing it with another index array (well, a
> > tuple containing an index array) doesn't work).
>
> This works, but for the fact that in my real code I *am* dealing with
> multidimensional arrays. But this is a nice trick to remember.
>
> (So, the following "does not work":
>
>   x = arange(9)
>   x.shape = (3,3)
>   m1 = where(x > 4)
>   m2 = where(x[m1] < 7)
>   x[m1[0][m2]]
> )

correct. You'd have to break apart the m1 tuple and index all the
components, e.g.,

  m11, m12 = m1
  x[m11[m2], m12[m2]] = ...

This gets clumsier with the more dimensions that must be handled, but
you still can do it. It would be most useful if the indexed array is
very large, the number of items selected is relatively small, and one
doesn't want to incur the memory overhead of all the mask arrays of the
admittedly much nicer notational approach that Francesc illustrated.

Perry |
From: Alok S. <as...@vi...> - 2004-05-26 18:03:41
|
On 26/05/04: 10:43, Andrew Straw wrote:
> Todd Miller wrote:
> > On Wed, 2004-05-26 at 12:06, Francesc Alted wrote:
> > > >>> a = arange(10)
> > > >>> a[(a>5) & (a<8)] = array([10, 20])
>
> Is there an equivalently slick way to accomplish what I'm trying
> below? (the values in c[:,1] get changed based on the same-row
> values in c[:,0]?)
>
>   from numarray import *
>   a = arange(10)
>   b = arange(10) + 20
>   c = concatenate((a[:,NewAxis], b[:,NewAxis]), axis=1)
>   c[c[:,0]>7][:,1] = 0  # doesn't work because it makes a copy and
>                         # therefore doesn't modify c

Well, for your case, the following works:

>>> print c
[[ 0 20]
 [ 1 21]
 [ 2 22]
 [ 3 23]
 [ 4 24]
 [ 5 25]
 [ 6 26]
 [ 7 27]
 [ 8 28]
 [ 9 29]]
>>> t0 = c[:, 0]
>>> t1 = c[:, 1]
>>> t1[t0 > 7] = 0
>>> print c
[[ 0 20]
 [ 1 21]
 [ 2 22]
 [ 3 23]
 [ 4 24]
 [ 5 25]
 [ 6 26]
 [ 7 27]
 [ 8  0]
 [ 9  0]]

Not sure this helps in your real code though.

Alok

--
Alok Singhal (as...@vi...)
Graduate Student, dept. of Astronomy, University of Virginia
http://www.astro.virginia.edu/~as8ca/ |
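Alok's trick works because slicing out a column produces a view into
the original array, not a copy, so writing through the column view
updates the parent. The same holds in modern NumPy; a sketch (using
np.stack rather than numarray's concatenate/NewAxis idiom):

```python
import numpy as np

a = np.arange(10)
b = np.arange(10) + 20
c = np.stack([a, b], axis=1)       # 10x2 array: col 0 is a, col 1 is b

col0 = c[:, 0]                     # basic slices are views into c
col1 = c[:, 1]
col1[col0 > 7] = 0                 # writes through the view into c

assert (c[8:, 1] == 0).all()       # rows 8 and 9 of column 1 zeroed
assert (c[:8, 1] == b[:8]).all()   # earlier rows untouched
```

The failing version, `c[c[:,0] > 7][:,1] = 0`, differs only in that the
boolean index comes first: boolean (fancy) indexing always copies, so
the later column assignment lands in a throwaway temporary.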
From: Andrew S. <str...@as...> - 2004-05-26 17:43:20
|
Todd Miller wrote:
> On Wed, 2004-05-26 at 12:06, Francesc Alted wrote:
> > A Dimecres 26 Maig 2004 17:41, Todd Miller va escriure:
> > > Here's how I did it (there was an easier way I overlooked):
> > >
> > >   a = arange(10)
> > >   m1 = where(a > 5, 1, 0).astype('Bool')
> > >   m2 = where(a < 8, 1, 0).astype('Bool')
> > >   a[m1 & m2] = array([10, 20])
> >
> > Perhaps the easier way looks like this?
> >
> > >>> a = arange(10)
> > >>> a[(a>5) & (a<8)] = array([10, 20])

Is there an equivalently slick way to accomplish what I'm trying below?
(the values in c[:,1] get changed based on the same-row values in
c[:,0]?)

  from numarray import *
  a = arange(10)
  b = arange(10) + 20
  c = concatenate((a[:,NewAxis], b[:,NewAxis]), axis=1)
  c[c[:,0]>7][:,1] = 0  # doesn't work because it makes a copy and
                        # therefore doesn't modify c

Cheers!
Andrew |
From: Alok S. <as...@vi...> - 2004-05-26 17:18:47
|
On 26/05/04: 11:24, Perry Greenfield wrote:
> (due to confusions with "a" in text I'll use x in place of "a")
> I believe the problem you are seeing (I'm not 100% certain yet)
> is that although it is possible to assign to an array-indexed
> array, doing that twice over doesn't work, since Python is, in
> effect, treating x[m1] as an expression even though it is on the
> left side. That expression results in a new array that the second
> indexing updates, but which is then thrown away since it is not
> assigned to anything else.
>
> Your second try creates a temporary t which is also not a view into
> a, so when you update t, a is not updated.

Thanks for this info. It makes sense now. I suspected earlier that t
was not a view but a copy, but didn't realise that the same thing was
happening with x[m1][m2].

> try
>
>   x[m1[0][m2]] = array([10,20])
>
> instead. The intent here is to provide x with the net index array
> by indexing m1 first rather than indexing x first.
> (note the odd use of m1[0]; this is necessary since where() will
> return a tuple of index arrays (to allow use in multidimensional
> cases as indices), so the m1[0] extracts the array from the tuple;
> since m1 is a tuple, indexing it with another index array (well, a
> tuple containing an index array) doesn't work).

This works, but for the fact that in my real code I *am* dealing with
multidimensional arrays. But this is a nice trick to remember.

(So, the following "does not work":

  x = arange(9)
  x.shape = (3,3)
  m1 = where(x > 4)
  m2 = where(x[m1] < 7)
  x[m1[0][m2]]
)

On 26/05/04: 11:41, Todd Miller wrote:
> Here's how I did it (there was an easier way I overlooked):
>
>   a = arange(10)
>   m1 = where(a > 5, 1, 0).astype('Bool')
>   m2 = where(a < 8, 1, 0).astype('Bool')
>   a[m1 & m2] = array([10, 20])

Ah. This works! Even for multidimensional arrays.

On 26/05/04: 18:06, Francesc Alted wrote:
> Perhaps the easier way looks like this?
>
> >>> a = arange(10)
> >>> a[(a>5) & (a<8)] = array([10, 20])
> >>> a
> array([ 0,  1,  2,  3,  4,  5, 10, 20,  8,  9])
>
> Indexing is a very powerful (and fun) thing, indeed :)

I like this too. Thank you all for the help!

Alok

--
Alok Singhal (as...@vi...)
Graduate Student, dept. of Astronomy, University of Virginia
http://www.astro.virginia.edu/~as8ca/ |
From: Todd M. <jm...@st...> - 2004-05-26 16:28:21
|
On Wed, 2004-05-26 at 12:06, Francesc Alted wrote:
> A Dimecres 26 Maig 2004 17:41, Todd Miller va escriure:
> > Here's how I did it (there was an easier way I overlooked):
> >
> >   a = arange(10)
> >   m1 = where(a > 5, 1, 0).astype('Bool')
> >   m2 = where(a < 8, 1, 0).astype('Bool')
> >   a[m1 & m2] = array([10, 20])
>
> Perhaps the easier way looks like this?
>
> >>> a = arange(10)
> >>> a[(a>5) & (a<8)] = array([10, 20])

Much, much better. Thanks!

Todd

> >>> a
> array([ 0,  1,  2,  3,  4,  5, 10, 20,  8,  9])
>
> Indexing is a very powerful (and fun) thing, indeed :)

--
Todd Miller <jm...@st...> |
From: Francesc A. <fa...@py...> - 2004-05-26 16:06:51
|
A Dimecres 26 Maig 2004 17:41, Todd Miller va escriure:
> Here's how I did it (there was an easier way I overlooked):
>
>   a = arange(10)
>   m1 = where(a > 5, 1, 0).astype('Bool')
>   m2 = where(a < 8, 1, 0).astype('Bool')
>   a[m1 & m2] = array([10, 20])

Perhaps the easier way looks like this?

>>> a = arange(10)
>>> a[(a>5) & (a<8)] = array([10, 20])
>>> a
array([ 0,  1,  2,  3,  4,  5, 10, 20,  8,  9])

Indexing is a very powerful (and fun) thing, indeed :)

--
Francesc Alted |
From: Todd M. <jm...@st...> - 2004-05-26 15:41:58
|
On Wed, 2004-05-26 at 10:48, Alok Singhal wrote:
> Hi,
>
> I am having trouble understanding how exactly "where" works in
> numarray.
>
> What I am trying to do:
>
> I am preparing a two-level mask in an array and then assigning values
> to the array where both masks are true:
>
> >>> from numarray import *
> >>> a = arange(10)
> >>> # First mask
> >>> m1 = where(a > 5)
> >>> a[m1]
> array([6, 7, 8, 9])
> >>> # Second mask
> >>> m2 = where(a[m1] < 8)
> >>> a[m1][m2]

a[m1] is a new array here.

> array([6, 7])
> >>> # So far so good
> >>> # Now change some values
> >>> a[m1][m2] = array([10, 20])

And here too. This does a write into what is effectively a temporary
variable returned by the expression a[m1]. Although the write occurs,
it is lost.

> >>> a[m1][m2]
> array([6, 7])
> >>> a
> array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

Here's how I did it (there was an easier way I overlooked):

  a = arange(10)
  m1 = where(a > 5, 1, 0).astype('Bool')
  m2 = where(a < 8, 1, 0).astype('Bool')
  a[m1 & m2] = array([10, 20])

The principle here is to keep the masks as "full sized" boolean arrays
rather than index arrays, so they can be combined using the bitwise and
operator. The resulting mask can be used to index just once, eliminating
the temporary.

Regards,
Todd |
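Todd's principle — keep both masks full-sized and boolean so they can be
combined with `&` and applied in a single indexing step — carries over
unchanged to modern NumPy, where comparisons produce boolean arrays
directly and no `astype('Bool')` dance is needed. A sketch:

```python
import numpy as np

a = np.arange(10)

# Full-sized boolean masks, combined with &, then one assignment.
m1 = a > 5
m2 = a < 8
a[m1 & m2] = np.array([10, 20])    # elements 6 and 7 replaced

assert a.tolist() == [0, 1, 2, 3, 4, 5, 10, 20, 8, 9]
```

Because there is exactly one indexing operation on `a`, the assignment
targets `a` itself rather than a temporary, which is the whole fix.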
From: Perry G. <pe...@st...> - 2004-05-26 15:24:28
|
Alok Singhal wrote:
> Hi,
>
> I am having trouble understanding how exactly "where" works in
> numarray.
>
> What I am trying to do:
>
> I am preparing a two-level mask in an array and then assigning values
> to the array where both masks are true:
>
> >>> from numarray import *
> >>> a = arange(10)
> >>> # First mask
> >>> m1 = where(a > 5)
> >>> a[m1]
> array([6, 7, 8, 9])
> >>> # Second mask
> >>> m2 = where(a[m1] < 8)
> >>> a[m1][m2]
> array([6, 7])
> >>> # So far so good
> >>> # Now change some values
> >>> a[m1][m2] = array([10, 20])
> >>> a[m1][m2]
> array([6, 7])
> >>> a
> array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
> >>> # Didn't work
> >>> # Let's try a temporary variable
> >>> t = a[m1]
> >>> t[m2]
> array([6, 7])
> >>> t[m2] = array([10, 20])
> >>> t[m2], t
> (array([10, 20]), array([10, 20, 8, 9]))
> >>> a
> array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
>
> So, my assignment to a[m1][m2] seems to work (no messages), but it
> doesn't produce the effect I want it to.
>
> I have read the documentation but I couldn't find something that would
> explain this behavior.
>
> So my questions:
>
> - did I miss something important in the documentation,
> - am I expecting something I shouldn't, or
> - is there a bug in numarray?

(Due to confusions with "a" in the text, I'll use x in place of "a".)

I believe the problem you are seeing (I'm not 100% certain yet) is that
although it is possible to assign to an array-indexed array, doing that
twice over doesn't work, since Python is, in effect, treating x[m1] as
an expression even though it is on the left side. That expression
results in a new array that the second indexing updates, but which is
then thrown away since it is not assigned to anything else.

Your second try creates a temporary t which is also not a view into a,
so when you update t, a is not updated.

Try

  x[m1[0][m2]] = array([10, 20])

instead. The intent here is to provide x with the net index array by
indexing m1 first rather than indexing x first.

(Note the odd use of m1[0]; this is necessary since where() will return
a tuple of index arrays (to allow use in multidimensional cases as
indices), so the m1[0] extracts the array from the tuple. Since m1 is a
tuple, indexing it with another index array (well, a tuple containing an
index array) doesn't work.)

Perry Greenfield |
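Perry's diagnosis — that `x[m1][m2] = ...` writes into a throwaway
temporary produced by the first fancy-indexing operation — is easy to
demonstrate, and the same semantics apply in modern NumPy. A sketch:

```python
import numpy as np

a = np.arange(10)
m1 = np.where(a > 5)         # index-array ("fancy") indexing follows...
m2 = np.where(a[m1] < 8)

# a[m1] is a new array, so this write lands in a temporary and is lost:
a[m1][m2] = np.array([10, 20])
assert a.tolist() == list(range(10))           # a is unchanged

# Indexing the index arrays instead targets a directly:
a[m1[0][m2]] = np.array([10, 20])
assert a.tolist() == [0, 1, 2, 3, 4, 5, 10, 20, 8, 9]
```

The distinction is between basic slicing (which yields views that write
through) and advanced indexing with index or boolean arrays (which
always yields copies); chaining two advanced-index operations on the
left-hand side therefore silently discards the write.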
From: Alok S. <as...@vi...> - 2004-05-26 14:48:44
|
Hi,

I am having trouble understanding how exactly "where" works in numarray.

What I am trying to do:

I am preparing a two-level mask in an array and then assigning values to
the array where both masks are true:

>>> from numarray import *
>>> a = arange(10)
>>> # First mask
>>> m1 = where(a > 5)
>>> a[m1]
array([6, 7, 8, 9])
>>> # Second mask
>>> m2 = where(a[m1] < 8)
>>> a[m1][m2]
array([6, 7])
>>> # So far so good
>>> # Now change some values
>>> a[m1][m2] = array([10, 20])
>>> a[m1][m2]
array([6, 7])
>>> a
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
>>> # Didn't work
>>> # Let's try a temporary variable
>>> t = a[m1]
>>> t[m2]
array([6, 7])
>>> t[m2] = array([10, 20])
>>> t[m2], t
(array([10, 20]), array([10, 20, 8, 9]))
>>> a
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

So, my assignment to a[m1][m2] seems to work (no messages), but it
doesn't produce the effect I want it to.

I have read the documentation but I couldn't find something that would
explain this behavior.

So my questions:

- did I miss something important in the documentation,
- am I expecting something I shouldn't, or
- is there a bug in numarray?

Thanks,
Alok

--
Alok Singhal (as...@vi...)
Graduate Student, dept. of Astronomy, University of Virginia
http://www.astro.virginia.edu/~as8ca/ |
From: Perry G. <pe...@st...> - 2004-05-25 18:51:13
|
John Hunter writes:
> >>>>> "Todd" == Todd Miller <jm...@st...> writes:
>
> >> I have a series of luminance images that I want to do some
> >> correlation analyses on. Each image is an MxN frame of a
> >> movie. I want to compute the correlation between a given pixel
> >> i,j, and every other pixel in the image over each frame. That
> >> is, if I assume xij is a numFrames length time series, and xkl
> >> is another numFrames length time series (the pixel intensities
> >> at two points in the image over time), I want to compute the
> >> corrcoeff(xij, xkl) for every kl with ij fixed.
>
> >> I know I could do this by looping over the pixels of the image,
> >> but I'm hoping for something a bit faster.
>
> Todd> For numarray try numarray.convolve.correlate2d and set
> Todd> fft=1.

Something that Todd and I have talked about is making general functions
(like fft, convolve, correlate) broadcastable to other dimensions. Not
much has been done on that, but I think a general mechanism could be
developed to do just that. In that way one could do a 1-d correlation
over a 2- or 3-d array without looping over the extra dimensions (all
that would be done implicitly in C), with the option of specifying which
dimension the function should be applied to while looping over all the
other dimensions.

In the meantime, it would seem that one possible way of dealing with
this is to simply recast your array as a 1-d array to use the 1-d
correlation. This means some memory copying and explicit padding (how do
you want the correlation to handle points beyond the ends?). Supposing
you want it to wrap around (in effect a circular correlation), you could
do something like this (untested and probably has some mistakes :-) if
imagecube is a T x M x N array where there are T time samples:

  # pad the time series on both ends with identical copies
  data = concatenate((imagecube, imagecube, imagecube))
  reference = imagecube[:, i, j]
  flatdata = ravel(transpose(data))
  flatresult = correlate(flatdata, reference)
  result = transpose(reshape(flatresult, (N, M, 3*T)))[T:2*T, :, :]

This is admittedly a bit clumsy, but should be much, much faster than
iterating over all pixels. It would be much nicer just to have

  result = correlate(imagecube, imagecube[:, i, j], axis=0)

which the broadcasting facility would permit.

Perry |
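The quantity John actually asked for — the correlation *coefficient* of
one pixel's time series against every other pixel's — can also be
computed with a couple of broadcasted reductions, with no per-pixel
loop and no padding. A sketch in modern NumPy (the array names and the
random test data are illustrative, not from the original thread; `T`,
`M`, `N` as in Perry's post):

```python
import numpy as np

T, M, N = 50, 8, 8
rng = np.random.default_rng(0)
imagecube = rng.normal(size=(T, M, N))   # T frames of an MxN movie
i, j = 3, 4                              # reference pixel

x = imagecube - imagecube.mean(axis=0)   # remove each pixel's temporal mean
ref = x[:, i, j]

# Pearson correlation of the reference series with every pixel series:
num = (x * ref[:, None, None]).sum(axis=0)
den = np.sqrt((x ** 2).sum(axis=0) * (ref ** 2).sum())
corr = num / den                         # MxN map of correlation coefficients

assert abs(corr[i, j] - 1.0) < 1e-9      # self-correlation is 1
assert (np.abs(corr) <= 1.0 + 1e-9).all()
```

This trades the O(T) correlation lags Perry's circular approach produces
for the single zero-lag coefficient per pixel, which is what
corrcoeff(xij, xkl) asks for.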
From: John H. <jdh...@ac...> - 2004-05-25 17:23:03
|
>>>>> "Todd" == Todd Miller <jm...@st...> writes:

>> I have a series of luminance images that I want to do some
>> correlation analyses on. Each image is an MxN frame of a movie. I
>> want to compute the correlation between a given pixel i,j, and every
>> other pixel in the image over each frame. That is, if I assume xij
>> is a numFrames length time series, and xkl is another numFrames
>> length time series (the pixel intensities at two points in the image
>> over time), I want to compute the corrcoeff(xij, xkl) for every kl
>> with ij fixed.

>> I know I could do this by looping over the pixels of the image, but
>> I'm hoping for something a bit faster.

Todd> For numarray try numarray.convolve.correlate2d and set
Todd> fft=1.

I've looked at this and don't know if I'm missing something or if this
isn't the right function for me. I want to correlate a given pixel
intensity with the intensity at all other pixels over a series of
images. Is this possible with correlate2d?

JDH |
From: Mike Z. <zi...@uc...> - 2004-05-23 21:05:06
|
Hi,

I am in the process of converting some code from Numeric to numarray,
and it seems that numarray no longer has the sign() function -- is that
so?

ex:
  sign(-30.0) = -1
  sign(0)     =  0
  sign(1000)  =  1

Is there a replacement?

Thanks,
Mike |
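A drop-in replacement can be built from nested where() calls, which both
libraries support; a minimal sketch (shown with modern NumPy, which also
provides `np.sign` directly as a ufunc):

```python
import numpy as np

def sign(x):
    # where() picks -1 where x < 0, +1 where x > 0, and 0 elsewhere;
    # works elementwise on scalars and arrays alike.
    x = np.asarray(x)
    return np.where(x < 0, -1, np.where(x > 0, 1, 0))

assert sign(-30.0) == -1
assert sign(0) == 0
assert sign(1000) == 1
assert sign(np.array([-2, 0, 5])).tolist() == [-1, 0, 1]
```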
From: Todd M. <jm...@st...> - 2004-05-22 11:01:11
|
On Fri, 2004-05-21 at 23:05, Shin, Daehyok wrote:
> I am wondering if there is any way to set an array as read-only, so
> that any attempt to modify values of elements in the array raises
> some warning. Is it a cool feature to prevent unexpected side
> effects?
>
> Daehyok Shin (Peter)

In numarray you can do this:

>>> from numarray import *
>>> def make_readonly(a):
...     v = a.view()
...     s = a.tostring()
...     v._data = s
...     return v
...
>>> a = arange(100)
>>> b = make_readonly(a)
>>> b[0] = 1
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
ValueError: NA_setFromPythonScalar: assigment to readonly array buffer

This works because of the buffer protocol and the fact that a string is
a read-only buffer. Using the buffer protocol is a numarray feature, so
I'm pretty sure Numeric can't do this. Also note that in C, this
read-only array is actually writable.

Regards,
Todd |
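In modern NumPy the same effect is a supported flag rather than a
buffer-protocol trick; a sketch:

```python
import numpy as np

a = np.arange(100)
b = a.view()
b.flags.writeable = False    # b now rejects writes; a stays writable

try:
    b[0] = 1
except ValueError as e:
    err = str(e)
else:
    err = None

assert err is not None and "read-only" in err

a[0] = 42                    # the original array is still writable
assert b[0] == 42            # ...and b, as a view, still sees a's data
```

Note that, as with Todd's version, read-only here is a Python-level
guard: anything holding a raw pointer to the data can still write.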
From: Shin, D. <sd...@em...> - 2004-05-22 03:06:10
|
I am wondering if there is any way to set an array as read-only, so that
any attempt to modify values of elements in the array raises some
warning. Is it a cool feature to prevent unexpected side effects?

Daehyok Shin (Peter) |
From: Todd M. <jm...@st...> - 2004-05-21 19:49:14
|
On Fri, 2004-05-21 at 15:01, John Hunter wrote:
> I have a series of luminance images that I want to do some correlation
> analyses on. Each image is an MxN frame of a movie. I want to compute
> the correlation between a given pixel i,j, and every other pixel in
> the image over each frame. That is, if I assume xij is a numFrames
> length time series, and xkl is another numFrames length time series
> (the pixel intensities at two points in the image over time), I want
> to compute the
>
>   corrcoeff(xij, xkl) for every kl with ij fixed.
>
> I know I could do this by looping over the pixels of the image, but
> I'm hoping for something a bit faster.
>
> Any suggestions?

For numarray try numarray.convolve.correlate2d and set fft=1.

Todd |
From: John H. <jdh...@ac...> - 2004-05-21 19:23:35
|
I have a series of luminance images that I want to do some correlation
analyses on. Each image is an MxN frame of a movie. I want to compute
the correlation between a given pixel i,j, and every other pixel in the
image over each frame. That is, if I assume xij is a numFrames length
time series, and xkl is another numFrames length time series (the pixel
intensities at two points in the image over time), I want to compute the

  corrcoeff(xij, xkl) for every kl with ij fixed.

I know I could do this by looping over the pixels of the image, but I'm
hoping for something a bit faster.

Any suggestions?

John Hunter |
From: Francesc A. <fa...@py...> - 2004-05-20 17:41:22
|
A Dijous 20 Maig 2004 19:13, vareu escriure:
> The first case agrees with your results of ~10x difference in favor of
> Numeric at 10**6 elements. The last case shows a ~3x numarray
> advantage given large numbers of values.
>
> My analysis is this: since searchsorted runs in O(log2(N)) time, even
> with 10**6 elements there are only 20 iterations or so. This is just
> not enough to overcome numarray's Python ufunc overhead. I'll see what
> I can do, but I think we're up against the standard numarray
> performance wall for small arrays.

I see. What about creating a special case for when the item to be
searched is a scalar (and not an array)? In such a case, it would be
theoretically possible to get at least the bisect.bisect() performance.

Besides, if this special case were implemented, users would be able to
call the scalar version of searchsorted() repeatedly in order to deal
with very small "values" arrays (len() < 5) and still have a gain.

Just a thought,

--
Francesc Alted |
From: Todd M. <jm...@st...> - 2004-05-20 17:13:44
|
On Thu, 2004-05-20 at 05:26, Robert Kern wrote:
> Francesc Alted wrote:
> > Hi,
> >
> > I want to use the searchsorted function in numarray a lot, but I'm a
> > bit surprised about its poor performance. Look at that:
> >
> > >>> from time import time
> > >>> import numarray
> > >>> import Numeric
> > >>> na = numarray.arange(1000*1000)
> > >>> nu = Numeric.arange(1000*1000)
> > >>> t1 = time(); numarray.searchsorted(na, 200*1000); time()-t1
> > 200000
> > 0.00055098533630371094
> > >>> t1 = time(); Numeric.searchsorted(nu, 200*1000); time()-t1
> > 200000
> > 7.7962875366210938e-05
> >
> > It may seem that Numeric is better optimised, but the standard
> > python module bisect is even faster than numarray.searchsorted:
> >
> > >>> import bisect
> > >>> t1 = time(); bisect.bisect_left(na, 200*1000); time()-t1
> > 200000
> > 8.8930130004882812e-05
> > >>> t1 = time(); bisect.bisect_left(nu, 200*1000); time()-t1
> > 200000
> > 8.6069107055664062e-05
>
> A better timing (IMHO), but with similar conclusions:
>
> Python 2.3 (#1, Sep 13 2003, 00:49:11)
> [GCC 3.3 20030304 (Apple Computer, Inc. build 1495)] on darwin
> Type "help", "copyright", "credits" or "license" for more information.
> >>> import timeit
> >>> t1 = timeit.Timer("Numeric.searchsorted(a,200000)",
> ...                   "import Numeric;a=Numeric.arange(1000000)")
> >>> t2 = timeit.Timer("numarray.searchsorted(a,200000)",
> ...                   "import numarray;a=numarray.arange(1000000)")
> >>> t3 = timeit.Timer("bisect.bisect_left(a,200000)",
> ...                   "import Numeric;import bisect;a=Numeric.arange(1000000)")
> >>> t4 = timeit.Timer("bisect.bisect_left(a,200000)",
> ...                   "import numarray;import bisect;a=numarray.arange(1000000)")
> >>> t1.repeat(3,10000)
> [0.15758609771728516, 0.17469501495361328, 0.15456986427307129]
> >>> t2.repeat(3,10000)
> [6.7581729888916016, 6.9644770622253418, 6.6776731014251709]
> >>> t3.repeat(3,10000)
> [0.41335701942443848, 0.45698308944702148, 0.39665889739990234]
> >>> t4.repeat(3,10000)
> [0.49930000305175781, 0.48063492774963379, 0.52067780494689941]
>
> [Apologies for the linewraps.]
>
> I also get similar results with double arrays. Weird.

Here's what I got with the numarray benchmarks after adding an extra
case (where p = 10**i and a = arange(p)):

  benchmark            i   numarray (usec)   Numeric (usec)   numarray:Numeric
  searchsorted(a,p/5)  0       446                12                36.7
                       1       438                11                36.8
                       2       450                12                35.0
                       3       459                14                32.6
                       4       511                25                19.7
                       5       636                56                11.2
                       6       653                64                10.1
  searchsorted(a,a)    0       285                 5                48.0
                       1       283                 7                39.6
                       2       291                29                 9.8
                       3       368               308                 1.2
                       4      1771              4120                 0.4
                       5     17335             52127                 0.3
                       6    201277            605787                 0.3

The first case agrees with your results of ~10x difference in favor of
Numeric at 10**6 elements. The last case shows a ~3x numarray advantage
given large numbers of values.

My analysis is this: since searchsorted runs in O(log2(N)) time, even
with 10**6 elements there are only 20 iterations or so. This is just not
enough to overcome numarray's Python ufunc overhead. I'll see what I can
do, but I think we're up against the standard numarray performance wall
for small arrays.

Regards,
Todd |
From: Robert K. <rk...@uc...> - 2004-05-20 09:26:58
|
Francesc Alted wrote:
> Hi,
>
> I want to use the searchsorted function in numarray a lot, but I'm a
> bit surprised about its poor performance. Look at that:
>
> >>> from time import time
> >>> import numarray
> >>> import Numeric
> >>> na = numarray.arange(1000*1000)
> >>> nu = Numeric.arange(1000*1000)
> >>> t1 = time(); numarray.searchsorted(na, 200*1000); time()-t1
> 200000
> 0.00055098533630371094
> >>> t1 = time(); Numeric.searchsorted(nu, 200*1000); time()-t1
> 200000
> 7.7962875366210938e-05
>
> It may seem that Numeric is better optimised, but the standard python
> module bisect is even faster than numarray.searchsorted:
>
> >>> import bisect
> >>> t1 = time(); bisect.bisect_left(na, 200*1000); time()-t1
> 200000
> 8.8930130004882812e-05
> >>> t1 = time(); bisect.bisect_left(nu, 200*1000); time()-t1
> 200000
> 8.6069107055664062e-05

A better timing (IMHO), but with similar conclusions:

Python 2.3 (#1, Sep 13 2003, 00:49:11)
[GCC 3.3 20030304 (Apple Computer, Inc. build 1495)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import timeit
>>> t1 = timeit.Timer("Numeric.searchsorted(a,200000)",
...                   "import Numeric;a=Numeric.arange(1000000)")
>>> t2 = timeit.Timer("numarray.searchsorted(a,200000)",
...                   "import numarray;a=numarray.arange(1000000)")
>>> t3 = timeit.Timer("bisect.bisect_left(a,200000)",
...                   "import Numeric;import bisect;a=Numeric.arange(1000000)")
>>> t4 = timeit.Timer("bisect.bisect_left(a,200000)",
...                   "import numarray;import bisect;a=numarray.arange(1000000)")
>>> t1.repeat(3,10000)
[0.15758609771728516, 0.17469501495361328, 0.15456986427307129]
>>> t2.repeat(3,10000)
[6.7581729888916016, 6.9644770622253418, 6.6776731014251709]
>>> t3.repeat(3,10000)
[0.41335701942443848, 0.45698308944702148, 0.39665889739990234]
>>> t4.repeat(3,10000)
[0.49930000305175781, 0.48063492774963379, 0.52067780494689941]

[Apologies for the linewraps.]

I also get similar results with double arrays. Weird.

Python 2.3 on Mac OS X 10.3.sumthin', latest CVS checkout of numarray,
Numeric 23.1.

--
Robert Kern
rk...@uc...

"In the fields of hell where the grass grows high
 Are the graves of dreams allowed to die."
  -- Richard Harter |
From: Francesc A. <fa...@py...> - 2004-05-20 08:30:52
|
Hi,

I want to use the searchsorted function in numarray a lot, but I'm a bit
surprised about its poor performance. Look at that:

>>> from time import time
>>> import numarray
>>> import Numeric
>>> na = numarray.arange(1000*1000)
>>> nu = Numeric.arange(1000*1000)
>>> t1 = time(); numarray.searchsorted(na, 200*1000); time()-t1
200000
0.00055098533630371094
>>> t1 = time(); Numeric.searchsorted(nu, 200*1000); time()-t1
200000
7.7962875366210938e-05

It may seem that Numeric is better optimised, but the standard python
module bisect is even faster than numarray.searchsorted:

>>> import bisect
>>> t1 = time(); bisect.bisect_left(na, 200*1000); time()-t1
200000
8.8930130004882812e-05
>>> t1 = time(); bisect.bisect_left(nu, 200*1000); time()-t1
200000
8.6069107055664062e-05

So, bisect performance is similar to that of Numeric searchsorted, and
both are almost an order of magnitude better than numarray searchsorted.
This is a little bit surprising, as bisect_left is written in plain
python. From the python 2.3.3 sources:

def bisect_left(a, x, lo=0, hi=None):
    if hi is None:
        hi = len(a)
    while lo < hi:
        mid = (lo+hi)//2
        if a[mid] < x:
            lo = mid+1
        else:
            hi = mid
    return lo

I'm using python 2.3.3, numarray 0.9.1 (latest CVS) and Debian Linux
(sid).

Cheers,

--
Francesc Alted |
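Francesc's comparison is easy to reproduce with modern tools (NumPy in
place of numarray/Numeric). Both routines perform the same O(log2 N)
bisection — about 20 comparisons for a million elements — so any speed
difference is per-call overhead, not algorithm. A sketch:

```python
import bisect
import timeit
import numpy as np

a = np.arange(1_000_000)

# Same answer from both; bisect happily accepts a NumPy array since it
# only needs len() and item comparison.
assert np.searchsorted(a, 200_000) == 200_000
assert bisect.bisect_left(a, 200_000) == 200_000

t_np = timeit.timeit(lambda: np.searchsorted(a, 200_000), number=10_000)
t_bi = timeit.timeit(lambda: bisect.bisect_left(a, 200_000), number=10_000)
print(f"searchsorted: {t_np:.4f}s  bisect_left: {t_bi:.4f}s per 10k calls")
```

Which one wins on a scalar query depends on the implementation's fixed
per-call cost, which is exactly the point of the thread.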
From: Sebastian H. <ha...@ms...> - 2004-05-19 19:17:07
|
On Wednesday 19 May 2004 12:03 pm, you wrote:
> On Wed, 2004-05-19 at 14:25, Sebastian Haase wrote:
> > Hi,
> >
> > the random_array poisson function returns negative values if mean=0:
> >
> > >>> from numarray import random_array as ra
> > >>> ra.seed(x=1, y=1)
> > >>> ra.poisson(0)
> > 5
> > >>> ra.poisson(0)
> > -2
> >
> > My "math book" tells me that it should always be zero. This seems
> > to be a constructed case, but I'm using this to put "quantum
> > statistics" into a simulated image:
> >
> >   obj = na.array( something )
> >   imageFromDetector = ra.poisson( obj ) + gaussianNoiseArray
> >
> > The object array might have lots of zeros surrounding the "actual
> > object". Thinking of a fluorescent object sending out photons, it
> > makes sense to not get any photons at all from 'empty' regions.
> >
> > I'm using numarray 0.8.
>
> I tried this on Fedora-1 i386 with Python-2.3.3 and it returned zero
> consistently. What platform are you on?
>
> Regards,
> Todd

I'm running Debian (Woody):

$ uname -a
Linux baboon 2.4.18 #1 Tue Dec 16 14:11:01 PST 2003 i686 unknown
$ python -v
<snip>
Python 2.2.1 (#1, Feb 28 2004, 00:52:10)
[GCC 2.95.4 20011002 (Debian prerelease)] on linux2

and I get this:

>>> ra.poisson(0, 100000).min()
-4
>>> ra.poisson(0, 100000).min()
-4
>>> ra.poisson(0, 100000).mean()
1.9383
>>> ra.poisson(0, 100000).mean()
1.9607
>>> ra.poisson(0, 100000).max()
29
>>> ra.poisson(0, 100000).max()
28

Thanks for checking,
Sebastian |
From: Todd M. <jm...@st...> - 2004-05-19 19:04:25
|
On Wed, 2004-05-19 at 14:25, Sebastian Haase wrote:
> Hi,
>
> the random_array poisson function returns negative values if mean=0:
>
> >>> from numarray import random_array as ra
> >>> ra.seed(x=1, y=1)
> >>> ra.poisson(0)
> 5
> >>> ra.poisson(0)
> -2
>
> My "math book" tells me that it should always be zero. This seems to
> be a constructed case, but I'm using this to put "quantum statistics"
> into a simulated image:
>
>   obj = na.array( something )
>   imageFromDetector = ra.poisson( obj ) + gaussianNoiseArray
>
> The object array might have lots of zeros surrounding the "actual
> object". Thinking of a fluorescent object sending out photons, it
> makes sense to not get any photons at all from 'empty' regions.
>
> I'm using numarray 0.8.

I tried this on Fedora-1 i386 with Python-2.3.3 and it returned zero
consistently. What platform are you on?

Regards,
Todd |
From: Sebastian H. <ha...@ms...> - 2004-05-19 18:25:32
|
Hi,

the random_array poisson function returns negative values if mean=0:

>>> from numarray import random_array as ra
>>> ra.seed(x=1, y=1)
>>> ra.poisson(0)
5
>>> ra.poisson(0)
-2

My "math book" tells me that it should always be zero. This seems to be
a constructed case, but I'm using this to put "quantum statistics" into
a simulated image:

  obj = na.array( something )
  imageFromDetector = ra.poisson( obj ) + gaussianNoiseArray

The object array might have lots of zeros surrounding the "actual
object". Thinking of a fluorescent object sending out photons, it makes
sense to not get any photons at all from 'empty' regions.

I'm using numarray 0.8.

Thanks for numarray,
Sebastian Haase |
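For reference, a Poisson distribution with mean 0 is degenerate at 0,
so every draw should be exactly zero, and modern NumPy behaves this way.
A sketch of Sebastian's detector-simulation use case, checking the
property he expected (the toy "object" array is illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
obj = np.zeros((4, 4))
obj[1:3, 1:3] = 50.0         # a bright "object" on an empty field

image = rng.poisson(obj)     # per-pixel Poisson draw with lam = obj

# Empty (lam == 0) regions must yield exactly 0 photons, never negatives.
assert (image[obj == 0] == 0).all()
assert image.min() >= 0
```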