From: Perry G. <pe...@st...> - 2004-05-27 17:47:22

Francesc Alted wrote:
> On Wednesday 26 May 2004 21:01, Perry Greenfield wrote:
> > correct. You'd have to break apart the m1 tuple and
> > index all the components, e.g.,
> >
> > m11, m12 = m1
> > x[m11[m2],m12[m2]] = ...
> >
> > This gets clumsier the more dimensions that must
> > be handled, but you still can do it. It would be most
> > useful if the indexed array is very large, the number
> > of items selected is relatively small and one
> > doesn't want to incur the memory overhead of all the
> > mask arrays of the admittedly much nicer notational
> > approach that Francesc illustrated.
>
> Well, boolean arrays have the property that they use very little memory
> (only 1 byte / element), and normally perform quite well doing indexing.
> Some timings:
>
> >>> import timeit
> >>> t1 = timeit.Timer(
> ...     "m1=where(x>4);m2=where(x[m1]<7);m11,m12=m1;x[m11[m2],m12[m2]]",
> ...     "from numarray import arange,where;dim=3;x=arange(dim*dim);x.shape=(dim,dim)")
> >>> t2 = timeit.Timer(
> ...     "x[(x>4) & (x<7)]",
> ...     "from numarray import arange,where;dim=3;x=arange(dim*dim);x.shape=(dim,dim)")
> >>> t1.repeat(3,1000)
> [3.1320240497589111, 3.1235389709472656, 3.1198310852050781]
> >>> t2.repeat(3,1000)
> [1.1218469142913818, 1.117638111114502, 1.1156759262084961]
>
> i.e. using boolean arrays for indexing is roughly 3 times faster.
>
> For larger arrays this difference is even more noticeable:
>
> >>> t3 = timeit.Timer(
> ...     "m1=where(x>4);m2=where(x[m1]<7);m11,m12=m1;x[m11[m2],m12[m2]]",
> ...     "from numarray import arange,where;dim=1000;x=arange(dim*dim);x.shape=(dim,dim)")
> >>> t4 = timeit.Timer(
> ...     "x[(x>4) & (x<7)]",
> ...     "from numarray import arange,where;dim=1000;x=arange(dim*dim);x.shape=(dim,dim)")
> >>> t3.repeat(3,10)
> [3.1818649768829346, 3.20477294921875, 3.190640926361084]
> >>> t4.repeat(3,10)
> [0.42328095436096191, 0.42140507698059082, 0.41979002952575684]
>
> as you see, now the difference is almost an order of magnitude (!).
>
> So, perhaps assuming the small memory overhead, in most cases it is
> better to use boolean selections. However, it would be nice to know the
> underlying reason why this happens, because Perry's approach seems
> intuitively faster.

Yes, I agree. It was good of you to post these timings. I don't think we
had actually compared the two approaches, though the results don't
surprise me. I do suspect the results may change if the first mask
selects only a very small percentage of elements; in the large timing
test the first mask selects nearly all elements.

Perry
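
For anyone who wants to try the sparse-first-mask case mentioned above, here is a minimal sketch of the two selections side by side. It assumes NumPy as a stand-in for numarray (the timings in this thread were taken with numarray, so absolute numbers will not match), and the helper names select_with_where, select_with_mask and time_both are made up for illustration. No particular outcome of the timing is claimed; the script only prints what it measures.

```python
import timeit

import numpy as np  # assumption: NumPy used in place of numarray


def select_with_where(x, lo, hi):
    """Index-array approach: nested where() calls on a 2-D array."""
    m1 = np.where(x > lo)        # tuple of index arrays, one per dimension
    m2 = np.where(x[m1] < hi)    # positions within the first selection
    m11, m12 = m1                # break the tuple apart, as in the thread
    return x[m11[m2], m12[m2]]


def select_with_mask(x, lo, hi):
    """Boolean-mask approach: combine two full-size masks."""
    return x[(x > lo) & (x < hi)]


def time_both(dim, lo, hi, number=5):
    """Return (where_seconds, mask_seconds) for one dim x dim array."""
    x = np.arange(dim * dim).reshape(dim, dim)
    t_where = timeit.timeit(lambda: select_with_where(x, lo, hi), number=number)
    t_mask = timeit.timeit(lambda: select_with_mask(x, lo, hi), number=number)
    return t_where, t_mask


# Small case from the thread: both approaches select the same elements.
x = np.arange(9).reshape(3, 3)
a = select_with_where(x, 4, 7)
b = select_with_mask(x, 4, 7)

# Dense first mask (nearly all elements pass x > 4), as in the posted test.
dense = time_both(300, 4, 7)
# Sparse first mask (only the last few elements pass), the untested case.
sparse = time_both(300, 300 * 300 - 10, 300 * 300 - 3)

print("dense  first mask: where=%.6fs mask=%.6fs" % dense)
print("sparse first mask: where=%.6fs mask=%.6fs" % sparse)
```

The lower threshold is the only thing that changes between the two runs, so any difference in the ratio comes from how many elements the first where() has to carry forward.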