From: David H. <dav...@gm...> - 2006-07-03 17:00:38
Here is a quick benchmark comparing numpy's unique, unique1d and Sasha's unique:

    x = rand(100000)*100
    x = x.astype('i')

    %timeit unique(x)
    10 loops, best of 3: 525 ms per loop

    %timeit unique_sasha(x)
    100 loops, best of 3: 10.7 ms per loop

    %timeit unique1d(x)
    100 loops, best of 3: 12.6 ms per loop

So I wonder what the added value of unique is. Could unique1d simply become unique?

Cheers,
David

P.S. I modified Sasha's version to handle the case where all elements are identical, which previously returned an empty array:

    from numpy import sort, empty, nan

    def unique_sasha(x):
        s = sort(x)
        # shifted copy of s; float so the last slot can hold a NaN sentinel
        # (the float cast can merge distinct integers above 2**53)
        r = empty(s.shape, float)
        r[:-1] = s[1:]
        # NaN never compares equal, so the last element is always kept
        r[-1] = nan
        return s[r != s]

2006/7/3, Robert Cimrman <cim...@nt...>:
> Sasha wrote:
> > On 7/2/06, Norbert Nemec <Nor...@gm...> wrote:
> >> ...
> >> Does anybody know about the internals of the python "set"? How is
> >> .keys() implemented? I somehow have really doubts about the efficiency
> >> of this method.
> >>
> > Set implementation (Objects/setobject.c) is a copy and paste job from
> > dictobject with values removed. As a result it is heavily optimized
> > for the case of string valued keys - a case that is almost irrelevant
> > for numpy.
> >
> > I think something like the following (untested, 1d only) will probably
> > be much faster and sorted:
> >
> > def unique(x):
> >     s = sort(x)
> >     r = empty_like(s)
> >     r[:-1] = s[1:]
> >     r[-1] = s[0]
> >     return s[r != s]
>
> There are 1d array set operations like this already in numpy
> (numpy/lib/arraysetops.py - unique1d, ...)
>
> r.
>
> _______________________________________________
> Numpy-discussion mailing list
> Num...@li...
> https://lists.sourceforge.net/lists/listinfo/numpy-discussion
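For comparison, here is a minimal sketch of the same sort-and-compare-neighbours idea, assuming a current NumPy imported as np rather than the 2006 release discussed in this thread. Instead of a NaN sentinel it marks the first element of each run of equal values with a boolean mask, so it keeps the input dtype and also copes with empty and all-identical input. The function name unique_sorted is hypothetical, not a NumPy API.

```python
import numpy as np

def unique_sorted(x):
    """Return the sorted distinct values of x (flattened to 1-d).

    Hypothetical helper: sort, then keep every element that differs
    from its predecessor. Note that NaN != NaN, so each NaN in a float
    input is kept as a separate value.
    """
    s = np.sort(np.asarray(x).ravel())
    if s.size == 0:
        # nothing to deduplicate; return the (empty) sorted copy
        return s
    mask = np.empty(s.shape, dtype=bool)
    # the first element always starts a new run of equal values
    mask[0] = True
    # later elements start a new run only if they differ from the previous one
    mask[1:] = s[1:] != s[:-1]
    return s[mask]
```

As a quick check, the result can be compared against np.unique on the same kind of data used in the benchmark, e.g. `x = (np.random.rand(100000) * 100).astype('i')` followed by `np.array_equal(unique_sorted(x), np.unique(x))`, and both can be timed with IPython's `%timeit` as in the message.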