Re: [Numpy-discussion] unique() should return a sorted array

Tim Hochberg wrote:
> My first question is: why? What's the attraction in returning a sorted
> answer here? Returning an unsorted array is potentially faster,
> depending on the algorithm chosen,  and sorting after the fact is
> trivial. If one was going to spend extra complexity on something, I'd
> think it would be better spent on preserving the input order.

There is a unique function in MATLAB that returns a sorted vector. I think a
lot of people will expect numpy and MATLAB functions with identical names
to behave similarly.

If we want to preserve the input order, we'd have to choose a convention
for duplicated values: do we keep the position of the first occurrence
or of the last one?
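
To make the two conventions concrete, here is a minimal pure-Python sketch.
The helper names unique_keep_first and unique_keep_last are hypothetical and
are not part of numpy; the sketch assumes hashable elements and a Python
version where dict preserves insertion order (3.7 or later).

```python
def unique_keep_first(values):
    """Unique values, ordered by the first time each one appears."""
    # dict preserves insertion order, so the first occurrence of each
    # value fixes its position in the result.
    return list(dict.fromkeys(values))


def unique_keep_last(values):
    """Unique values, ordered by the last time each one appears."""
    # Walk the input backwards, keep first occurrences in that direction,
    # then flip the result back to forward order.
    reversed_unique = list(dict.fromkeys(reversed(values)))
    return reversed_unique[::-1]


data = [3, 1, 3, 2, 1]
print(unique_keep_first(data))  # [3, 1, 2]
print(unique_keep_last(data))   # [3, 2, 1]
```

The two results differ as soon as a duplicated value is separated by other
values, which is why a convention would have to be fixed and documented.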

Here is the benchmark. Sorry, Norbert, for not including your code the first
time. It turns out that, with Alain's suggestion, it's the fastest one both
for lists and arrays.

x = rand(100000)*100
x = x.astype('i')
l = list(x)

For array x:

In [166]: timeit unique_alan(x) # with set instead of dict
100 loops, best of 3: 8.8 ms per loop

In [167]: timeit unique_norbert(x)
100 loops, best of 3: 8.8 ms per loop

In [168]: timeit unique_sasha(x)
100 loops, best of 3: 10.8 ms per loop

In [169]: timeit unique(x)
10 loops, best of 3: 50.4 ms per loop

In [170]: timeit unique1d(x)
10 loops, best of 3: 13.2 ms per loop

For list l:

In [196]: timeit unique_norbert(l)
10 loops, best of 3: 29 ms per loop

In [197]: timeit unique_alan(l)  # with set instead of dict
10 loops, best of 3: 14.5 ms per loop

In [193]: timeit unique(l)
10 loops, best of 3: 29.6 ms per loop
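
For anyone who wants to run a similar comparison outside IPython, the
following is a rough sketch using the standard-library timeit module. The
function unique_set is a hypothetical stand-in for a set-based approach; it
is not the actual unique_alan or unique_norbert code, which is not shown in
this message. Absolute timings will differ from machine to machine.

```python
import random
import timeit


def unique_set(values):
    """Sorted unique values via a set (hypothetical stand-in)."""
    return sorted(set(values))


# 100000 integers in [0, 100), mirroring the shape of the test data above.
random.seed(0)
l = [random.randrange(100) for _ in range(100000)]

# Best of 3 repeats of 10 calls each, reported per call in milliseconds.
best = min(timeit.repeat(lambda: unique_set(l), number=10, repeat=3)) / 10
print("unique_set(l): %.2f ms per loop" % (best * 1e3))
```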

Note:
In Norbert's function, setting sort=False for flattenable objects returns a
sorted array anyway. So I'd suggest removing the sort keyword, sorting if the
datatype is sortable, and not sorting if it's not.
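
The "sort if sortable" behaviour could look roughly like the sketch below.
It is an illustration in plain Python, not Norbert's implementation; the
name unique_sorted_if_possible is hypothetical, and elements are assumed to
be hashable.

```python
def unique_sorted_if_possible(values):
    """Unique values, sorted when the elements can be ordered."""
    distinct = list(set(values))
    try:
        # sorted() raises TypeError when elements cannot be compared
        # with "<", e.g. complex numbers or a mix of str and int.
        return sorted(distinct)
    except TypeError:
        # No meaningful order exists, so return the values unsorted.
        return distinct


print(unique_sorted_if_possible([3, 1, 2, 3]))  # [1, 2, 3]
```

With this design the caller never has to pass a sort flag: the result is
sorted whenever an ordering exists and left in arbitrary order otherwise.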

David
