From: David H. <dav...@gm...> - 2006-07-11 18:27:38
|
Tim Hochberg wrote: > My first question is: why? What's the attraction in returning a sorted > answer here? Returning an unsorted array is potentially faster, > depending on the algorithm chosen, and sorting after the fact is > trivial. If one was going to spend extra complexity on something, I'd > think it would be better spent on preserving the input order. There is a unique function in matlab that returns a sorted vector. I think a lot of people will expect a numpy and matlab functions with identical names to behave similarly. If we want to preserve the input order, we'd have to choose a convention about whose value's order is retained: do we keep the order of the first value found or the last one ? Here is the benchmark. Sorry Norbert for not including your code the first time, it turns out that with Alain's suggestion, its the fastest one both for lists and arrays. x = rand(100000)*100 x = x.astype('i') l = list(x) For array x: In [166]: timeit unique_alan(x) # with set instead of dict 100 loops, best of 3: 8.8 ms per loop In [167]: timeit unique_norbert(x) 100 loops, best of 3: 8.8 ms per loop In [168]: timeit unique_sasha(x) 100 loops, best of 3: 10.8 ms per loop In [169]: timeit unique(x) 10 loops, best of 3: 50.4 ms per loop In [170]: timeit unique1d(x) 10 loops, best of 3: 13.2 ms per loop For list l: In [196]: timeit unique_norbert(l) 10 loops, best of 3: 29 ms per loop In [197]: timeit unique_alan(l) # with set instead of dict 10 loops, best of 3: 14.5 ms per loop In [193]: timeit unique(l) 10 loops, best of 3: 29.6 ms per loop Note: In Norbert function, setting sort=False for flattenable objects returns a sorted array anyway. So I'd suggest to remove the sort keyword, sort if the datatype is sortable, and don't sort if its not. David |