From: Charles R H. <cha...@gm...> - 2006-09-19 20:56:06
|
On 9/19/06, Tim Hochberg <tim...@ie...> wrote: > > A. M. Archibald wrote: > > On 19/09/06, Tim Hochberg <tim...@ie...> wrote: > > > >> Keith Goodman wrote: > >> > >>> In what order would you like argsort to sort the values -inf, nan, > inf? > >>> > >>> > >> Ideally, -inf should sort first, inf should sort last and nan should > >> raise an exception if present. > >> > >> -tim > >> > > > > Mmm. Somebody who's working with NaNs has more or less already decided > > they don't want to be pestered with exceptions for invalid data. > Do you really think so? In my experience NaNs are nearly always just an > indication of a mistake somewhere that didn't get trapped for one reason > or another. > > > I'd > > be happy if they wound up at either end, but I'm not sure it's worth > > hacking up the sort algorithm when a simple isnan() can pull them out. > > > Moving them to the end seems to be the worst choice to me. Leaving them > alone is fine with me. Or raising an exception would be fine. Or doing > one or the other depending on the error mode settings would be even > better if it is practical. > > > What's happening now is that NaN<a, NaN==a, and NaN>a are all false, > > which rather confuses the sorting algorithm. But as long as it doesn't > > actually *break* (that is, leave some of the non-NaNs incorrectly > > sorted) I don't care. > > > Is that true? Are all of numpy's sorting algorithms robust against > nontransitive objects laying around? The answer to that appears to be > no. Try running this a couple of times to see what I mean: > > n = 10 > for i in range(10): > for kind in 'quicksort', 'heapsort', 'mergesort': > a = rand(n) > b = a.copy() > a[n//2] = nan > a.sort(kind=kind) > b.sort(kind=kind) > assert alltrue(a[:n//2] == b[:n//2]), (kind, a, b) > > > The values don't correctly cross the inserted NaN and the sort is > incorrect. Except for heapsort those are doing insertion sorts because n is so small. Heapsort is trickier to understand because of the way the heap is formed and values pulled off. I'll look into that. I bet searchsorted gets messed up by nan's. Do you think it worthwhile to worry about it? Chuck |