From: Charles R H. <cha...@gm...> - 2006-09-19 22:55:59
|
On 9/19/06, Tim Hochberg <tim...@ie...> wrote: > > Travis Oliphant wrote: > > Tim Hochberg wrote: > > > > > >> A. M. Archibald wrote: > >> > >> > >> > >>> On 19/09/06, Tim Hochberg <tim...@ie...> wrote: > >>> > >>> > >>> > >>> > >>> > >>>> I'm not sure where the breakpoint is, but I was seeing failures for > all > >>>> three sort types with N as high as 10000. I suspect that they're all > >>>> broken in the presence of NaNs. I further suspect you'd need some > >>>> punishingly slow n**2 algorithm to be robust in the presence of NaNs. > >>>> > >>>> > >>>> > >>>> > >>> Not at all. Just temporarily make NaNs compare greater than any other > >>> floating-point value and use quicksort (say). You can even do this for > >>> python lists without much trouble. > >>> > >>> > >>> > >>> > >> I misspoke. What I meant here was keeping the behavior that people > think > >> that we already have but don't: NaNs stay in place and everything is > >> sorted around them. And even that's not true, since you could just > >> record where the NaNs are, remove them, sort and put them back. What I > >> was really getting at was, that I'm guessing, and it's just a guess, > >> that (a) none of the fast sorting algorithms do anything sensible > unless > >> special cased and (b) one could come up with a naive n**2 sort that > does > >> do something sensible without special casing (where sensible means > leave > >> the NaNs alone). > >> > >> > >> > >>> That's actually a viable suggestion for numpy's sorting, although it > >>> would be kind of ugly to implement: do a quick any(isnan(A)), and if > >>> not, use the fast stock sorting algorithms; if there is a NaN > >>> somewhere in the array, use a version of the sort that has a tweaked > >>> comparison function so the NaNs wind up at the end and are easy to > >>> trim off. > >>> > >>> But the current situation, silently returning arrays in which the > >>> non-NaNs are unsorted, is really bad. > >>> > >>> > >>> > >>> > >> If your going to do isnan anyway, why not just raise an exception. An > >> array with NaNs in it can't be sorted by any common sense definition of > >> sorting. Any treatment of NaNs is going to be arbitrary, so we might as > >> well make the user specify what they want. "In the face of ambiguity, > >> refuse the temptation to guess" and all that. > >> > >> My favorite solution would be to make sort respect the invalid mode of > >> seterr/geterr. However at the moment it doesn't seem to (in beta4 at > >> least) but neither does add or multiply so those probably need to be > >> looked at again.... > >> > >> > >> > > The geterr/seterr stuff changes how IEEE hardware flags are handled in > > ufuncs. Currently they are not even looked at elsewhere. Are you > > saying that add and multiply don't respect the invalid flag? If not, > > then this might be hardware related. Does the IEEE invalid hardware > > flag get raised on multiplication by nan or only on creation of nan? > It looks like I jumped to conclusions here. I was expecting that with > invalid set to 'raise' that an array that (+-*operations on nans would > raise an exception. It appears that is incorrect -- only operations that > create nans from non-nans will trigger this as you suspected. Similarly, > I expected that over='raise' would cause inf+something to raise an > exception. Again, not true; this raises an exception only when a new inf > is created from non infs. At least on my box. > > Interesting. And a little surprising. > > An interesting tidbit (in an array) inf/inf will raise invalid, but > nan/nan will not, it just returns nan. Fun, fun fun! > > > > All the seterr/geterr stuff relies on the hardware flags. We don't do > > any other checking. > > > > Yeah. And just by itself this isn't going to do anything for sort since > comparing nans will not set any flags, it's just that the result will be > problematic.If one wanted to use these flags for this one would have to > use/abuse the result of geterr to trigger an isnan test at the beginning > of sort and then warn, raise or call as appropriate. > > It would probably also not be unreasonable to punt and document sort as > failing in the presence of nans. For floats we could use something like: lessthan(a,b) := a < b || (a == nan && b != nan) Which would put all the nans at one end and might not add too much overhead. Chuck |