#235 uniq fails when pdl contains nan

critical
closed-fixed
core (120)
3
2010-07-02
2010-06-04
No

perldl> $a=pdl(1,2,nan,1,2,nan,1,2,nan);
perldl> print $a
[1 2 nan 1 2 nan 1 2 nan]
perldl> print $a->uniq
[1 nan 2 nan 2 nan 2]

My expectation for the above would be:
[1 2 nan]

Discussion

  • Matthew McGillis

    Version info:
    perlDL shell v1.35
    ReadLines, NiceSlice, MultiLines enabled
    Reading PDL/default.perldlrc...
    Found docs database /usr/lib/perl5/vendor_perl/5.10.0/i386-linux-thread-multi/PDL/pdldoc.db
    Type 'help' for online help
    Type 'demo' for online demos
    Loaded PDL v2.4.4 (supports bad values)

    perl -v
    This is perl, v5.10.0 built for i386-linux-thread-multi

     
  • Derek Lamb

    Derek Lamb - 2010-06-04

    Confirmed in 2.4.6_004. The correct answer should probably be [1 2], since the uniqueness of nan is probably undefined. As a workaround, since you have badvalues enabled, you can do:
    perldl> print $a->setnantobad->uniq;
    [1 2]

    The issue is that qsort (used by uniq) does not put all the nans at the end of the array like it does for bad values. The ideal solution would not rely on badvalue support, nor would it rely on isfinite or badmask in PDL::Math, since PDL::Math is not loaded for PDL::Lite & PDL::LiteF.
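To illustrate why a sort followed by a neighbour comparison breaks down, here is a minimal plain-Perl sketch. It is not PDL's qsort or uniq code; `dedupe_adjacent` is a hypothetical helper, and the NaN is built as inf - inf, which assumes an IEEE 754 platform.

```perl
use strict;
use warnings;

# Build a NaN without any module: inf - inf is NaN on IEEE 754 platforms (assumption).
my $inf = 9**9**9;
my $nan = $inf - $inf;

# Every ordered comparison involving NaN is false, and <=> returns undef,
# so a comparison sort has no consistent place to put a NaN.
my @checks = (
    $nan == $nan        ? 1 : 0,
    $nan <  1           ? 1 : 0,
    $nan >  1           ? 1 : 0,
    defined($nan <=> 1) ? 1 : 0,
);

# Hypothetical helper: drop an element only if it equals its predecessor.
sub dedupe_adjacent {
    my @out;
    for my $x (@_) {
        push @out, $x unless @out && $out[-1] == $x;
    }
    return @out;
}

# NaNs left between equal values keep those values from collapsing.
my @result = dedupe_adjacent(1, 1, $nan, 2, $nan, 2);
print scalar(@result), "\n";   # prints 5: the two 2s both survive
```

Because the NaNs never compare equal to anything, each one survives and also separates duplicates that would otherwise be adjacent.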

     
  • Matthew McGillis

    A result of [1 2] would be better than what I'm getting, although I'm not sure it is completely accurate, since the range does have nan in it. I can understand the notion that mathematically each nan may be unique in how it was produced. So I suppose any of the following may be desirable, depending on your usage.

    [1 2]
    [1 2 nan]
    [1 2 nan nan nan]

    But the current result to me definitely needs work.

     
  • Chris Marshall

    Chris Marshall - 2010-06-04

    I'm not sure that something that is "not a number" can be in the range.
    uniq should still work correctly for the non-NaN values. Since NaNs don't
    compare or sort, whatever fix there is needs to return [ 1 2 ]. A first cut might
    be to document that NaNs are ignored along with Bad values.

     
  • Chris Marshall

    Chris Marshall - 2010-06-04

    To detect NaNs, just check for $a != $a. inf and -inf do work, so a NaN check alone should be enough.
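The self-inequality test can be sketched on plain Perl scalars as follows. `is_nan` is a hypothetical helper name, and the NaN is again built as inf - inf on the assumption of an IEEE 754 platform. In PDL the same comparison would be applied elementwise.

```perl
use strict;
use warnings;

my $inf = 9**9**9;       # overflows to inf
my $nan = $inf - $inf;   # NaN on IEEE 754 platforms (assumption)

# Hypothetical helper: true only for NaN, the one value unequal to itself.
sub is_nan {
    my ($x) = @_;
    return $x != $x ? 1 : 0;
}

my @values = (1, 2, $nan, $inf, -$inf);
my @flags  = map { is_nan($_) } @values;
print "@flags\n";   # prints "0 0 1 0 0"
```

Only the NaN is flagged. inf and -inf compare equal to themselves, so they pass through untouched.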

     
  • Craig DeForest

    Craig DeForest - 2010-06-04

    Interesting. This is definitely a bug, but OTOH there *are* 2^15 or so NaNs defined by IEEE, so in principle the answer could be [1 2 nan nan nan]. But [1 2 nan] sounds like the right expectation.

     
  • Chris Marshall

    Chris Marshall - 2010-06-15

    I pushed a quick fix so that uniq and uniqind work (but ignore NaN values).

    Did some searches and determined that MATLAB apparently does the third option
    among Matthew's list [ 1 2 nan nan nan ].

    Given the various possibilities, I would tend to choose the MATLAB/Octave
    compatible behavior. It is easy enough to strip out the entries with NaN's
    after the uniq operation.

    In the meantime uniq and uniqind work by ignoring the NaN values.
    uniqvec just barfs if it gets NaN values, which prevents silently wrong output.

    I've lowered the Priority some since the methods now work correctly (for
    some value of correctly).

     
  • Chris Marshall

    Chris Marshall - 2010-06-15
    • priority: 5 --> 3
    • assigned_to: nobody --> marshallch
     
  • Chris Marshall

    Chris Marshall - 2010-06-17

    Bug fixed in Git.
    Thanks for reporting the problem!

     
  • Chris Marshall

    Chris Marshall - 2010-06-17

    Per the replies/discussion on this ticket, I've implemented NaN and BAD value handling for uniq, uniqind, and uniqvec. I started with NaN, but working through the new code showed a couple of issues in the BAD value handling as well. Here is your example run in the Perldl2 shell (a.k.a. pdl2) from the latest git PDL:

    PDL> sub nan { return pdl(0) / 0; }
    PDL> $a=pdl(1,2,nan,1,2,nan,1,2,nan);
    PDL> p $a
    [1 2 nan 1 2 nan 1 2 nan]
    PDL> p $a->uniq
    [1 2 nan nan nan]
    PDL> p $a->uniqind
    [0 4 2 5 8]

    NOTE: the convention for reporting NaN results is the same as that used by MATLAB in its unique routine.
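The reporting convention shown above (sorted unique values first, then every NaN kept as its own entry) can be sketched in plain Perl. This is an illustration only, not the code committed to PDL; `uniq_keep_nans` is a hypothetical name, and the NaN is built as inf - inf on the assumption of an IEEE 754 platform.

```perl
use strict;
use warnings;

# Hypothetical plain-Perl helper, not PDL's implementation.
sub uniq_keep_nans {
    # NaN is the only value unequal to itself, so split on that test first.
    my @nans    = grep { $_ != $_ } @_;
    my @ordered = sort { $a <=> $b } grep { $_ == $_ } @_;

    # With the NaNs set aside, adjacent comparison is safe again.
    my @out;
    for my $x (@ordered) {
        push @out, $x unless @out && $out[-1] == $x;
    }

    # Append every NaN: each one is treated as distinct.
    return (@out, @nans);
}

my $inf = 9**9**9;
my $nan = $inf - $inf;   # NaN on IEEE 754 platforms (assumption)

my @u = uniq_keep_nans(1, 2, $nan, 1, 2, $nan, 1, 2, $nan);
print scalar(@u), "\n";   # prints 5: 1, 2 and three NaNs
```

Dropping the final `@nans` from the return value would give the [1 2] behaviour of the earlier quick fix instead.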

     
  • Chris Marshall

    Chris Marshall - 2010-06-17
    • status: open --> pending-fixed
     
  • SourceForge Robot

    This Tracker item was closed automatically by the system. It was
    previously set to a Pending status, and the original submitter
    did not respond within 14 days (the time period specified by
    the administrator of this Tracker).

     
  • SourceForge Robot

    • status: pending-fixed --> closed-fixed
     
