

#359 Improved documentation for vsearch

Group: feature_request
Status: closed
Owner: Diab Jerius
Labels: None
Priority: 9
Updated: 2015-03-05
Created: 2014-08-18
Creator: Diab Jerius
Private: No

The documentation for vsearch indicates it returns the index of the "least larger member". This doesn't seem to be standard nomenclature (cf. http://en.wikipedia.org/wiki/Least-upper-bound_property).

I naturally read it as "the smallest element of the set which is larger than the searched-for element". That's not what vsearch returns, however: it returns the index of the smallest element of the set which is greater than or equal to the searched-for element.

pdl> $xs = sequence(10); say $xs->index( vsearch( pdl( 4.5, 8), $xs ) )
[5 8]

Note the "8", rather than "9".
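To make the "greater than or equal" versus "strictly greater" distinction concrete, here is a small model of the two readings using Python's standard `bisect` module rather than PDL (an illustrative substitution, not PDL's code). `bisect_left` gives the index of the smallest element >= the value, which reproduces the `[5 8]` output above; `bisect_right` implements the literal "least larger member" reading and gives `[5, 9]`.

```python
from bisect import bisect_left, bisect_right

xs = list(range(10))      # plays the role of PDL's sequence(10)
vals = [4.5, 8]

# Smallest member >= value: the behaviour observed from vsearch above.
ge = [bisect_left(xs, v) for v in vals]
# Smallest member strictly > value: the literal "least larger member" reading.
gt = [bisect_right(xs, v) for v in vals]

print(ge)  # [5, 8]
print(gt)  # [5, 9]
```

The two readings only disagree when a value is exactly equal to a member of `xs`, which is why 4.5 maps to 5 either way but 8 does not.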

So, while "least larger member" is very similar to "least upper bound", it doesn't use the exact magic words, so I suggest clarifying the documentation to say:

Returns for each value of $vals the index of the smallest member of $xs which is greater than or equal to it. $xs must be in increasing order.

While the term "least upper bound" may be well defined mathematically, it's not obvious from its words that the comparison includes equality. I think being explicit about what is happening would be much clearer, an approach backed by my random sample of one colleague.

Thanks!
Diab

Related

Bugs: #359

Discussion

  • Chris Marshall
    2014-12-20

    • Priority: 5 --> 9
     
  • Chris Marshall
    2014-12-20

    Nice to fix for PDL-2.008, upping the priority as a reminder.

     
  • Chris Marshall
    2015-02-22

    The current pdldoc vsearch returns

    Module PDL::Primitive
    vsearch
    Signature: ( vals(); xs(n); [o] indx(); [\%options] )

    Efficiently search for values in a sorted piddle.
    
      $idx = vsearch( $vals, $x, [\%options] );
      vsearch( $vals, $x, $idx, [\%options ] );
    
    vsearch performs a binary search for the values from $vals piddle in the
    ordered piddle $x, returning indices into $x. It is a front end to a set
    of routines which differ in how matches are determined and the meaning
    of the returned indices.
    
    The "mode" option indicates which method of searching to use, and may be
    one of:
    
    "sample"
        invoke vsearch_sample, returning indices appropriate for sampling
        within a distribution.
    
    "insert_leftmost"
        invoke vsearch_insert_leftmost, returning the left-most possible
        insertion point.
    
    "insert_rightmost"
        invoke vsearch_insert_rightmost, returning the right-most possible
        insertion point.
    
    "insert_match"
        invoke vsearch_match, returning the index of a matching element,
        else -(insertion point + 1)
    
    "insert_bin_inclusive"
        invoke vsearch_bin_inclusive, returning an index appropriate for
        binning on a grid where the left bin edges are *inclusive* of the
        bin.
    
    "insert_bin_exclusive"
        invoke vsearch_bin_exclusive, returning an index appropriate for
        binning on a grid where the left bin edges are *exclusive* of the
        bin.
    
    The default value of "mode" is "sample".
    

    which I do not understand. Could this be made clearer before 2.008?
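The insert_leftmost, insert_rightmost and match modes listed above can be modelled with Python's standard `bisect` module. This is a hypothetical sketch of the semantics described in the quoted docs, not PDL's implementation: the function names simply mirror the mode names, and returning the left-most index when a value occurs several times is an assumption (the docs only promise "the index of a matching element").

```python
from bisect import bisect_left, bisect_right

def insert_leftmost(xs, v):
    # Left-most index at which v could be inserted while keeping xs sorted.
    return bisect_left(xs, v)

def insert_rightmost(xs, v):
    # Right-most index at which v could be inserted while keeping xs sorted.
    return bisect_right(xs, v)

def match(xs, v):
    # Index of an element equal to v, else -(insertion point + 1).
    # This sketch picks the left-most match when v is repeated.
    i = bisect_left(xs, v)
    if i < len(xs) and xs[i] == v:
        return i
    return -(i + 1)

xs = [1, 2, 2, 2, 5]
for v in (0, 2, 3, 6):
    print(v, insert_leftmost(xs, v), insert_rightmost(xs, v), match(xs, v))
```

The -(insertion point + 1) encoding keeps every "not found" result negative, even when the insertion point is 0, so a caller can recover the insertion point from a negative result r as `-r - 1`.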

     
  • mohawk
    2015-02-23

    I have had a go at explaining a bit more, in https://sourceforge.net/p/pdl/code/merge-requests/30/

    To better understand how vsearch works, read a little further on, to the docs for vsearch_bin_inclusive and vsearch_bin_exclusive. They made sense to me. If that isn't enough, Diab will need to liaise with Chris to clarify what is still unclear.
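A sketch of the two binning modes, again modelled with Python's `bisect` rather than PDL. It assumes the conventions as I understand them from the PDL::Primitive docs: inclusive left edges give bin i with edges[i] <= v < edges[i+1], exclusive left edges give edges[i] < v <= edges[i+1], values below the grid map to -1 and values past the top edge map to N-1. Check the POD for the exact edge-case behaviour before relying on it; `bin_inclusive` and `bin_exclusive` are hypothetical names.

```python
from bisect import bisect_left, bisect_right

def bin_inclusive(edges, v):
    # Bin i satisfies edges[i] <= v < edges[i+1]; -1 below edges[0],
    # len(edges) - 1 at or above edges[-1].
    return bisect_right(edges, v) - 1

def bin_exclusive(edges, v):
    # Bin i satisfies edges[i] < v <= edges[i+1]; -1 at or below edges[0],
    # len(edges) - 1 above edges[-1].
    return bisect_left(edges, v) - 1

edges = [0, 10, 20, 30]
for v in (-1, 0, 5, 10, 30, 31):
    print(v, bin_inclusive(edges, v), bin_exclusive(edges, v))
```

The two modes differ only when a value lands exactly on an edge: under these assumptions 10 falls in bin 1 with inclusive left edges but in bin 0 with exclusive ones.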

     
  • Diab Jerius
    2015-02-23

    Sorry for the tardy reply. This bug report was submitted before I reworked the vsearch code and docs, so the current context is somewhat muddled. I'm happy with the current docs (as I wrote them).

    Chris, could you be more explicit about what's not clear about the vsearch docs? I used language which seemed consistent with what I found in the documentation for other implementations of vsearch (primarily the Java one).

     
    • Chris Marshall
      2015-02-23

      If you do 'pdldoc vsearch' you get a description that is not
      particularly useful since all of the real explanation is in the
      vsearch_xxx routines and not in the vsearch POD.

      What would help:
      - put links in the POD to the specific vsearch_xxx routines
      - add a =for example section to show what happens
      - this should make it clear what the 'sample' mode is
      - put 'see also' entries for the mode-specific routines as well

      --Chris
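As an illustration of what a 'sample' mode example might show, here is a Python sketch (standard `bisect` and `random`, not PDL) of the typical use: drawing event indices from a cumulative distribution. It assumes sample mode returns the smallest index whose value is >= the search value, as observed in the original report, and that values beyond the last element are clamped to the last index; `sample_index` is a hypothetical name.

```python
import random
from bisect import bisect_left

def sample_index(cdf, u):
    # Smallest i with cdf[i] >= u (least upper bound), clamped to the last
    # index so that values beyond cdf[-1] still map to a valid event.
    return min(bisect_left(cdf, u), len(cdf) - 1)

# Cumulative probabilities for four events with weights 0.1, 0.2, 0.3, 0.4.
cdf = [0.1, 0.3, 0.6, 1.0]

rng = random.Random(42)
counts = [0] * len(cdf)
for _ in range(10_000):
    counts[sample_index(cdf, rng.random())] += 1
print(counts)  # roughly proportional to 1:2:3:4
```

Because each uniform draw falls into the interval (cdf[i-1], cdf[i]] with probability equal to that interval's width, the returned indices follow the underlying event weights.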


      Status: open
      Group: feature_request
      Created: Mon Aug 18, 2014 09:37 PM UTC by Diab Jerius
      Last Updated: Mon Feb 23, 2015 02:55 AM UTC
      Owner: nobody



      Sent from sourceforge.net because you indicated interest in
      https://sourceforge.net/p/pdl/bugs/359/

      To unsubscribe from further messages, please visit
      https://sourceforge.net/auth/subscriptions/

       

  • Should probably be set to pending-fixed or fixed now that https://sourceforge.net/p/pdl/code/merge-requests/30/ is merged.

     
    • Chris Marshall
      2015-02-25

      MR30 does not include the additional clarifications I mentioned. 'pdldoc
      vsearch' is not self-explanatory to me. The additions proposed would make
      it so.

      On Tue, Feb 24, 2015 at 9:57 PM, Zakariyya Mughal zsmughal@users.sf.net
      wrote:

      Should probably be set to pending-fixed or fixed now that
      https://sourceforge.net/p/pdl/code/merge-requests/30/ is merged.



  • Chris Marshall
    2015-03-05

    • status: open --> closed
    • assigned_to: Diab Jerius
     
    • On 2015-03-05 at 22:39:59 +0000, Chris Marshall wrote:

      • Comment:

      Looking at the docs again, I think vsearch() really needs a tutorial to be made clear, rather than just a few-line example. Thanks for the improved docs and implementation. Fixed in git; it should appear in PDL-2.007_12 and later.

      I can write up a tutorial as well and I'll add it on to the
      doc/vsearch-example branch https://github.com/PDLPorters/pdl/pull/58.

      I'll update this issue when I do.

       