Perl Data Language / Bugs / #360 subtle & rare bug in vsearch

#360 subtle & rare bug in vsearch

Milestone: critical

Status: closed-accepted

Owner: Chris Marshall

Labels: None

Priority: 1

Updated: 2014-12-20

Created: 2014-08-19

Creator: Diab Jerius

Private: No

There's a rare bug in binary search algorithms having to do with integer overflow in the calculation of the midpoint of the search range. See http://googleresearch.blogspot.com/2006/06/extra-extra-read-all-about-it-nearly.html

The offending code in PDL's vsearch is:

m = (jh+jl) >> 1;

where jh & jl are declared as having type PDL_Indx. At least on my 64bit linux box, PDL_Indx is declared as (signed) long. The fix (as per the above post) is to coerce them into unsigned longs, and then perform the shift, e.g.

m = ((unsigned PDL_Indx) jh + (unsigned PDL_Indx) jl) >> 1;

(which of course won't work).

It doesn't look like there's an unsigned version of PDL_Indx defined. To prevent future catastrophes, it'd be nice if that were done.

Discussion

Derek Lamb - 2014-08-19

That blog post mentions that any divide-and-conquer function could possibly have the same bug. We should also look at qsort & related subs: they all have a line that looks like "median = (i+j)/2". There may be others.

If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
- Chris Marshall - 2014-08-23
  
  I would categorize this as a "bug waiting to happen". :-)
  
  My preference for a fix is the better alternative from the
  blog post. It is clearer since it is explicit that the code
  cannot overflow for non-negative values of low. The
  version with (unsigned PDL_Indx) is less clear since the
  behavior comes from mandated C language performance
  in overflow.
  
  mid = low + ((high - low) / 2);
  
  We definitely could use code scrubbing to ensure this
  isn't going to bite us somewhere else.
  
  --Chris
  
  If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Diab Jerius - 2014-08-25

Replacing the bit shift by an addition and a division operation results in a substantial performance hit. I've attached some test code. Here are the results of running it as

% time ./time 100 && time ./time 100 1
divide

real 0m15.859s
user 0m15.817s
sys 0m0.012s
shift

real 0m11.461s
user 0m11.433s
sys 0m0.004s

To be honest, without any comments, either version

mid = ((unsigned int)low + (unsigned int)high)) >> 1;

mid = low + ((high - low) / 2);

is unsatisfactory, and since both have to be documented, I'd choose the faster version over the slower one. Proper documentation about the effects of C unsigned integer overflow should alleviate any confusion about the code.

time.c

If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Chris Marshall - 2014-08-25

Fixed in git.

Thanks for the timing code. With -O3 I get essentially no difference in performance and there is no requirement to introduce an unsigned PDL_Indx datatype to "fix" the problem.

That said, improved types support for PDL3 should revisit this decision for development going forward.

If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
- Diab Jerius - 2014-08-25
  
  Good catch on the optimization. Thanks.
  
  If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Chris Marshall - 2014-08-25

status: open --> open-fixed

assigned_to: Chris Marshall

Priority: 5 --> 1
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Chris Marshall - 2014-12-20

status: open-fixed --> closed-accepted
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Chris Marshall - 2014-12-20

Its in git for the future. Thanks for observing and reporting the problem.

If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Log in to post a comment.