There's a rare bug in binary search algorithms having to do with integer overflow in the calculation of the midpoint of the search range. See http://googleresearch.blogspot.com/2006/06/extra-extra-read-all-about-it-nearly.html
The offending code in PDL's vsearch is:
m = (jh+jl) >> 1;
where jh & jl are declared as having type PDL_Indx. At least on my 64bit linux box, PDL_Indx is declared as (signed) long. The fix (as per the above post) is to coerce them into unsigned longs, and then perform the shift, e.g.
m = ((unsigned PDL_Indx) jh + (unsigned PDL_Indx) jl) >> 1;
(which of course won't work).
It doesn't look like there's an unsigned version of PDL_Indx defined. To prevent future catastrophes, it'd be nice if that were done.
That blog post mentions that any divide-and-conquer function could possibly have the same bug. We should also look at qsort & related subs: they all have a line that looks like "median = (i+j)/2". There may be others.
I would categorize this as a "bug waiting to happen". :-)
My preference for a fix is the better alternative from the
blog post. It is clearer since it is explicit that the code
cannot overflow for non-negative values of low. The
version with (unsigned PDL_Indx) is less clear since the
behavior comes from mandated C language performance
in overflow.
mid = low + ((high - low) / 2);
We definitely could use code scrubbing to ensure this
isn't going to bite us somewhere else.
--Chris
Replacing the bit shift by an addition and a division operation results in a substantial performance hit. I've attached some test code. Here are the results of running it as
% time ./time 100 && time ./time 100 1
divide
real 0m15.859s
user 0m15.817s
sys 0m0.012s
shift
real 0m11.461s
user 0m11.433s
sys 0m0.004s
To be honest, without any comments, either version
mid = ((unsigned int)low + (unsigned int)high)) >> 1;
mid = low + ((high - low) / 2);
is unsatisfactory, and since both have to be documented, I'd choose the faster version over the slower one. Proper documentation about the effects of C unsigned integer overflow should alleviate any confusion about the code.
Fixed in git.
Thanks for the timing code. With -O3 I get essentially no difference in performance and there is no requirement to introduce an unsigned PDL_Indx datatype to "fix" the problem.
That said, improved types support for PDL3 should revisit this decision for development going forward.
Good catch on the optimization. Thanks.
Its in git for the future. Thanks for observing and reporting the problem.