From: Lutz Euler <leuler@us...>  20130429 20:40:05

The branch "master" has been updated in SBCL: via 62f92bd02e9c04a46893ff9e7b88acdaeab230fa (commit) from 423b1f8cba83d16e57e852a51cf5d51ef709b2ed (commit)  Log  commit 62f92bd02e9c04a46893ff9e7b88acdaeab230fa Author: Lutz Euler <lutz.euler@...> Date: Mon Apr 29 22:35:01 2013 +0200 Improve scaling of type derivation for LOG{AND,IOR,XOR}. If the types of the arguments of LOG{AND,IOR,XOR} are known to be ranges of nonnegative integers the compiler currently derives the range of the result using straightforward implementations of algorithms from "Hacker's Delight". These take quadratical time in the number of bits of the inputs in the worst case, potentially leading to unacceptably long compilation times. (The algorithms are based on loops over the bits of the inputs, doing calculations during each iteration that are themselves linear in the number of bits of their operands.) Instead implement bitparallel algorithms I have found that take linear time in all cases. While their runtime therefore is limited to much smaller values for large inputs, it is comparable to that of the current algorithms for small inputs, too; the new deriver for LOGXOR is in fact faster than the old one by a factor of two to ten already in the latter case. The (existing) test for these derivers compares their results with those from a bruteforce algorithm for all O(N^4) many pairs of input ranges with endpoints from the set of Nbit unsigned integers. The bruteforce algorithm needs to consider O(N^2) input pairs for each pair of ranges, making the total runtime O(N^6). Therefore the test normally runs with N = 5. I have tested all three new derivers successfully with N = 7. Replace LOG{AND,IOR,XOR}DERIVEUNSIGNED{LOW,HIGH}BOUND with LOG{AND,IOR,XOR}DERIVEUNSIGNEDBOUNDS to make it possible to evaluate expressions only once that the calculations for the low and the high bound have in common. The callers always need both bounds anyway. Adapt the test to this change. (It runs twice as fast now due to the brute force loop calculating both bounds in one go.) Add a test for the scaling behaviour. This needs a function to measure runtimes over potentially large ranges; add this to testutil.lisp. Fixes lp#1096444.  CREDITS  1 + NEWS  3 + src/compiler/bitopsderivetype.lisp  272 +++++++++++++++++++ tests/testutil.lisp  36 +++++ tests/type.pure.lisp  70 ++++++ 5 files changed, 239 insertions(+), 143 deletions() diff git a/CREDITS b/CREDITS index dc40fa0..99a1b12 100644  a/CREDITS +++ b/CREDITS @@ 848,6 +848,7 @@ DTC Douglas Crosher JES Juho Snellman JRXR Joshua Ross LAV Larry Valkama +LEU Lutz Euler MG Gabor Melis MNA Martin Atzmueller NJF Nathan Froyd diff git a/NEWS b/NEWS index 4b3f503..053e3a2 100644  a/NEWS +++ b/NEWS @@ 3,6 +3,9 @@ changes relative to sbcl1.1.7: * enhancement: RUNPROGRAM supports a :DIRECTORY argument to set the working directory of the spawned process. (lp#791800) (patch by Matthias Benkard) + * bug fix: type derivation for LOG{AND,IOR,XOR} scales linearly instead + of quadratically with the size of the input in the worst case. + (lp#1096444) * bug fix: handle errors when initializing *defaultpathnamedefaults*, sbext:*runtimepathname*, sbext:*posixargv* on startup, like character decoding errors, or directories being deleted. diff git a/src/compiler/bitopsderivetype.lisp b/src/compiler/bitopsderivetype.lisp index 32d7ae0..7cbfb9b 100644  a/src/compiler/bitopsderivetype.lisp +++ b/src/compiler/bitopsderivetype.lisp @@ 25,48 +25,115 @@ (or (null min) (minusp min)))) (values nil t t))) ;;; See _Hacker's Delight_, Henry S. Warren, Jr. pp 5863 for an ;;; explanation of LOG{AND,IOR,XOR}DERIVEUNSIGNED{LOW,HIGH}BOUND. ;;; Credit also goes to Raymond Toy for writing (and debugging!) similar ;;; versions in CMUCL, from which these functions copy liberally. +;;;; Generators for simple bit masks (defun logandderiveunsignedlowbound (x y)  (let ((a (numerictypelow x))  (b (numerictypehigh x))  (c (numerictypelow y))  (d (numerictypehigh y)))  (loop for m = (ash 1 (integerlength (lognor a c))) then (ash m 1)  until (zerop m) do  (unless (zerop (logand m (lognot a) (lognot c)))  (let ((temp (logandc2 (logior a m) (1 m))))  (when (<= temp b)  (setf a temp)  (loopfinish))  (setf temp (logandc2 (logior c m) (1 m)))  (when (<= temp d)  (setf c temp)  (loopfinish))))  finally (return (logand a c))))) +;;; Return an integer consisting of zeroes in its N least significant +;;; bit positions and ones in all others. If N is negative, return 1. +(declaim (inline zeroes)) +(defun zeroes (n) + (ash 1 n)) (defun logandderiveunsignedhighbound (x y)  (let ((a (numerictypelow x))  (b (numerictypehigh x))  (c (numerictypelow y))  (d (numerictypehigh y)))  (loop for m = (ash 1 (integerlength (logxor b d))) then (ash m 1)  until (zerop m) do  (cond  ((not (zerop (logand b (lognot d) m)))  (let ((temp (logior (logandc2 b m) (1 m))))  (when (>= temp a)  (setf b temp)  (loopfinish))))  ((not (zerop (logand (lognot b) d m)))  (let ((temp (logior (logandc2 d m) (1 m))))  (when (>= temp c)  (setf d temp)  (loopfinish)))))  finally (return (logand b d))))) +;;; Return an integer consisting of ones in its N least significant +;;; bit positions and zeroes in all others. If N is negative, return 0. +(declaim (inline ones)) +(defun ones (n) + (lognot (ash 1 n))) + +;;; The functions LOG{AND,IOR,XOR}DERIVEUNSIGNEDBOUNDS below use +;;; algorithms derived from those in the chapter "Propagating Bounds +;;; through Logical Operations" from _Hacker's Delight_, Henry S. +;;; Warren, Jr., 2nd ed., pp 8790. +;;; +;;; We used to implement the algorithms from that source (then its first +;;; edition) very faithfully here which exposed a weakness of theirs, +;;; namely worst case quadratical runtime in the number of bits of the +;;; input values, potentially leading to excessive compilation times for +;;; expressions involving bignums. To avoid that, I have devised and +;;; implemented variations of these algorithms that achieve linear +;;; runtime in all cases. +;;; +;;; Like Warren, let's start with the high bound on LOGIOR to explain +;;; how this is done. To follow, please read Warren's explanations on +;;; his "maxOR" function and compare this with how the second return +;;; value of LOGIORDERIVEUNSIGNEDBOUNDS below is calculated. +;;; +;;; "maxOR" loops starting from the left until it finds a position where +;;; both B and D are 1 and where it is possible to decrease one of these +;;; bounds by setting this bit in it to 0 and all following ones to 1 +;;; without the resulting value getting below the corresponding lower +;;; bound (A or C). This is done by calculating the modified values +;;; during each iteration where both B and D are 1 and comparing them +;;; against the lower bounds. +;;; The trick to avoid the loop is to exchange the order of the steps: +;;; First determine from which position rightwards it would be allowed +;;; to change B or D in this way and have the result be larger or equal +;;; than A or C respectively and then find the leftmost position equal +;;; to this or to the right of it where both B and D are 1. +;;; It is quite simple to find from where rightwards B could be modified +;;; this way: This is the leftmost position where B has a 1 and A a 0, +;;; or, cheaper to calculate, the leftmost position where A and B +;;; differ. Thus (INTEGERLENGTH (LOGXOR A B)) gives us this position +;;; where a result of 1 corresponds to the rightmost bit position. As we +;;; don't care which of B or D we modify we can take the maximum of this +;;; value and of (INTEGERLENGTH (LOGXOR C D)). +;;; The rest is equally simple: Build a mask of 1 bits from the thusly +;;; found position rightwards, LOGAND it with B and D and feed that into +;;; INTEGERLENGTH. From this build another mask and LOGIOR it with B +;;; and D to set the desired bits. +;;; The special cases where A equals B and/or C equals D are covered by +;;; the same code provided the mask generator treats an argument of 1 +;;; the same as 0, which both ZEROES and ONES do. +;;; +;;; To calculate the low bound on LOGIOR we need to treat X and Y +;;; independently for longer but the basic idea stays the same. +;;; +;;; LOGANDDERIVEUNSIGNEDBOUNDS can be derived by sufficiently many +;;; applications of DeMorgan's law from LOGIORDERIVEUNSIGNEDBOUNDS. +;;; The implementation additionally avoids work (that is, calculations +;;; of one's complements) by using the identity (INTEGERLENGTH X) = +;;; (INTEGERLENGTH (LOGNOT X)) and observing that ZEROES is cheaper +;;; than ONES. +;;; +;;; For the low bound on LOGXOR we use Warren's formula +;;; minXOR(a, b, c, d) = minAND(a, b, !d, !c)  minAND(!b, !a, c, d) +;;; where "!" is bitwise negation and "" is bitwise or. Both minANDs +;;; are implemented as in LOGANDDERIVEUNSIGNEDBOUNDS (the part for +;;; the first result), sharing the first LOGXOR and INTEGERLENGTH +;;; calculations as (LOGXOR A B) = (LOGXOR (LOGNOT B) (LOGNOT A)). +;;; +;;; For the high bound on LOGXOR Warren's formula seems unnecessarily +;;; complex. Instead, with (LOGNOT (LOGXOR X Y)) = (LOGXOR X (LOGNOT Y)) +;;; we have +;;; maxXOR(a, b, c, d) = !minXOR(a, b, !d, !c) +;;; and rewriting minXOR as above yields +;;; maxXOR(a, b, c, d) = !(minAND(a, b, c, d)  minAND(!b, !a, !d, !c)) +;;; This again shares the first LOGXOR and INTEGERLENGTH calculations +;;; between both minANDs and with the ones for the low bound. +;;; +;;; LEU, 20130429. + +(defun logandderiveunsignedbounds (x y) + (let* ((a (numerictypelow x)) + (b (numerictypehigh x)) + (c (numerictypelow y)) + (d (numerictypehigh y)) + (lengthxorx (integerlength (logxor a b))) + (lengthxory (integerlength (logxor c d)))) + (values + (let* ((mask (zeroes (max lengthxorx lengthxory))) + (index (integerlength (logior mask a c)))) + (logand a c (zeroes (1 index)))) + (let* ((maskx (ones lengthxorx)) + (masky (ones lengthxory)) + (indexx (integerlength (logand maskx b (lognot d)))) + (indexy (integerlength (logand masky d (lognot b))))) + (cond ((= indexx indexy) + ;; Both indexes are 0 here. + (logand b d)) + ((> indexx indexy) + (logand (logior b (ones (1 indexx))) d)) + (t + (logand (logior d (ones (1 indexy))) b))))))) (defun logandderivetypeaux (x y &optional sameleaf) (when sameleaf @@ 86,8 +153,8 @@ ((null ylen) (specifiertype `(unsignedbyte* ,xlen))) (t  (let ((low (logandderiveunsignedlowbound x y))  (high (logandderiveunsignedhighbound x y))) + (multiplevaluebind (low high) + (logandderiveunsignedbounds x y) (specifiertype `(integer ,low ,high))))) ;; X is positive, but Y might be negative. (cond ((null xlen) @@ 107,43 +174,28 @@ ;; We can't tell squat about the result. (specifiertype 'integer))))))) (defun logiorderiveunsignedlowbound (x y)  (let ((a (numerictypelow x))  (b (numerictypehigh x))  (c (numerictypelow y))  (d (numerictypehigh y)))  (loop for m = (ash 1 (integerlength (logxor a c))) then (ash m 1)  until (zerop m) do  (cond  ((not (zerop (logandc2 (logand c m) a)))  (let ((temp (logand (logior a m) (1+ (lognot m)))))  (when (<= temp b)  (setf a temp)  (loopfinish))))  ((not (zerop (logandc2 (logand a m) c)))  (let ((temp (logand (logior c m) (1+ (lognot m)))))  (when (<= temp d)  (setf c temp)  (loopfinish)))))  finally (return (logior a c)))))  (defun logiorderiveunsignedhighbound (x y)  (let ((a (numerictypelow x))  (b (numerictypehigh x))  (c (numerictypelow y))  (d (numerictypehigh y)))  (loop for m = (ash 1 (integerlength (logand b d))) then (ash m 1)  until (zerop m) do  (unless (zerop (logand b d m))  (let ((temp (logior ( b m) (1 m))))  (when (>= temp a)  (setf b temp)  (loopfinish))  (setf temp (logior ( d m) (1 m)))  (when (>= temp c)  (setf d temp)  (loopfinish))))  finally (return (logior b d))))) +(defun logiorderiveunsignedbounds (x y) + (let* ((a (numerictypelow x)) + (b (numerictypehigh x)) + (c (numerictypelow y)) + (d (numerictypehigh y)) + (lengthxorx (integerlength (logxor a b))) + (lengthxory (integerlength (logxor c d)))) + (values + (let* ((maskx (ones lengthxorx)) + (masky (ones lengthxory)) + (indexx (integerlength (logand maskx (lognot a) c))) + (indexy (integerlength (logand masky (lognot c) a)))) + (cond ((= indexx indexy) + ;; Both indexes are 0 here. + (logior a c)) + ((> indexx indexy) + (logior (logand a (zeroes (1 indexx))) c)) + (t + (logior (logand c (zeroes (1 indexy))) a)))) + (let* ((mask (ones (max lengthxorx lengthxory))) + (index (integerlength (logand mask b d)))) + (logior b d (ones (1 index))))))) (defun logiorderivetypeaux (x y &optional sameleaf) (when sameleaf @@ 154,8 +206,8 @@ ((and (not xneg) (not yneg)) ;; Both are positive. (if (and xlen ylen)  (let ((low (logiorderiveunsignedlowbound x y))  (high (logiorderiveunsignedhighbound x y))) + (multiplevaluebind (low high) + (logiorderiveunsignedbounds x y) (specifiertype `(integer ,low ,high))) (specifiertype `(unsignedbyte* *)))) ((not xpos) @@ 187,41 +239,25 @@ ;; Unbounded. (specifiertype 'integer)))))))) (defun logxorderiveunsignedlowbound (x y)  (let ((a (numerictypelow x))  (b (numerictypehigh x))  (c (numerictypelow y))  (d (numerictypehigh y)))  (loop for m = (ash 1 (integerlength (logxor a c))) then (ash m 1)  until (zerop m) do  (cond  ((not (zerop (logandc2 (logand c m) a)))  (let ((temp (logand (logior a m)  (1+ (lognot m)))))  (when (<= temp b)  (setf a temp))))  ((not (zerop (logandc2 (logand a m) c)))  (let ((temp (logand (logior c m)  (1+ (lognot m)))))  (when (<= temp d)  (setf c temp)))))  finally (return (logxor a c)))))  (defun logxorderiveunsignedhighbound (x y)  (let ((a (numerictypelow x))  (b (numerictypehigh x))  (c (numerictypelow y))  (d (numerictypehigh y)))  (loop for m = (ash 1 (integerlength (logand b d))) then (ash m 1)  until (zerop m) do  (unless (zerop (logand b d m))  (let ((temp (logior ( b m) (1 m))))  (cond  ((>= temp a) (setf b temp))  (t (let ((temp (logior ( d m) (1 m))))  (when (>= temp c)  (setf d temp)))))))  finally (return (logxor b d))))) +(defun logxorderiveunsignedbounds (x y) + (let* ((a (numerictypelow x)) + (b (numerictypehigh x)) + (c (numerictypelow y)) + (d (numerictypehigh y)) + (notb (lognot b)) + (notd (lognot d)) + (lengthxorx (integerlength (logxor a b))) + (lengthxory (integerlength (logxor c d))) + (mask (zeroes (max lengthxorx lengthxory)))) + (values + (let ((indexad (integerlength (logior mask a notd))) + (indexbc (integerlength (logior mask notb c)))) + (logior (logand a notd (zeroes (1 indexad))) + (logand notb c (zeroes (1 indexbc))))) + (let ((indexac (integerlength (logior mask a c))) + (indexbd (integerlength (logior mask notb notd)))) + (lognor (logand a c (zeroes (1 indexac))) + (logand notb notd (zeroes (1 indexbd)))))))) (defun logxorderivetypeaux (x y &optional sameleaf) (when sameleaf @@ 232,8 +268,8 @@ ((and (not xneg) (not yneg)) ;; Both are positive (if (and xlen ylen)  (let ((low (logxorderiveunsignedlowbound x y))  (high (logxorderiveunsignedhighbound x y))) + (multiplevaluebind (low high) + (logxorderiveunsignedbounds x y) (specifiertype `(integer ,low ,high))) (specifiertype '(unsignedbyte* *)))) ((and (not xpos) (not ypos)) diff git a/tests/testutil.lisp b/tests/testutil.lisp index d6246bf..c4e4804 100644  a/tests/testutil.lisp +++ b/tests/testutil.lisp @@ 3,7 +3,8 @@ (:export #:withtest #:reportteststatus #:*failures* #:reallyinvokedebugger #:*breakonfailure* #:*breakonexpectedfailure*  #:makekillthread #:makejointhread)) + #:makekillthread #:makejointhread + #:runtime)) (inpackage :testutil) @@ 133,3 +134,36 @@ (cons (format nil "SBCL_MACHINE_TYPE=~A" (machinetype)) (cons (format nil "SBCL_SOFTWARE_TYPE=~A" (softwaretype)) (posixenviron)))) + +;;; Repeat calling THUNK until its cumulated runtime, measured using +;;; GETINTERNALRUNTIME, is larger than PRECISION. Repeat this +;;; REPETITIONS many times and return the time one call to THUNK took +;;; in seconds as a float, according to the minimum of the cumulated +;;; runtimes over the repetitions. +;;; This allows to easily measure the runtime of expressions that take +;;; much less time than one internal time unit. Also, the results are +;;; unaffected, modulo quantization effects, by changes to +;;; INTERNALTIMEUNITSPERSECOND. +;;; Taking the minimum is intended to reduce the error introduced by +;;; garbage collections occurring at unpredictable times. The inner +;;; loop doubles the number of calls to THUNK each time before again +;;; measuring the time spent, so that the time measurement overhead +;;; doesn't distort the result if calling THUNK takes very little time. +(defun runtime* (thunk repetitions precision) + (loop repeat repetitions + minimize + (loop with start = (getinternalruntime) + with duration = 0 + for n = 1 then (* n 2) + for totalruns = n then (+ totalruns n) + do (dotimes (i n) + (funcall thunk)) + (setf duration ( (getinternalruntime) start)) + when (> duration precision) + return (/ (float duration) (float totalruns))) + into mininternaltimeunitspercall + finally (return (/ mininternaltimeunitspercall + (float internaltimeunitspersecond))))) + +(defmacro runtime (form &key (repetitions 3) (precision 10)) + `(runtime* (lambda () ,form) ,repetitions ,precision)) diff git a/tests/type.pure.lisp b/tests/type.pure.lisp index 919d705..fff2114 100644  a/tests/type.pure.lisp +++ b/tests/type.pure.lisp @@ 241,36 +241,58 @@ ;;; (In fact, this is such a fearsome loop that executing it with the ;;; evaluator would take ages... Disable it under those circumstances.) #+#.(cl:if (cl:eq sbext:*evaluatormode* :compile) '(and) '(or)) (let* ((bits 5)  (size (ash 1 bits)))  (flet ((bruteforce (a b c d op minimize)  (loop with extreme = (if minimize (ash 1 bits) 0)  with collector = (if minimize #'min #'max)  for i from a upto b do  (loop for j from c upto d do  (setf extreme (funcall collector  extreme  (funcall op i j))))  finally (return extreme))))  (dolist (op '(logand logior logxor))  (dolist (minimize '(t nil))  (let ((deriver (intern (format nil "~ADERIVEUNSIGNED~:[HIGH~;LOW~]BOUND"  op minimize) +(withtest (:name (:typederivation :logicaloperations :correctness)) + (let* ((nbits 5) + (size (ash 1 nbits))) + (labels ((bruteforce (a b c d op) + (loop with min = (ash 1 nbits) + with max = 0 + for i from a upto b do + (loop for j from c upto d do + (let ((x (funcall op i j))) + (setf min (min min x) + max (max max x)))) + finally (return (values min max)))) + (test (a b c d op deriver) + (multiplevaluebind (brutelow brutehigh) + (bruteforce a b c d op) + (multiplevaluebind (testlow testhigh) + (funcall deriver + (sbc::specifiertype `(integer ,a ,b)) + (sbc::specifiertype `(integer ,c ,d))) + (unless (and (= brutelow testlow) + (= brutehigh testhigh)) + (format t "FAIL: ~A [~D, ~D] [~D, ~D]~%EXPECTED [~D, ~D] GOT [~D, ~D]~%" + op a b c d + brutelow brutehigh testlow testhigh) + (assert (and (= brutelow testlow) + (= brutehigh testhigh)))))))) + (dolist (op '(logand logior logxor)) + (let ((deriver (intern (format nil "~ADERIVEUNSIGNEDBOUNDS" op) (findpackage :sbc)))) (format t "testing type derivation: ~A~%" deriver) (loop for a from 0 below size do (loop for b from a below size do (loop for c from 0 below size do (loop for d from c below size do  (let* ((brute (bruteforce a b c d op minimize))  (xtype (sbc::specifiertype `(integer ,a ,b)))  (ytype (sbc::specifiertype `(integer ,c ,d)))  (derived (funcall deriver xtype ytype)))  (unless (= brute derived)  (format t "FAIL: ~A [~D,~D] [~D,~D] ~A~% ACTUAL ~D DERIVED ~D~%"  op a b c d minimize brute derived)  (assert (= brute derived))))))))))))) + (test a b c d op deriver)))))))))) + +(withtest (:name (:typederivation :logicaloperations :scaling)) + (let ((typex1 (sbc::specifiertype `(integer ,(expt 2 10000) + ,(expt 2 10000)))) + (typex2 (sbc::specifiertype `(integer ,(expt 2 100000) + ,(expt 2 100000)))) + (typey (sbc::specifiertype '(integer 0 1)))) + (dolist (op '(logand logior logxor)) + (let* ((deriver (intern (format nil "~ADERIVETYPEAUX" op) + (findpackage :sbc))) + (scale (/ (runtime (funcall deriver typex2 typey)) + (runtime (funcall deriver typex1 typey))))) + ;; Linear scaling is good, quadratical bad. Draw the line + ;; near the geometric mean of the corresponding SCALEs. + (when (> scale 32) + (error "Bad scaling of ~a: input 10 times but runtime ~a times as large." + deriver scale)))))) ;;; subtypep on CONS types wasn't taking account of the fact that a ;;; CONS type could be the empty type (but no other nonCONS type) in  hooks/postreceive  SBCL 