From: Charles Z. <ka...@be...> - 2018-06-23 01:50:28
|
Hello clisp, I wrote a loop hoisting pass, which means that the compiler can now hoist loop invariant computations out. For example, in (cleavir-compile '(lambda (x y) (dotimes (i x) (print (+ x y))))) the addition, which is loop invariant, is hoisted out of the loop and only computed once. Disassembly of function NIL (CONST 0) = 0 (CONST 1) = NIL 2 required arguments 0 optional arguments No rest parameter No keyword parameters 18 byte-code instructions: 0 (LOAD&PUSH 2) 1 (LOAD&PUSH 2) 2 (CALLSR&PUSH 2 55) ; + 5 (CONST&PUSH 0) ; 0 6 L6 6 (LOAD&PUSH 0) 7 (LOAD&PUSH 5) 8 (CALLSR&PUSH 1 52) ; >= 11 (LOAD&JMPIF 0 L25) 14 (LOAD&PUSH 2) 15 (PUSH-UNBOUND 1) 17 (CALLS1 142) ; PRINT 19 (LOAD&INC&STORE 1) 21 (SKIP 1) 23 (JMP L6) 25 L25 25 (CONST 1) ; NIL 26 (SKIP&RET 6) vs what CLISP produces: Disassembly of function NIL (CONST 0) = 0 2 required arguments 0 optional arguments No rest parameter No keyword parameters 15 byte-code instructions: 0 (CONST&PUSH 0) ; 0 1 (JMP L14) 3 L3 3 (LOAD&PUSH 3) 4 (LOAD&PUSH 3) 5 (CALLSR&PUSH 2 55) ; + 8 (PUSH-UNBOUND 1) 10 (CALLS1 142) ; PRINT 12 (LOAD&INC&STORE 0) 14 L14 14 (LOAD&PUSH 0) 15 (LOAD&PUSH 4) 16 (CALLSR&JMPIFNOT 1 52 L3) ; >= 20 (NIL) 21 (SKIP&RET 4) which computes the addition every loop. This optimization is especially useful in heavily numeric code which indexes arrays in multinested loops. For example, in the function (lambda (array) (dotimes (i 5) (dotimes (j 7) (print (aref array (+ i 2) (+ j 4)))))) the computation (+ i 2) can be hoisted into the outer loop, since the value doesn't change in the inner loop. So Cleavir produces: Disassembly of function NIL (CONST 0) = 0 (CONST 1) = 5 (CONST 2) = 2 (CONST 3) = 0 (CONST 4) = 7 (CONST 5) = 4 (CONST 6) = NIL 1 required argument 0 optional arguments No rest parameter No keyword parameters 35 byte-code instructions: 0 (CONST&PUSH 0) ; 0 1 L1 1 (LOAD&PUSH 0) 2 (CONST&PUSH 1) ; 5 3 (CALLSR&PUSH 1 52) ; >= 6 (LOAD&JMPIF 0 L51) 9 (LOAD&PUSH 1) 10 (CONST&PUSH 2) ; 2 11 (CALLSR&PUSH 2 55) ; + 14 (CONST&PUSH 3) ; 0 15 L15 15 (LOAD&PUSH 0) 16 (CONST&PUSH 4) ; 7 17 (CALLSR&PUSH 1 52) ; >= 20 (LOAD&JMPIF 0 L45) 23 (LOAD&PUSH 1) 24 (CONST&PUSH 5) ; 4 25 (CALLSR&PUSH 2 55) ; + 28 (LOAD&PUSH 7) 29 (LOAD&PUSH 4) 30 (LOAD&PUSH 2) 31 (CALLSR&PUSH 2 1) ; AREF 34 (LOAD&PUSH 0) 35 (PUSH-UNBOUND 1) 37 (CALLS1 142) ; PRINT 39 (LOAD&INC&STORE 3) 41 (SKIP 3) 43 (JMP L15) 45 L45 45 (LOAD&INC&STORE 4) 47 (SKIP 4) 49 (JMP L1) 51 L51 51 (CONST 6) ; NIL 52 (SKIP&RET 4) whereas CLISP must calculate (+i 2) every time and produces: Disassembly of function NIL (CONST 0) = 0 (CONST 1) = 5 (CONST 2) = 7 (CONST 3) = 2 (CONST 4) = 4 1 required argument 0 optional arguments No rest parameter No keyword parameters 29 byte-code instructions: 0 (CONST&PUSH 0) ; 0 1 (JMP L36) 3 L3 3 (CONST&PUSH 0) ; 0 4 (JMP L26) 6 L6 6 (LOAD&PUSH 3) 7 (CONST&PUSH 3) ; 2 8 (LOAD&PUSH 3) 9 (CALLSR&PUSH 2 55) ; + 12 (CONST&PUSH 4) ; 4 13 (LOAD&PUSH 3) 14 (CALLSR&PUSH 2 55) ; + 17 (CALLSR&PUSH 2 1) ; AREF 20 (PUSH-UNBOUND 1) 22 (CALLS1 142) ; PRINT 24 (LOAD&INC&STORE 0) 26 L26 26 (LOAD&PUSH 0) 27 (CONST&PUSH 2) ; 7 28 (CALLSR&JMPIFNOT 1 52 L6) ; >= 32 (SKIP 1) 34 (LOAD&INC&STORE 0) 36 L36 36 (LOAD&PUSH 0) 37 (CONST&PUSH 1) ; 5 38 (CALLSR&JMPIFNOT 1 52 L3) ; >= 42 (NIL) 43 (SKIP&RET 3) As you can see, this transformation is a big improvement for nested loops, since most time is spent in inner loops. Charles |