gsoc: Loop invariant hoisting

SourceForge Headquarters 1320 Columbia Street Suite 310 San Diego, CA 92101 +1 (858) 422-6466

Hello clisp,

I wrote a loop hoisting pass, which means that the compiler can now
hoist loop invariant computations out.

For example, in
(cleavir-compile '(lambda (x y) (dotimes (i x)
                                           (print (+ x y)))))
the addition, which is loop invariant, is hoisted out of the loop and
only computed once.

Disassembly of function NIL
(CONST 0) = 0
(CONST 1) = NIL
2 required arguments
0 optional arguments
No rest parameter
No keyword parameters
18 byte-code instructions:
0     (LOAD&PUSH 2)
1     (LOAD&PUSH 2)
2     (CALLSR&PUSH 2 55)                  ; +
5     (CONST&PUSH 0)                      ; 0
6     L6
6     (LOAD&PUSH 0)
7     (LOAD&PUSH 5)
8     (CALLSR&PUSH 1 52)                  ; >=
11    (LOAD&JMPIF 0 L25)
14    (LOAD&PUSH 2)
15    (PUSH-UNBOUND 1)
17    (CALLS1 142)                        ; PRINT
19    (LOAD&INC&STORE 1)
21    (SKIP 1)
23    (JMP L6)
25    L25
25    (CONST 1)                           ; NIL
26    (SKIP&RET 6)

vs what CLISP produces:

Disassembly of function NIL
(CONST 0) = 0
2 required arguments
0 optional arguments
No rest parameter
No keyword parameters
15 byte-code instructions:
0     (CONST&PUSH 0)                      ; 0
1     (JMP L14)
3     L3
3     (LOAD&PUSH 3)
4     (LOAD&PUSH 3)
5     (CALLSR&PUSH 2 55)                  ; +
8     (PUSH-UNBOUND 1)
10    (CALLS1 142)                        ; PRINT
12    (LOAD&INC&STORE 0)
14    L14
14    (LOAD&PUSH 0)
15    (LOAD&PUSH 4)
16    (CALLSR&JMPIFNOT 1 52 L3)           ; >=
20    (NIL)
21    (SKIP&RET 4)

which computes the addition every loop.

This optimization is especially useful in heavily numeric code which
indexes arrays in multinested loops. For example, in the function

(lambda (array)
   (dotimes (i 5)
      (dotimes (j 7)
         (print (aref array (+ i 2) (+ j 4))))))

the computation (+ i 2) can be hoisted into the outer loop, since the
value doesn't change in the inner loop.

So Cleavir produces:

Disassembly of function NIL
(CONST 0) = 0
(CONST 1) = 5
(CONST 2) = 2
(CONST 3) = 0
(CONST 4) = 7
(CONST 5) = 4
(CONST 6) = NIL
1 required argument
0 optional arguments
No rest parameter
No keyword parameters
35 byte-code instructions:
0     (CONST&PUSH 0)                      ; 0
1     L1
1     (LOAD&PUSH 0)
2     (CONST&PUSH 1)                      ; 5
3     (CALLSR&PUSH 1 52)                  ; >=
6     (LOAD&JMPIF 0 L51)
9     (LOAD&PUSH 1)
10    (CONST&PUSH 2)                      ; 2
11    (CALLSR&PUSH 2 55)                  ; +
14    (CONST&PUSH 3)                      ; 0
15    L15
15    (LOAD&PUSH 0)
16    (CONST&PUSH 4)                      ; 7
17    (CALLSR&PUSH 1 52)                  ; >=
20    (LOAD&JMPIF 0 L45)
23    (LOAD&PUSH 1)
24    (CONST&PUSH 5)                      ; 4
25    (CALLSR&PUSH 2 55)                  ; +
28    (LOAD&PUSH 7)
29    (LOAD&PUSH 4)
30    (LOAD&PUSH 2)
31    (CALLSR&PUSH 2 1)                   ; AREF
34    (LOAD&PUSH 0)
35    (PUSH-UNBOUND 1)
37    (CALLS1 142)                        ; PRINT
39    (LOAD&INC&STORE 3)
41    (SKIP 3)
43    (JMP L15)
45    L45
45    (LOAD&INC&STORE 4)
47    (SKIP 4)
49    (JMP L1)
51    L51
51    (CONST 6)                           ; NIL
52    (SKIP&RET 4)

whereas CLISP must calculate (+i 2) every time and produces:
Disassembly of function NIL
(CONST 0) = 0
(CONST 1) = 5
(CONST 2) = 7
(CONST 3) = 2
(CONST 4) = 4
1 required argument
0 optional arguments
No rest parameter
No keyword parameters
29 byte-code instructions:
0     (CONST&PUSH 0)                      ; 0
1     (JMP L36)
3     L3
3     (CONST&PUSH 0)                      ; 0
4     (JMP L26)
6     L6
6     (LOAD&PUSH 3)
7     (CONST&PUSH 3)                      ; 2
8     (LOAD&PUSH 3)
9     (CALLSR&PUSH 2 55)                  ; +
12    (CONST&PUSH 4)                      ; 4
13    (LOAD&PUSH 3)
14    (CALLSR&PUSH 2 55)                  ; +
17    (CALLSR&PUSH 2 1)                   ; AREF
20    (PUSH-UNBOUND 1)
22    (CALLS1 142)                        ; PRINT
24    (LOAD&INC&STORE 0)
26    L26
26    (LOAD&PUSH 0)
27    (CONST&PUSH 2)                      ; 7
28    (CALLSR&JMPIFNOT 1 52 L6)           ; >=
32    (SKIP 1)
34    (LOAD&INC&STORE 0)
36    L36
36    (LOAD&PUSH 0)
37    (CONST&PUSH 1)                      ; 5
38    (CALLSR&JMPIFNOT 1 52 L3)           ; >=
42    (NIL)
43    (SKIP&RET 3)

As you can see, this transformation is a big improvement for nested
loops, since most time is spent in inner loops.

Charles