I've looked at previous starts on this idea (notably Jonathan Smith's plus
some chatter between Nathan and Nikodemus > .5 decades ago), and I've
thought to take up the cause again. I have a working piece of code that
operates in "finalize-segment" after having buffered up all instructions in
their abstract (not machine-encoded) format, with TNs as their operands.
This works nicely modulo a bug in my understanding of TN liveness.
To motivate this discussion I'll show two examples of what the thing should
be capable of doing, subject to an assumption (presently false) that its
def/use tracking at the register level is correct:
* (disassemble '(lambda (a) (setf (svref a 3) #xf00d)))
;; with assembler debugging
@ 15 "eliminate tmp reg for CMP"
(MOV RDX QWORD PTR [RCX-7]),(MOV RAX 6),(CMP RDX RAX) -> ((MOV RAX 6)
(CMP QWORD PTR [RCX-7] RAX))
@ 18 "Imm-to-mem move"
(MOV RDX 122906),(MOV QWORD PTR [RBX+RAX*4+1] RDX) -> ((MOV QWORD PTR
(I see it missed that it could use an immediate 6, subject to RAX being
dead, for an operand in an instruction that was already edited one time)
Despite that, it makes mincemeat out of OPTIMIZATIONS item #35:
* (disassemble '(lambda (a i) (aref (the simple-vector a) i)))
;; with assembler debugging
@ 17 "eliminate tmp reg for CMP"
(MOV RBX RDI),(CMP RBX 0) -> ((CMP RDI 0))
@ 19 "eliminate tmp reg for CMP"
(MOV RBX RDI),(CMP RBX QWORD PTR [+#S(SB-C:FIXUP :NAME NIL :FLAVOR
CODE-OBJECT :OFFSET L1)]) -> ((CMP RDI QWORD PTR [+#S(SB-C:FIXUP :NAME NIL
:FLAVOR CODE-OBJECT :OFFSET L1)]))
@ 20 "Relocate BREAK"
(JMP LE L2),(MOV RAX Const5),(BREAK),(.TRAP-INFO 10 31 RDI RAX) -> ((JMP
@ 17 "Signed compares -> unsigned compare"
(CMP RDI 0),(JMP L L3),(CMP RDI QWORD PTR [+#S(SB-C:FIXUP :NAME NIL
:FLAVOR CODE-OBJECT :OFFSET L1)]),(JMP NLE L3) -> ((CMP RDI QWORD PTR
[+#S(SB-C:FIXUP :NAME NIL :FLAVOR CODE-OBJECT :OFFSET L1)]) (JMP A L3))
@ 19 "eliminate tmp reg for CMP"
(MOV RBX QWORD PTR [RDX-7]),(MOV RSI RDI),(CMP RBX RSI) -> ((MOV RSI RDI)
(CMP QWORD PTR [RDX-7] RSI))
@ 19 "eliminate tmp reg for vector index in safe SVREF"
(MOV RSI RDI),(CMP QWORD PTR [RDX-7] RSI),(JMP BE L4),(MOV RDX QWORD PTR
[RDX+RDI*4+1]) -> ((CMP QWORD PTR [RDX-7] RDI) (JMP BE L4) (MOV RDX QWORD
@ 17 "vector bound check subsumes INDEX type check"
(CMP RDI QWORD PTR [+#S(SB-C:FIXUP :NAME NIL :FLAVOR CODE-OBJECT :OFFSET
L1)]),(JMP A L3),(CMP QWORD PTR [RDX-7] RDI),(JMP BE L4) -> ((CMP QWORD PTR
[RDX-7] RDI) (JMP BE L4))
Here's the fly in the ointment -
My first cut performed optimizations in a vacuum, without knowing IR2 stuff.
It was highly conservative, examining only straight-line code; and as well
as being unmaintainable and inflexible crud, could not perform interesting
For what little it was worth, this could operate on anybody's x86 assembly
source code doing only very trivial rewrites such as
"AND RAX,imm;AND RAX,other-imm" -> "AND RAX,(AND imm other-imm)"
My next approach looked for TNs that could be deleted with impunity since
SBCL's abstract assembly code (not machine-encoded) has the TNs. When we
find for example "MOV t1, [mem]; MOV t2, t1" with no other apparent use of
t1 then we can maybe substitute "MOV t2, [mem]"
This approach scanned the assembly segment for references to context TNs
(the fragment of code in which replacement occurs) outside that context.
This admits more optimization but is still wrong because the helpful MOVE
function elides references to TNs in the same register, so despite that you
see no uses of a TN by looking at instructions, that's not necessarily true.
My third way got rid of the scanning logic that didn't work, and avails
itself of the TN-READS slot but:
(1) it doesn't work because code emitters can use MAKE-RANDOM-TN to put
something in an instruction that is unreflective of what the VOP operands
(2) I thought I understood TN-REFs but I don't.
My sense was that sufficed to ensure that each TN in the instructions being
edited (the context), to be a candidate for elision, must have exactly one
read ref and that the ref is also in the context, except that some ref can
read a TN from anywhere else. (This glosses over some points but that's the
general idea) These conditions holding, all the above replacements that
say "eliminate ..." would be admissible.
I suspect that a hybrid approach, with some way of detecting random TN
creation will win, and ultimately be inordinately simpler than abstract
interpretation and register tracing. I preserved my hack to keep
"do-nothing" TN moves just to see them in case of need; then they're
deleted when calling the real assembler.
Anyway, my changes to 'assem.lisp' are significantly less invasive than one
would think; the hair is in other files plus some ancillary cleanups. All
the EMIT-foo macros just buffer stuff up, and then FINALIZE-SEGMENT calls
the rewriter which then calls back to the real EMIT encoders.
Nothing in the scheduling assembler logic is altered - this could be a
pre-pass to instruction-level scheduling if so desired. (They're at
different levels of abstraction)
On the plus side, SBCL is able to compile itself and pass all tests in the
- the target-feature to bring in vm-specific rewriting rules is absent
(i.e. the few hooks in 'assem' are in, the new rule driving loop is in, but
the rewriter has nothing to do).
- the rules are present but those which use the mythical TN-ELIDABLE-P test
These give confidence that the assembler still works.
I'm certainly happy to compose a patch of everything, especially since at
least a few parts of it are factorable out such as adding an assembler
directive for '.trap-info' rather than having to reverse engineer that
"BYTE;BYTE;BYTE..." was perhaps a list of SC-OFFSETs for a trap handler.
This allows for example possible elision of a register-register move for a
the bound of a vector - you have to alter the trap data too. (Basically
But rather than my mailing this around as is, might I ask who would be
willing to help me understand the piece I lack, namely a TN-ELIDABLE-P
Regards and thanks,