From: James Y K. <fo...@fu...> - 2007-10-13 01:44:17
SBCL's high-level type-based optimizations are great, but it sure would be nice if the low-level codegen was better, too. Taking a very simple example:

  (declaim (optimize (safety 0) (speed 3) (debug 0)))
  (defun foo ())
  (defun test ()
    (dotimes (i 10000) (foo)))

And here's the code generated on x86-64, with sbcl 1.0.10.45. I interspersed some comments; it seems to me that almost every instruction generated is suboptimal in some way or another. Of course, fixing much of it doesn't seem like it'll be particularly easy... Comments? (I hope my lack of full understanding of how SBCL works doesn't show through too much here. :)

; 02F466AF:       31DB             XOR EBX, EBX      ; no-arg-parsing entry point
  okay: set 'i' = 0

;      6B1:       EB2D             JMP L1
  Unnecessary jump: the comparison at L1 is always true if we're coming
  from here, so it'll always jump back to L0.

;      6B3: L0:   48895DE8         MOV [RBP-24], RBX
  save 'i' to stack. This shouldn't be necessary; 'i' should be stored in
  a callee-saved register (are there any in the calling convention?)

;      6B7:       488BD4           MOV RDX, RSP
;      6BA:       4883EC18         SUB RSP, 24
  why is RSP decremented here instead of on entry to this function (or
  not at all)?

;      6BE:       488B059BFFFFFF   MOV RAX, [RIP-101]   ; #<FDEFINITION object for FOO>
  calling convention requires RAX to have the fdefinition in it, but this
  seems pretty useless (though at the moment it's basically a side effect
  of the indirect call below either way).

;      6C5:       31C9             XOR ECX, ECX
  okay: mark number of args = 0

;      6C7:       48896AF8         MOV [RDX-8], RBP
;      6CB:       488BEA           MOV RBP, RDX
  could possibly eliminate the base pointer, and use an externally
  maintained PC-to-stack-depth mapping, like Dwarf3 does for other
  languages?

;      6CE:       FF5009           CALL QWORD PTR [RAX+9]
  Would be nice if it wasn't an indirect call. It seems possible that
  code components could directly call target code.
  Hypothesizing an implementation: (setf (symbol-function 'foo)) would
  mutate the first instruction in the target (which starts out as a nop)
  to be a jump to the new code (and GC would notice and fix up that
  forwarding sometime later). (symbol-function 'foo) would need to point
  to the instruction after the jump-or-nop so as to not be affected by
  the above mutation.

;      6D1:       480F42E3         CMOVB RSP, RBX
  the suggestion in the manual about a thread-local dedicated multi-value
  spillover area should be implemented so this instruction can be
  eliminated. On x86-64 at least, there are enough registers that could
  be used for arguments/returns that spilling should be pretty rare,
  anyhow.

;      6D5:       488B5DE8         MOV RBX, [RBP-24]
  restore 'i' from the stack, where it was (suboptimally) put above.

;      6D9:       488D4B08         LEA RCX, [RBX+8]
;      6DD:       488BD9           MOV RBX, RCX
  this looks like a stupid sequence of instructions. Why isn't this just
  an add?

;      6E0: L1:   4881FB80380100   CMP RBX, 80000
;      6E7:       7CCA             JL L0
  compare and jump. Well, really the loop should have been inverted to
  count down from 9999 to 0, so that the flags from the add can be used
  directly in a JS instruction.

;      6E9:       BA17001020       MOV EDX, 537919511
  okay: put nil in return location

;      6EE:       488D65F0         LEA RSP, [RBP-16]
  restore stack pointer, okay. But this function didn't actually need a
  stack frame in the first place, did it?

;      6F2:       F8               CLC
  okay: mark as single return.

;      6F3:       488B6DF8         MOV RBP, [RBP-8]
  again with that base pointer

;      6F7:       C20800           RET 8
  pop return address, and return.

James
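
For anyone who wants to reproduce the listing, here is a minimal sketch. DISASSEMBLE is standard Common Lisp, but the output will differ between SBCL versions and platforms. The names TEST-DOWN and TAGGED-FIXNUM are hypothetical, added only for illustration: the first is a hand-written count-down version of the loop (whether the compiler then reuses the flags is a separate question), and the second assumes 3 low fixnum tag bits, which is inferred from the listing (10000 shows up as 80000 in the CMP, and the increment shows up as +8).

```lisp
(declaim (optimize (safety 0) (speed 3) (debug 0)))

(defun foo ())

;; Original loop from the post: counts up from 0 to 9999.
(defun test ()
  (dotimes (i 10000) (foo)))

;; Hypothetical hand-inverted variant: counts down from 9999 to 0, so the
;; loop exit test is just a sign check on the counter.
(defun test-down ()
  (do ((i 9999 (1- i)))
      ((< i 0))
    (declare (fixnum i))
    (foo)))

;; Assumption inferred from the listing: fixnums carry 3 low tag bits, so
;; the integer N is represented in a register as N shifted left by 3.
(defun tagged-fixnum (n)
  (ash n 3))

;; Print the generated code for both versions so they can be compared.
(disassemble 'test)
(disassemble 'test-down)
```

Comparing the two disassemblies shows whether the loop direction changes the compare-and-branch sequence at all on a given build.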