On 8/17/07, Denys Rtveliashvili <rtvd@...> wrote:
> Also, I guess some arithmetics can be sped up. Not too much through. I
> tried this sample of purely arithmetic code:
> (defun foo (a b)
> (declare (type fixnum a b)
> (optimize (safety 0) (speed 3)))
> (let ((x 0))
> (declare (type fixnum x))
> (setf x 1)
> (setf x 2)
> (setf x 3)
> (setf x (/ x a))
> (setf x (+ x a b))
> (setf x (+ x 7 8 9))
> (setf x (- x 11 12 13))
> As far as I see, the division is performed by calling an external
> function. Also, unnecessary shifting is present. If we change the
> representation of numbers to untagged ones, that unnecessary shifting
> will disappear. However, it will have to be added around that call to
> "/" in order to pass the parameters as the tagged numbers. Unless we go
> a bit further and make that function be inlined and accept untagged values.
Inlining generic division would be slow, see below.
> ; 0B1922BA: 8975F0 MOV [EBP-16], ESI ;
> no-arg-parsing entry point
> ; 2BD: 31C0 XOR EAX, EAX
> ; 2BF: B804000000 MOV EAX, 4 ; x = 1
> ; 2C4: B808000000 MOV EAX, 8 ; x = 2
> ; 2C9: B80C000000 MOV EAX, 12 ; x = 3
> ; 2CE: 8BD0 MOV EDX, EAX
> ; 2D0: 8BFE MOV EDI, ESI
> ; 2D2: 8BDC MOV EBX, ESP
> ; 2D4: 55 PUSH EBP
> ; 2D5: 83EC08 SUB ESP, 8
> ; 2D8: 8BEB MOV EBP, EBX
> ; 2DA: B908000000 MOV ECX, 8
> ; 2DF: FF158C070005 CALL DWORD PTR [#x500078C] ; finally,
> a "/" function is called
> ; 2E5: 7302 JNB L0
> ; 2E7: 8BE3 MOV ESP, EBX
> ; 2E9: L0: 8BC2 MOV EAX, EDX
> ; 2EB: 8B75F0 MOV ESI, [EBP-16]
> ; 2EE: C1F802 SAR EAX, 2
> ; 2F1: 8BD6 MOV EDX, ESI
> ; 2F3: C1FA02 SAR EDX, 2
> ; 2F6: 01D0 ADD EAX, EDX ; x += a ?
> ; 2F8: 0345F4 ADD EAX, [EBP-12] ; x += b ?
> ; 2FB: C1E002 SHL EAX, 2
> ; 2FE: C1F802 SAR EAX, 2
> ; 301: 83C007 ADD EAX, 7 ; x += 8
> ; 304: 83C008 ADD EAX, 8 ; x += 8
> ; 307: 83C009 ADD EAX, 9 ; x += 9
> ; 30A: C1E002 SHL EAX, 2
> ; 30D: 83E82C SUB EAX, 44 ; x -= 11
> ; 310: 83E830 SUB EAX, 48 ; x -= 12
> ; 313: 83E834 SUB EAX, 52 ; x -= 13
> ; 316: 8BD0 MOV EDX, EAX
> ; 318: 8D65F8 LEA ESP, [EBP-8]
> ; 31B: F8 CLC
> ; 31C: 8B6DFC MOV EBP, [EBP-4]
> ; 31F: C20400 RET 4
> Taking a look at the assembly I see that:
> 1. addition tends to work with untagged values while subtraction prefers
> tagged versions.
This is, in general, not true. It happens to be true in this case,
but I'm sure that it's possible to construct cases when subtraction is
done untagged and addition is done tagged. The compiler uses the same
mechanism for determining what to do for addition and for subtraction.
> 2. division is a call to an external function (at least in this case)
> and the call seems to be quite expensive
Yes, this is because / in Common Lisp is "true division", not the
dumbed-down version that one has in languages like C, and therefore
can return rational numbers in the general case of fixnum division.
If you use TRUNCATE--which is probably what you meant--SBCL will use
an inlined version (which compiles to IDIV).
> 3. there are 3 "SAR" instructions and 2 "SHL"
Yes, this is partly because you compiled with (SAFETY 0). The
compiler is not as smart as it could be when compiling with (SAFETY 0)
in eliminating operations that would be necessary when compiling with
non-zero safety but are unnecessary otherwise.
Your example also suggests other ways in which the compiler could be improved:
1) The first three assignments to X are dead code; they should be eliminated;
2) The constant additions and subtractions should be combined into a
single addition and subtraction--bonus points for folding the whole
thing into (- X 12).