From: Vitaly M. <v.m...@gm...> - 2009-05-18 15:24:02
Hi!

I found that the high generator costs of the FAST-+/(UN)SIGNED=>(UN)SIGNED
VOPs prevent the compiler from producing better code. Instead of using these
VOPs, it uses MOVE-WORD and FAST-+/FIXNUM. That's not fair: the `lea' and
`add' instructions cost the CPU the same when they operate on untagged values.

diff --git a/src/compiler/x86-64/arith.lisp b/src/compiler/x86-64/arith.lisp
index f615ebd..128a5e8 100644
--- a/src/compiler/x86-64/arith.lisp
+++ b/src/compiler/x86-64/arith.lisp
@@ -243,7 +243,7 @@
                                   (location= x r)))))
   (:result-types signed-num)
   (:note "inline (signed-byte 64) arithmetic")
-  (:generator 5
+  (:generator 1
     (cond ((and (sc-is x signed-reg) (sc-is y signed-reg) (sc-is r signed-reg)
                 (not (location= x r)))
            (inst lea r (make-ea :qword :base x :index y :scale 1)))
@@ -289,7 +289,7 @@
                :load-if (not (location= x r))))
   (:result-types signed-num)
   (:note "inline (signed-byte 64) arithmetic")
-  (:generator 4
+  (:generator 1
     (cond ((and (sc-is x signed-reg) (sc-is r signed-reg)
                 (not (location= x r)))
            (inst lea r (make-ea :qword :base x :disp y)))

The other patch adds a new VOP, FAST-+/FIXNUM-SIGNED=>FIXNUM. The LEA
instruction makes it possible to add a fixnum and an untagged value in a
single step.
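A minimal sketch of why one LEA with scale 8 is enough, assuming a fixnum N
is stored as the machine word N * 8 (three zero tag bits), which is what the
shifts by 3 and the :scale 8 in this message suggest. The names *tag-shift*,
tag, untag and lea-add are made up for illustration; they are not SBCL
internals.

```lisp
;; Assumption: fixnum N is represented by the word N * 8 (3 zero tag bits).
(defparameter *tag-shift* 3)

(defun tag (n)
  "Machine word that represents fixnum N."
  (ash n *tag-shift*))

(defun untag (word)
  "Fixnum value stored in the tagged WORD."
  (ash word (- *tag-shift*)))

(defun lea-add (tagged-x raw-y)
  "Model of LEA r, [x + y*8]: tagged X plus untagged Y, result is tagged."
  (+ tagged-x (* raw-y (ash 1 *tag-shift*))))

;; tag(x) + 8*y = 8*x + 8*y = tag(x + y), so no separate shift is needed.
(assert (= (lea-add (tag 100) 23) (tag 123)))
(assert (= (untag (lea-add (tag -4) 3)) -1))
```

The sketch ignores overflow out of the fixnum range; it only models the
address arithmetic that LEA performs.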
diff --git a/src/compiler/x86-64/arith.lisp b/src/compiler/x86-64/arith.lisp
index 128a5e8..9a85df0 100644
--- a/src/compiler/x86-64/arith.lisp
+++ b/src/compiler/x86-64/arith.lisp
@@ -228,6 +228,17 @@
            (move r x)
            (inst add r (fixnumize y))))))
 
+(define-vop (fast-+/fixnum-signed=>fixnum fast-safe-arith-op)
+  (:translate +)
+  (:args (x :scs (any-reg) :target r)
+         (y :scs (signed-reg)))
+  (:arg-types tagged-num signed-num)
+  (:results (r :scs (any-reg) :from (:argument 0)))
+  (:result-types tagged-num)
+  (:note "inline fixnum arithmetic")
+  (:generator 1
+    (inst lea r (make-ea :qword :base x :index y :scale 8))))
+
 (define-vop (fast-+/signed=>signed fast-safe-arith-op)
   (:translate +)
   (:args (x :scs (signed-reg) :target r

With both patches applied, the following code runs up to 20% faster:

(declare (type fixnum sum)
         (type (simple-array (unsigned-byte 8)) buffer))
...
(map nil (lambda (x) (incf sum x)) buffer)

The MAP now compiles to this machine code:

; 0F0: L2:   488BC1           MOV RAX, RCX
; 0F3:       480FB6440201     MOVZX RAX, BYTE PTR [RDX+RAX+1]
; 0F9:       488D1CC3         LEA RBX, [RBX+RAX*8]
; 0FD:       48FFC1           INC RCX
; 100: L3:   488D04CD00000000 LEA RAX, [RCX*8]
; 108:       4839F0           CMP RAX, RSI
; 10B:       7CE3             JL L2

Originally it was:

; 0F0: L2:   488BC1           MOV RAX, RCX
; 0F3:       48C1F803         SAR RAX, 3
; 0F7:       480FB6440201     MOVZX RAX, BYTE PTR [RDX+RAX+1]
; 0FD:       48C1E003         SHL RAX, 3
; 101:       4801C3           ADD RBX, RAX
; 104:       4883C108         ADD RCX, 8
; 108: L3:   4839F1           CMP RCX, RSI
; 10B:       7CE3             JL L2

--
wbr, Vitaly