From: Vitaly M. <v.m...@gm...> - 2008-11-19 12:51:12
|
Hi!

First of all, sorry for my weird English ;)

As far as I can see, SBCL isn't able to do reg/mem operations like
`add rax, [rsi + 123]' (I'm assuming the x86-64 architecture). Instead,
the compiler emits two operations: `mov temp, [rsi + 123]; add rax, temp'.

Here is an example:

(defun sum (a b)
  (declare (type fixnum a)
           (type (array fixnum) b)
           (optimize (speed 3) (safety 0) (debug 0)))
  (+ a (aref b 1)))

SBCL produces these VOPs:

7: DATA-VECTOR-REF-WITH-OFFSET/SIMPLE-ARRAY-SIGNED-BYTE-61 t35[RDX] t36[RDI] {0} => t37[RAX]
8: MOVE-TO-WORD/FIXNUM t37[RAX] => t38[RCX]
9: FAST-+/SIGNED=>SIGNED A!14[S2]>t39[RAX] t38[RCX] => t40[RAX]

and this code:

...
; 79: 488B443A01  MOV RAX, [RDX+RDI+1]
; 7E: 488BC8      MOV RCX, RAX
; 81: 48C1F903    SAR RCX, 3
; 85: 488B45E8    MOV RAX, [RBP-24]
; 89: 4801C8      ADD RAX, RCX

whereas the optimal code would be `mov rax, ...; add rax, [rdx+rdi+1]'.
Well, this example is not ideal, because there is a type coercion
before the addition...

The same thing happens with floating point calculations: SBCL always
loads both values into xmm registers and then performs the needed
operation on them.

In many cases it will be faster to use a reg/mem operation instead of
reg/reg. It seems SBCL is able to generate the proper machine
instructions (at least, the sse-related emitters can), but the
arithmetic/logical vops are not ready to take arguments of a storage
class other than *-reg.

It's not a big deal to modify these vops, but I've run into another
problem: SBCL coerces, for example, double-stack to double-reg with the
help of the proper define-move-op (define-move-fun?), which is a simple
`movsd' in that case. AFAIU, only such define-move-op'ed vops can
calculate effective addresses at compile time.

For example, double-stack to double-reg coercion is done with this vop:

(define-move-fun (load-double 2) (vop x y)
  ((double-stack) (double-reg))
  (inst movsd y (ea-for-df-stack x)))

It reduces to a single machine instruction `movsd y, xxx'. The effective
address of the x value (stored on the stack) was calculated at compile
time.

When I hijack the standard double-float subtraction vop with my own and
try to use the same `ea-for-df-stack' in it, the compiler emits a runtime
double-stack to double-reg coercion and, thus, kills performance
cruelly.

I'm totally stuck here and asking for help. Is it possible to do
reg/mem operations at all in SBCL?

Many thanks!
--
wbr, Vitaly
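Note: the observation in this message can be reproduced in a short, self-contained SBCL session. The following is only a sketch. The array type is narrowed to `(simple-array fixnum (*))`, which is an assumption that is not in the original post, and the instruction sequence printed by `disassemble` will differ between SBCL versions and platforms.

```lisp
;; Sketch: compile the example from the post and inspect the machine code.
;; Assumption (not in the original post): the array is declared as a simple
;; one-dimensional fixnum vector so that AREF can be open-coded.
(defun sum (a b)
  (declare (type fixnum a)
           (type (simple-array fixnum (*)) b)
           (optimize (speed 3) (safety 0) (debug 0)))
  (+ a (aref b 1)))

;; Element 1 of this vector is 20, so the call returns 21.
(sum 1 (make-array 4 :element-type 'fixnum
                     :initial-contents '(10 20 30 40)))

;; Print the generated code; look for a separate MOV from memory
;; followed by a register-to-register ADD.
(disassemble 'sum)
```

Comparing the output of `disassemble` before and after a VOP change is the quickest way to see whether the load was folded into the arithmetic instruction.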
From: Nathan F. <fr...@gm...> - 2008-11-19 14:36:26
|
On Wed, Nov 19, 2008 at 7:51 AM, Vitaly Mayatskikh <v.m...@gm...> wrote:
> In many cases it will be faster to use reg/mem operation instead of
> reg/reg. Seems, SBCL is able to generate proper machine instructions
> (at least, sse-related emitters), but arithmetic/logical vops are not
> ready to take arguments of storage class other than *-reg.

I believe you are mistaken here; the VOPs in
src/compiler/x86{,-64}/arith.lisp are certainly prepared to take
arguments that reside on the stack. (We're not always as good as we
could be about using arguments from the stack rather than loading them
into registers, but that's a story for another day...)

> When I hijack standard double-float subtraction vop with my own and
> try to use the same `ea-for-df-stack' in it, compiler emits runtime
> double-stack to double-reg coercion and, thus, kills performance
> cruelly.
>
> I'm totally stuck here and asking for help. Is it possible to do
> reg/mem operations at all in SBCL?

I believe the problem here is that the VOPs in
src/compiler/x86-64/float.lisp don't define appropriate :LOAD-IF
predicates. For instance, from arith.lisp:

(define-vop (fast-unsigned-binop fast-safe-arith-op)
  (:args (x :target r :scs (unsigned-reg)
            :load-if (not (and (sc-is x unsigned-stack)
                               (sc-is y unsigned-reg)
                               (sc-is r unsigned-stack)
                               (location= x r))))
         (y :scs (unsigned-reg unsigned-stack)))
  (:arg-types unsigned-num unsigned-num)
  (:results (r :scs (unsigned-reg) :from (:argument 0)
               :load-if (not (and (sc-is x unsigned-stack)
                                  (sc-is y unsigned-reg)
                                  (sc-is r unsigned-stack)
                                  (location= x r)))))
  (:result-types unsigned-num)
  (:note "inline (unsigned-byte 64) arithmetic"))

Mimic'ing the :LOAD-IF predicates and the :SCS appropriately for
single/double-float operations should enable SBCL to generate the
reg/mem operations you wish to see.

-Nathan
From: Vitaly M. <v.m...@gm...> - 2008-11-19 15:35:10
|
At Wed, 19 Nov 2008 09:36:22 -0500, Nathan Froyd wrote:
> I believe the problem here is that the VOPs in
> src/compiler/x86-64/float.lisp don't define appropriate :LOAD-IF
> predicates. For instance, from arith.lisp:
>
> (define-vop (fast-unsigned-binop fast-safe-arith-op)
>   (:args (x :target r :scs (unsigned-reg)
>             :load-if (not (and (sc-is x unsigned-stack)
>                                (sc-is y unsigned-reg)
>                                (sc-is r unsigned-stack)
>                                (location= x r))))
>          (y :scs (unsigned-reg unsigned-stack)))
>   (:arg-types unsigned-num unsigned-num)
>   (:results (r :scs (unsigned-reg) :from (:argument 0)
>                :load-if (not (and (sc-is x unsigned-stack)
>                                   (sc-is y unsigned-reg)
>                                   (sc-is r unsigned-stack)
>                                   (location= x r)))))
>   (:result-types unsigned-num)
>   (:note "inline (unsigned-byte 64) arithmetic"))
>
> Mimic'ing the :LOAD-IF predicates and the :SCS appropriately for
> single/double-float operations should enable SBCL to generate the
> reg/mem operations you wish to see.

Thanks for the quick response!

Of course, I've tried to play with load-if, but without success. OK,
so it is possible in principle, thanks!
--
wbr, Vitaly
From: Nathan F. <fr...@gm...> - 2008-11-19 17:28:09
|
On Wed, Nov 19, 2008 at 10:34 AM, Vitaly Mayatskikh <v.m...@gm...> wrote:
> At Wed, 19 Nov 2008 09:36:22 -0500, Nathan Froyd wrote:
>> Mimic'ing the :LOAD-IF predicates and the :SCS appropriately for
>> single/double-float operations should enable SBCL to generate the
>> reg/mem operations you wish to see.
>
> Thanks for quick response!
>
> Of course, I've tried to play with load-if, but without success. Ok,
> it is principally possible, thanks!

Of course, if you do play around with it more and it doesn't seem to
be doing what you want, please tell us! Fixing the bug would be even
better, though. :)

-Nathan
From: Vitaly M. <v.m...@gm...> - 2008-11-19 19:17:40
|
At Wed, 19 Nov 2008 12:28:02 -0500, Nathan Froyd wrote:
> > Of course, I've tried to play with load-if, but without success. Ok,
> > it is principally possible, thanks!
>
> Of course, if you do play around with it more and it doesn't seem to
> be doing what you want, please tell us! Fixing the bug would be even
> better, though. :)

I should fix my understanding of the compiler first ;)

My vop:

(define-vop (-/double-float-mem float-op)
  (:args (x :scs (double-reg) :target r)
         (y :scs (double-stack);descriptor-reg)
            :load-if (sc-is y double-reg)))
  (:results (r :scs (double-reg) :load-if (not (location= r x))))
  (:arg-types double-float double-float)
  (:result-types double-float)
  (:translate -)
  (:generator 1
    (inst subsd x y)))

It is wrong. SBCL produces this error message:

; compiling (DEFINE-VOP (-/DOUBLE-FLOAT-MEM FLOAT-OP) ...);
; caught ERROR:
;   (during macroexpansion of (DEFINE-VOP (-/DOUBLE-FLOAT-MEM FLOAT-OP) ...))
;   In the Y argument to VOP -/DOUBLE-FLOAT-MEM,
;   none of the SCs allowed by the operand type DOUBLE-FLOAT can directly be loaded
;   into any of the restriction's SCs:
;   (DOUBLE-STACK)

When `y' has descriptor-reg as its scs, SBCL compiles OK, but then the
compiler tries to emit immediates as parameters to sse instructions (I
have floating point constants in the test case). Of course, that's
impossible, sse doesn't work with immediates. And there's a proper
`error' in the SBCL sources reporting this issue:

debugger invoked on a SIMPLE-ERROR in thread #<THREAD "initial thread" RUNNING {1002738DD1}>:
  Constant TNs can only be directly used in MOV, PUSH, and CMP.

For normal variables the compiler still emits some coercion code (I'm
not sure what it tries to do):

; 0378C093: F20F104E01          MOVSD XMM1, [RSI+1]    ; no-arg-parsing entry point
; 098:      F20F105609          MOVSD XMM2, [RSI+9]
; 09D:      F20F105E11          MOVSD XMM3, [RSI+17]
; 0A2:      F20F104201          MOVSD XMM0, [RDX+1]
; 0A7:      F20F106209          MOVSD XMM4, [RDX+9]
; 0AC:      F20F106A11          MOVSD XMM5, [RDX+17]
===>
; 0B1:      41808C24A000000008  OR BYTE PTR [R12+160], 8
; 0BA:      4D8B5C2450          MOV R11, [R12+80]
; 0BF:      498D4310            LEA RAX, [R11+16]
; 0C3:      4939442458          CMP [R12+88], RAX
; 0C8:      0F8623030000        JBE L19
; 0CE:      4989442450          MOV [R12+80], RAX
; 0D3:      498D430F            LEA RAX, [R11+15]
; 0D7: L0:  48C740F11E010000    MOV QWORD PTR [RAX-15], 286
; 0DF:      F20F1140F9          MOVSD [RAX-7], XMM0
; 0E4:      4180B424A000000008  XOR BYTE PTR [R12+160], 8
; 0ED:      7402                JEQ L1
; 0EF:      CC09                BREAK 9                ; pending interrupt trap
<===
; 0F1: L1:  F20F5CC8            SUBSD XMM1, XMM0

What exactly are the storage classes *-stack and descriptor-reg? As I
understand it, *-stack is a boxed representation of a value somewhere
in the heap. Why can't a double-float be stored in double-stack in my
vop? Or, more exactly, it is already somewhere in the heap, and I want
to access it directly in the arithmetic operation without doing a
`move' first.

Am I right that working with descriptor-reg looks like this in machine
code:

; 098: F20F105609 MOVSD XMM2, [RSI+9]

and that *-stack works with RBP? Like this:

; 1E8: 488B45E0   MOV RAX, [RBP-32]
; 1EC: F20F1060F9 MOVSD XMM4, [RAX-7]

Thanks in advance!
--
wbr, Vitaly
From: Nathan F. <fr...@gm...> - 2008-11-19 19:34:49
|
On Wed, Nov 19, 2008 at 2:17 PM, Vitaly Mayatskikh <v.m...@gm...> wrote:
> I should fix my understanding of compiler first ;)
>
> My vop:
>
> (define-vop (-/double-float-mem float-op)
>   (:args (x :scs (double-reg) :target r)
>          (y :scs (double-stack);descriptor-reg)
>             :load-if (sc-is y double-reg)))

Yes, this is not what you want. You are saying "Y must reside in the
stack, and you should only load it if it's actually being passed in a
register." I believe this is backwards from what you want. Better
would be to say:

  (y :scs (double-reg double-stack))

to tell the compiler that Y can live on the stack or in a register.

You can change X and R to :SCS (DOUBLE-REG DOUBLE-STACK) as well, but
your :LOAD-IF predicates will have to change too. (You don't need a
:LOAD-IF on R in your original example.)

> When `y' contains descriptor-reg as scs, SBCL compiles ok, but than
> compiler tries to emit immediates as parameter to sse instructions
> (I have floating point constants in test case)...
> For normal variables compiler still emits some coercion code (I'm not
> sure what it tries to do):

I don't know exactly what's going on with the constants, but there are
FIXMEs floating around in compiler/x86/float.lisp about using
DESCRIPTOR-REG as an SC for float operations. It would be nice to
support them, but other parts of the compiler would have to be
modified to avoid unnecessary boxing. I don't have a good guess on how
extensive those changes would be.

> What are storage classes *-stack and descriptor-reg exactly? As I
> understand, *-stack is a boxed representation of value somewhere in
> the heap. Why double-float can not be stored in double-stack in my
> vop? Or, more exactly, it is already somewhere in the heap, and I want
> to access to it directly in arithmetic operation without doing `move'
> before.

*-STACK means that things are stored on the normal runtime stack, not
the heap. DESCRIPTOR-REG identifies an integer register that holds
tagged (boxed) lisp data.

> Am I right that work with descriptor-reg in machine code looks like
>
> ; 098: F20F105609 MOVSD XMM2, [RSI+9]
>
> and *-stack works with RBP? Like this:
>
> ; 1E8: 488B45E0 MOV RAX, [RBP-32]
> ; 1EC: F20F1060F9 MOVSD XMM4, [RAX-7]

Yes, that's correct. Note that RAX is probably also a DESCRIPTOR-REG
in the above code.

-Nathan
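Note: the advice in this message (list both storage classes for Y and let the generator pick the operand form) could be sketched roughly as follows. This is an untested illustration, not code from SBCL. It assumes it is placed in SBCL's x86-64 backend next to the existing float VOPs, it reuses only names that appear in this thread (`float-op`, `double-reg`, `double-stack`, `ea-for-df-stack`, `movsd`, `subsd`) plus the backend macro `sc-case`, and it does not guarantee that the register allocator will actually leave Y on the stack.

```lisp
;; Untested sketch of a double-float subtraction VOP whose second
;; argument may stay on the stack. Intended to be read alongside
;; src/compiler/x86-64/float.lisp; it cannot be loaded outside SBCL's
;; backend package.
(define-vop (-/double-float-mem float-op)
  (:translate -)
  (:args (x :scs (double-reg) :target r)
         ;; Y may live in an XMM register or in a stack slot.
         (y :scs (double-reg double-stack)))
  (:arg-types double-float double-float)
  ;; :FROM (:ARGUMENT 0) keeps R from sharing a location with Y,
  ;; so copying X into R below cannot clobber Y.
  (:results (r :scs (double-reg) :from (:argument 0)))
  (:result-types double-float)
  (:generator 1
    ;; :TARGET is only a preference, so handle the case where
    ;; X and R ended up in different registers.
    (unless (location= r x)
      (inst movsd r x))
    (sc-case y
      (double-reg
       (inst subsd r y))
      (double-stack
       ;; reg/mem form: the stack slot address is known at compile time.
       (inst subsd r (ea-for-df-stack y))))))
```

The sketch always writes its result through R and never assumes that X and R coincide.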
From: Vitaly M. <v.m...@gm...> - 2008-11-19 20:28:06
|
At Wed, 19 Nov 2008 14:34:45 -0500, Nathan Froyd wrote:
> > (define-vop (-/double-float-mem float-op)
> >   (:args (x :scs (double-reg) :target r)
> >          (y :scs (double-stack);descriptor-reg)
> >             :load-if (sc-is y double-reg)))
>
> Yes, this is not what you want. You are saying "Y must reside in the
> stack, and you should only load it if it's actually being passed in a
> register." I believe this is backwards from what you want. Better
> would be to say:
>
> (y :scs (double-reg double-stack))
>
> to tell the compiler that Y can live on the stack or in a register.

With double-reg in the scs, SBCL always chooses double-reg :( I wanted
to force it to use the objects directly. And a pair of `move to
double-reg' + `op with double-reg' should have a different cost
compared to `op with double-stack/descriptor-reg', I think.

But how can I make the parameter `y' of type double-float loadable
into the double-stack sc here? Maybe double-stack is not a good idea,
but descriptor-reg seems to be broken for float scs, as you have
mentioned.

> You can change X and R to :SCS (DOUBLE-REG DOUBLE-STACK) as well, but
> your :LOAD-IF predicates will have to change too. (You don't need a
> :LOAD-IF on R in your original example.)

This is a sort of (declare (ignore r)). SBCL stops its build when
there are unused variables in a vop.

> *-STACK means that things are stored on the normal runtime stack, not
> the heap. DESCRIPTOR-REG identifies an integer register that holds
> tagged (boxed) lisp data.

Yeah, my bad. I wanted to say 'in the normal machine stack' ;)

> > Am I right that work with descriptor-reg in machine code looks like
> >
> > ; 098: F20F105609 MOVSD XMM2, [RSI+9]
> >
> > and *-stack works with RBP? Like this:
> >
> > ; 1E8: 488B45E0 MOV RAX, [RBP-32]
> > ; 1EC: F20F1060F9 MOVSD XMM4, [RAX-7]
>
> Yes, that's correct. Note that RAX is probably also a DESCRIPTOR-REG
> in the above code.

OK, thanks for the explanation!
--
wbr, Vitaly
From: Nathan F. <fr...@gm...> - 2008-11-19 20:59:09
|
On Wed, Nov 19, 2008 at 3:27 PM, Vitaly Mayatskikh <v.m...@gm...> wrote:
> With double-reg in scs SBCL always chooses double-reg :( I wanted to
> enforce it to use objects directly. And a pair of `move to double-reg'
> + `op with double-reg' should have different cost comparing to `op
> with double-stack/descriptor-reg', I think.

Do you have a testcase demonstrating the shortcoming you're attempting
to correct (with unpatched SBCL)? That is, one that's loading from
the stack and then operating on the value.

> But how to make parameter `y' of type double-float here loadable to
> double-stack scs? May be double-stack is not a good idea, but
> descriptor-reg seems to be broken for float sc, as you have
> mentioned.

You don't want it loadable to double-stack scs; that's not how
:LOAD-IF works. :LOAD-IF says "I usually take things in registers,
but if the value actually resides in memory, I can operate on it from
memory unless..."

>> You can change X and R to :SCS (DOUBLE-REG DOUBLE-STACK) as well, but
>> your :LOAD-IF predicates will have to change too. (You don't need a
>> :LOAD-IF on R in your original example.)
>
> This is a sort of (declare (ignore r)). SBCL stops its build when
> there is unused variables in vop.

Ah, I didn't even look at the rest of your VOP. It's incorrect if
you're not using R, then; X might be equal to R, but it might not.

-Nathan
From: Vitaly M. <v.m...@gm...> - 2008-11-19 21:50:40
|
At Wed, 19 Nov 2008 15:59:04 -0500, Nathan Froyd wrote:
> On Wed, Nov 19, 2008 at 3:27 PM, Vitaly Mayatskikh
> <v.m...@gm...> wrote:
> > With double-reg in scs SBCL always chooses double-reg :( I wanted to
> > enforce it to use objects directly. And a pair of `move to double-reg'
> > + `op with double-reg' should have different cost comparing to `op
> > with double-stack/descriptor-reg', I think.
>
> Do you have a testcase demonstrating the shortcoming you're attempting
> to correct (with unpatched SBCL)? That is, one that's loading from
> the stack and then operating on the value.

I'm compiling a "low level" ray tracer written in lisp:
http://www.ffconsultancy.com/ocaml/ray_tracer/code/5/ray.lisp

(disassemble 'ray-sphere) shows:

; disassembly for RAY-SPHERE
; 028588E3: F20F104E01 MOVSD XMM1, [RSI+1]    ; no-arg-parsing entry point
; 8E8:      F20F105609 MOVSD XMM2, [RSI+9]
; 8ED:      F20F105E11 MOVSD XMM3, [RSI+17]
; 8F2:      F20F106201 MOVSD XMM4, [RDX+1]
; 8F7:      F20F106A09 MOVSD XMM5, [RDX+9]
; 8FC:      F20F107211 MOVSD XMM6, [RDX+17]
; 901:      F20F5CCC   SUBSD XMM1, XMM4
; 905:      F20F5CD5   SUBSD XMM2, XMM5
; 909:      F20F5CDE   SUBSD XMM3, XMM6
; 90D:      F20F106701 MOVSD XMM4, [RDI+1]
; 912:      F20F106F09 MOVSD XMM5, [RDI+9]
; 917:      F20F107711 MOVSD XMM6, [RDI+17]

but I want to see code like this here:

; disassembly for RAY-SPHERE
; 028588E3: F20F104E01 MOVSD XMM1, [RSI+1]    ; no-arg-parsing entry point
; 8E8:      F20F105609 MOVSD XMM2, [RSI+9]
; 8ED:      F20F105E11 MOVSD XMM3, [RSI+17]
; 901:      F20F5CCC   SUBSD XMM1, [RDX+1]
; 905:      F20F5CD5   SUBSD XMM2, [RDX+9]
; 909:      F20F5CDE   SUBSD XMM3, [RDX+17]

> You don't want it loadable to double-stack scs; that's not how
> :LOAD-IF works. :LOAD-IF says "I usually take things in registers,
> but if the value actually resides in memory, I can operate on it from
> memory unless..."

Ah... I thought the logic was: "do the load by default unless
:load-if specifies otherwise". Thanks!

> > This is a sort of (declare (ignore r)). SBCL stops its build when
> > there is unused variables in vop.
>
> Ah, I didn't even look at the rest of your VOP. It's incorrect if
> you're not using R, then; X might be equal to R, but it might not.

OK, I have another question: the VOP takes 2 args (`x' and `y') and
returns its result in the same physical storage as `x' (it modifies
the state of `x'). How can I describe this correctly in the VOP?
--
wbr, Vitaly
From: Nathan F. <fr...@gm...> - 2008-11-19 22:06:47
|
On Wed, Nov 19, 2008 at 4:50 PM, Vitaly Mayatskikh <v.m...@gm...> wrote:
> I'm compiling "low level" ray tracer written in lisp:
> http://www.ffconsultancy.com/ocaml/ray_tracer/code/5/ray.lisp
...
> but I want to see here such code:
>
> ; disassembly for RAY-SPHERE
> ; 028588E3: F20F104E01 MOVSD XMM1, [RSI+1] ; no-arg-parsing entry point
> ; 8E8: F20F105609 MOVSD XMM2, [RSI+9]
> ; 8ED: F20F105E11 MOVSD XMM3, [RSI+17]
> ; 901: F20F5CCC SUBSD XMM1, [RDX+1]
> ; 905: F20F5CD5 SUBSD XMM2, [RDX+9]
> ; 909: F20F5CDE SUBSD XMM3, [RDX+17]

Ah, I think I see the problem, then. If I understand the code
correctly, RSI and RDX are pointers to a double-float array. The VOPs
are then generated to do:

  fetch RSI[0]
  fetch RSI[1]
  fetch RSI[2]
  fetch RDX[0]
  fetch RDX[1]
  fetch RDX[2]
  ...do subtractions...

and there's no good way to fold those memory accesses into the VOPs
that actually do the subtractions. (When we were talking about
DESCRIPTOR-REGs earlier, those registers were assumed to hold tagged
double-floats, not arrays of unboxed double-floats.)

What you want to do is certainly desirable, but it's not going to be
done by twiddling with the descriptions of the VOPs.

>> > This is a sort of (declare (ignore r)). SBCL stops its build when
>> > there is unused variables in vop.
>>
>> Ah, I didn't even look at the rest of your VOP. It's incorrect if
>> you're not using R, then; X might be equal to R, but it might not.
>
> Ok, I have another question: VOP takes 2 args (`x' and `y') and returns
> result in the same physical storage like `x' has (modifies state of
> `x'). How can I describe it in VOP correctly?

You can specify that you'd prefer if the first argument and the result
share the same physical storage (that's what :TARGET does) and the
register allocator will take that into account (and hopefully do the
right thing!). But the VOP still needs to work correctly even if that
preference isn't satisfied.

-Nathan
From: Nathan F. <fr...@gm...> - 2008-11-19 22:09:34
|
On Wed, Nov 19, 2008 at 5:06 PM, Nathan Froyd <fr...@gm...> wrote:
> What you want to do is certainly desirable, but it's not going to be
> done by twiddling with the descriptions of the VOPs.

I should have been more precise: folding the array accesses into
subsequent operations can't be done by twiddling with VOPs. Folding
things like:

  movsd xmm0, [rbp-64]
  addsd xmm1, xmm0

into:

  addsd xmm1, [rbp-64]

should still be doable by twiddling with VOPs, though.

-Nathan
From: Vitaly M. <v.m...@gm...> - 2008-11-19 22:53:23
|
At Wed, 19 Nov 2008 17:06:41 -0500, Nathan Froyd wrote:
> Ah, I think I see the problem, then. If I understand the code
> correctly, RSI and RDX are pointers to a double-float array. The VOPs
> are then generated to do:
>
> fetch RSI[0]
> fetch RSI[1]
> fetch RSI[2]
> fetch RDX[0]
> fetch RDX[1]
> fetch RDX[2]
> ...do subtractions...
>
> and there's no good way to fold those memory accesses into the VOPs
> that actually do the subtractions. (When we were talking about
> DESCRIPTOR-REGs earlier, those registers were assumed to hold tagged
> double-floats, not arrays of unboxed double-floats.)
>
> What you want to do is certainly desirable, but it's not going to be
> done by twiddling with the descriptions of the VOPs.

I see... But using reg/mem operations with regular double-floats is OK
too.

Btw, SSE4 has a `dot product' operation. How hard would it be to
implement pattern recognition for dot products in SBCL?

> > Ok, I have another question: VOP takes 2 args (`x' and `y') and returns
> > result in the same physical storage like `x' has (modifies state of
> > `x'). How can I describe it in VOP correctly?
>
> You can specify that you'd prefer if the first argument and the result
> share the same physical storage (that's what :TARGET does) and the
> register allocator will take that into account (and hopefully do the
> right thing!). But the VOP still needs to work correctly even if that
> preference isn't satisfied.

Thanks!
--
wbr, Vitaly
From: Paul K. <pk...@gm...> - 2008-11-19 23:06:00
|
On 19-Nov-08, at 4:50 PM, Vitaly Mayatskikh wrote:
>> On Wed, Nov 19, 2008 at 3:27 PM, Vitaly Mayatskikh
>> <v.m...@gm...> wrote:
>>> With double-reg in scs SBCL always chooses double-reg :( I wanted to
>>> enforce it to use objects directly. And a pair of `move to double-reg'
>>> + `op with double-reg' should have different cost comparing to `op
>>> with double-stack/descriptor-reg', I think.
>
> ; disassembly for RAY-SPHERE
> ; 028588E3: F20F104E01 MOVSD XMM1, [RSI+1] ; no-arg-parsing entry point
> ; 8E8: F20F105609 MOVSD XMM2, [RSI+9]
> ; 8ED: F20F105E11 MOVSD XMM3, [RSI+17]
> ; 8F2: F20F106201 MOVSD XMM4, [RDX+1]
> ; 8F7: F20F106A09 MOVSD XMM5, [RDX+9]
> ; 8FC: F20F107211 MOVSD XMM6, [RDX+17]
> ; 901: F20F5CCC SUBSD XMM1, XMM4
> ; 905: F20F5CD5 SUBSD XMM2, XMM5
> ; 909: F20F5CDE SUBSD XMM3, XMM6
> ; 90D: F20F106701 MOVSD XMM4, [RDI+1]
> ; 912: F20F106F09 MOVSD XMM5, [RDI+9]
> ; 917: F20F107711 MOVSD XMM6, [RDI+17]
>
> but I want to see here such code:
>
> ; disassembly for RAY-SPHERE
> ; 028588E3: F20F104E01 MOVSD XMM1, [RSI+1] ; no-arg-parsing entry point
> ; 8E8: F20F105609 MOVSD XMM2, [RSI+9]
> ; 8ED: F20F105E11 MOVSD XMM3, [RSI+17]
> ; 901: F20F5CCC SUBSD XMM1, [RDX+1]
> ; 905: F20F5CD5 SUBSD XMM2, [RDX+9]
> ; 909: F20F5CDE SUBSD XMM3, [RDX+17]

While I agree that the latter is more esthetic, have you actually made
sure that there is any difference in performance between the two
versions? In the general case, it would be useful to save a couple
registers by loading from memory in the arithmetic instructions.
However, that doesn't seem to be an issue here. The hardware could
even surprise us by handling separate load and arithmetic instructions
better (although that would indeed be surprising).

Paul Khuong
From: Vitaly M. <v.m...@gm...> - 2008-11-20 07:28:56
|
At Wed, 19 Nov 2008 18:05:51 -0500, Paul Khuong wrote:
> While I agree that the latter is more esthetic, have you actually made
> sure that there is any difference in performance between the two
> versions?

No, I don't think there's any significant difference in that example.
However, it's better to optimize the loading of registers when doing
intensive calculations.

> In the general case, it would be useful to save a couple
> registers by loading from memory in the arithmetic instructions.
> However, that doesn't seem to be an issue here. The hardware could
> even surprise us by handling separate load and arithmetic instructions
> better (although that would indeed be surprising).

Agreed. It really depends on the hardware.
--
wbr, Vitaly
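Note: since the answer to the performance question depends on the hardware, one way to settle it is to time the same function on builds with and without the modified VOPs. The following is a minimal sketch in portable Common Lisp. `vec3`, `vec-sub`, and `time-vec-sub` are hypothetical names chosen for illustration; they are not taken from ray.lisp, and the loop only approximates the subtraction pattern shown in the disassembly.

```lisp
;; Sketch of a micro-benchmark for three double-float subtractions
;; on unboxed vectors. All names here are hypothetical.
(deftype vec3 () '(simple-array double-float (3)))

(defun vec-sub (a b out)
  "Store A - B element-wise into OUT and return OUT."
  (declare (type vec3 a b out)
           (optimize (speed 3) (safety 0) (debug 0)))
  (dotimes (i 3 out)
    (setf (aref out i) (- (aref a i) (aref b i)))))

(defun time-vec-sub (iterations)
  "Return the elapsed real time, in seconds, of ITERATIONS calls to VEC-SUB."
  (let ((a (make-array 3 :element-type 'double-float
                         :initial-contents '(4d0 5d0 6d0)))
        (b (make-array 3 :element-type 'double-float
                         :initial-contents '(1d0 2d0 3d0)))
        (out (make-array 3 :element-type 'double-float
                           :initial-element 0d0))
        (start (get-internal-real-time)))
    (dotimes (i iterations)
      (vec-sub a b out))
    ;; Convert internal time units to seconds.
    (/ (- (get-internal-real-time) start)
       (float internal-time-units-per-second 1d0))))
```

Calling `(time-vec-sub 100000000)` on both builds and comparing the results gives a rough first answer. A loop this small is dominated by call overhead, so a larger kernel such as the ray tracer itself is the more meaningful test.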
From: Thiemo S. <th...@ne...> - 2008-11-20 11:20:36
|
Paul Khuong wrote:
> On 19-Nov-08, at 4:50 PM, Vitaly Mayatskikh wrote:
>
> >> On Wed, Nov 19, 2008 at 3:27 PM, Vitaly Mayatskikh
> >> <v.m...@gm...> wrote:
> >>> With double-reg in scs SBCL always chooses double-reg :( I wanted to
> >>> enforce it to use objects directly. And a pair of `move to double-reg'
> >>> + `op with double-reg' should have different cost comparing to `op
> >>> with double-stack/descriptor-reg', I think.
> >
> > ; disassembly for RAY-SPHERE
> > ; 028588E3: F20F104E01 MOVSD XMM1, [RSI+1] ; no-arg-parsing entry point
> > ; 8E8: F20F105609 MOVSD XMM2, [RSI+9]
> > ; 8ED: F20F105E11 MOVSD XMM3, [RSI+17]
> > ; 8F2: F20F106201 MOVSD XMM4, [RDX+1]
> > ; 8F7: F20F106A09 MOVSD XMM5, [RDX+9]
> > ; 8FC: F20F107211 MOVSD XMM6, [RDX+17]
> > ; 901: F20F5CCC SUBSD XMM1, XMM4
> > ; 905: F20F5CD5 SUBSD XMM2, XMM5
> > ; 909: F20F5CDE SUBSD XMM3, XMM6
> > ; 90D: F20F106701 MOVSD XMM4, [RDI+1]
> > ; 912: F20F106F09 MOVSD XMM5, [RDI+9]
> > ; 917: F20F107711 MOVSD XMM6, [RDI+17]
> >
> > but I want to see here such code:
> >
> > ; disassembly for RAY-SPHERE
> > ; 028588E3: F20F104E01 MOVSD XMM1, [RSI+1] ; no-arg-parsing entry point
> > ; 8E8: F20F105609 MOVSD XMM2, [RSI+9]
> > ; 8ED: F20F105E11 MOVSD XMM3, [RSI+17]
> > ; 901: F20F5CCC SUBSD XMM1, [RDX+1]
> > ; 905: F20F5CD5 SUBSD XMM2, [RDX+9]
> > ; 909: F20F5CDE SUBSD XMM3, [RDX+17]
>
> While I agree that the latter is more esthetic, have you actually made
> sure that there is any difference in performance between the two
> versions? In the general case, it would be useful to save a couple
> registers by loading from memory in the arithmetic instructions.

It also reduces Icache pressure a bit. A simple testcase is unlikely
to account for this.

Thiemo
From: Nikodemus S. <nik...@ra...> - 2008-11-20 18:21:39
|
On Wed, Nov 19, 2008 at 10:27 PM, Vitaly Mayatskikh <v.m...@gm...> wrote:
> This is a sort of (declare (ignore r)). SBCL stops its build when
> there is unused variables in vop.

You can use

  (:ignore r)

in a DEFINE-VOP to declare a variable as ignored.

Cheers,

 -- Nikodemus
From: Vitaly M. <v.m...@gm...> - 2008-11-20 18:28:14
|
At Thu, 20 Nov 2008 20:21:34 +0200, Nikodemus Siivola wrote:
> > This is a sort of (declare (ignore r)). SBCL stops its build when
> > there is unused variables in vop.
>
> You can use
>
> (:ignore r)
>
> in a DEFINE-VOP to declare a variable as ignored.

Thanks!
--
wbr, Vitaly