Hi Christophe,
>> ; C09: 31C9 XOR ECX, ECX
>> ; C0B: EB11 JMP L6
>> ; C0D: L5: 8BD9 MOV EBX, ECX
>> ; C0F: C1FB02 SAR EBX, 2
>> ; C12: B808000000 MOV EAX, 8
>> ; C17: 88441A01 MOV [EDX+EBX+1], AL
>> ; C1B: 83C104 ADD ECX, 4
>> ; C1E: L6: 83F928 CMP ECX, 40
>> ; C21: 7CEA JL L5
>>
> No, it isn't. ECX holds the lisp variable i in your loop, which is a
> live variable. The actual array access is done byte-by-byte, which is
> achieved by your "mysterious" point four.
>
Ah.. I think I got the idea. "SAR EBX, 2" shifts "EBX" by two bits to
the right and thus divides the EBX. However, the code seems to be
slightly overcomplicated. You see, the ECX does not hold the value of
"i". Instead, it holds "i" multiplied by 4. That's why "CMP ECX, 40"
compares it to 40 instead of 10. And when an index is necessary, ECX is
copied into EBX (so that ECX does not get changed), divided by 4 and
only then used to specify the index.
I suppose that the magic number "4" appeared because usually the array
elements in SBCL are 4 bytes long. Anyway, a better (?) code would look
like this:
XOR ECX, ECX
MOV EAX, 8
JMP L6
L5: MOV [EDX+ECX+1], AL
INC ECX
L6: CMP ECX, 10
JL L5
In this way the EAX is loaded only once and then re-used.
Also, the body of the loop is now only 4 instructions instead of 7. This
might be not a big win, but that's good anyway.
The better results would be achieved with the "LOOP" instruction, but it
is clear that SBCL may not convert the (loop ...) with the counter
raising from 0 to a positive value to an assembly with that instruction.
>> 2. The "MOV" command on line C17 seems to be too heavyweight, doesn't
>> it? I do not see why three values should be added just to get the
>> address on the next entry.
>>
>> 3. On line C12 it is clearly seen that EAX is filled instantly with the
>> value "8" which is a constant. It would be great if I can make SBCL put
>> it out of the loop, between the lines C09 and C0B.
>>
>
> These two issues fall under the general class of loop optimizations:
> strength reduction and invariant hoisting. There is some apparatus in
> SBCL for performing this kind of optimization (for instance, loop
> analysis is used in the register allocator) but not very much is done.
> If you are interested in this kind of thing, that is where you should
> start.
>
It would be great to take a look at that. Could you suggest where should
I begin?
> If you're filling a vector with a constant value, call FILL instead
>
Thank you very much for the hint! By the way, I suppose FILL does not
work for 2D arrays. Could you also suggest some functions for
copying/filling specific regions of arrays? Or maybe there is some way
to interpret a 2D array as a 1D one?
>> Also, It would be great if someone could point me out where can I take a
>> look for the SBCL's optimization of the assembly. Perhaps I could tune
>> it a bit?
>>
>
> There is no one place where SBCL generates assembly in the sense that
> you mean. Instead, the compiler converts into a high-level internal
> representation, performs various optimizations on that representation,
> converts to a lower level representation, performs some more
> optimizations, and then emits assembly to implement what remains of
> that lower-level internal representation.
>
I have almost forget the x86 assembly and I am quite new to LISP, but
I'd like to take a look if I can figure out how does it work and
perhaps tune it a bit.
It looks like this goes out of the sbcl-help scope, but could you
suggest a place to start?
> (To address another of your questions which I have snipped: it is
> possible to pass vectors of numbers to foreign code without copying.)
>
Oh, that's great :-) Thank you.
With best wishes,
Denys
|