"Nathan Froyd" <froydnj@...> writes:
> > ; 6B3: L0: 48895DE8 MOV [RBP-24], RBX
> > save 'i' to stack, this shouldn't be necessary, 'i' should be stored
> > in a callee saved register (are there any in the calling convention?)
> I believe all non-argument passing registers are caller-saved.
And adding new callee-saved registers (in addition to the frame and
stack pointers) would require making the "fast path" of unwind-protect
and catch even less fast than they are now. In an ideal world the fast
path for those would be zero-cost like try-catch is on the JVM and
this would be a non-issue, but that would require having better
debugging information and reliable stack traces (which we still don't
have on x86 and x86-64).
> > ; 6B7: 488BD4 MOV RDX, RSP
> > ; 6BA: 4883EC18 SUB RSP, 24
> > why is RSP decremented here instead of on entry to this function (or
> > not at all)?
> Because of the manner in which function calls are expanded from IR1 to
> IR2 and the lack of loop invariant code motion in IR2.
Hmm? I thought this was ALLOCATE-FULL-FRAME ensuring that there's some
space over the native frame for the special sbcl frame data. So this
would disappear if/when Alistair finishes the changes for using the
native x86 frames for everything.
> > ; 6CE: FF5009 CALL QWORD PTR [RAX+9]
> > Would be nice if it wasn't an indirect call. It seems possible that
> > code components could directly call target code. Hypothesizing an
> > implementation: (setf (symbol-function 'foo)) would mutate the first
> > instruction (which starts out as a nop) in the target to be a jump to
> > the new code (and GC would notice and fixup that forwarding sometime
> > later). (symbol-function 'foo) would need to point to the instruction
> > after the jump-or-nop so as to not be affected by above mutation.
> You'd need several nops at the beginning of the function for
> overwriting--on x86, you'd need five--
There are multi-byte instruction sequences on x86 and x86-64 with no
effects, which the optimization guides from AMD and Intel recommend
using instead of multiple nops. Though I don't think there's a 13-byte
nop, which would be needed for the x86-64 absolute call fixup :-)
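
To make the byte counts concrete, here is a rough sketch of the two patch
sequences being discussed. It is illustrative only and is not SBCL code: the
function names are made up, and the choice of R11 as the scratch register is
an assumption (a register in the R8-R15 range needs a REX prefix on the
indirect jump, which is one way to arrive at 13 bytes; with RAX the sequence
would be 12).

```python
import struct

def jmp_rel32(src, dst):
    """Encode `JMP rel32` located at address src and targeting dst (5 bytes)."""
    # The displacement is relative to the end of the 5-byte instruction.
    rel = dst - (src + 5)
    return b"\xE9" + struct.pack("<i", rel)

def jmp_abs64_via_r11(dst):
    """Encode `MOV R11, imm64` followed by `JMP R11` (10 + 3 bytes).

    R11 is an assumed scratch register, chosen for illustration only.
    """
    mov = b"\x49\xBB" + struct.pack("<Q", dst)  # REX.W+B, opcode B8+3, imm64
    jmp = b"\x41\xFF\xE3"                        # REX.B, FF /4 with ModRM E3
    return mov + jmp

print(len(jmp_rel32(0x1000, 0x2000)))               # size of the x86 patch
print(len(jmp_abs64_via_r11(0x7FFF_0000_1000)))     # size of the x86-64 patch
```

The first function is the five-byte case mentioned for x86; the second is the
"64-bit MOV followed by a JMP-to-register" shape, which is what the function
prologue would have to reserve room for.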
> and you'd need to teach the
> garbage collector about these new absolute addresses you're
> introducing into the code.
But luckily the gc already has a very similar concept for the relative
> Direct calls on x86-64, in the general
> case, would be pretty expensive: a 64-bit MOV followed by a
> JMP-to-register. It's not obvious that this would be any more
> efficient. It's possible I am misinterpreting what your suggested
> implementation is, though.
Right. This would pretty much require having a separate code object
heap on x86-64, allocated in low memory.
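
One reading of why a separate code heap helps: a direct CALL or JMP on x86-64
still only carries a signed 32-bit displacement, so caller and callee have to
sit within about 2 GiB of each other. A small sketch of that check follows;
the addresses are hypothetical and are not taken from any actual SBCL memory
layout.

```python
def reachable_by_rel32(src, dst, insn_len=5):
    """True if an instruction of insn_len bytes at src can reach dst via rel32."""
    # The displacement is measured from the end of the instruction.
    rel = dst - (src + insn_len)
    return -2**31 <= rel < 2**31

# Hypothetical addresses, for illustration only.
LOW_CODE = 0x1000_0000       # a code object in a heap below 2 GiB
NEARBY_CODE = 0x1010_0000    # another code object in the same heap
HIGH_CODE = 0x10_0000_0000   # a code object far up in the 64-bit address space

print(reachable_by_rel32(LOW_CODE, NEARBY_CODE))
print(reachable_by_rel32(LOW_CODE, HIGH_CODE))
```

If every code object lives in one contiguous region smaller than 2 GiB, the
first case always holds and the five-byte relative form is enough; otherwise
the longer MOV-plus-JMP sequence is needed.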