On 14 Oct 2007 22:24:30 +0300, Juho Snellman <jsnell@...> wrote:
> > > ; 6B7: 488BD4 MOV RDX, RSP
> > > ; 6BA: 4883EC18 SUB RSP, 24
> > > why is RSP decremented here instead of on entry to this function
> > > (or not at all)?
> > Because of the manner in which function calls are expanded from IR1 to
> > IR2 and the lack of loop invariant code motion in IR2.
> Hmm? I thought this was ALLOCATE-FULL-FRAME ensuring that there's some
> space over the native frame for the special sbcl frame data. So this
> would disappear if/when Alistair finishes the changes for using the
> native x86 frames for everything.
Right. What I was driving at is that we do this stack frame allocation
for every function call--we don't even try to share it across
several function calls in a row and/or hoist it when possible (out of
loops or to the beginning of a function). Alistair's work on using
native x86 frames will certainly enable things like this to be done.
(I don't know if it will be automatic--presumably we would need to
compute the maximum stack size needed by any function call in order to
allocate everything up front.)
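
To make the "allocate everything up front" idea concrete, here is a
minimal sketch, not anything taken from the compiler. It assumes we
already know how many bytes each call site in a function wants (the
function name, the sample sizes, and the 16-byte rounding are all made
up for illustration) and computes one allocation that covers all of
them, so a single SUB RSP at function entry could replace the per-call
ones.

```python
def frame_bytes_needed(call_site_sizes, alignment=16):
    """Return one up-front allocation that covers every call site.

    call_site_sizes: bytes each call would otherwise allocate itself
    (hypothetical input; a real compiler would collect this per function).
    alignment: rounding granularity, assumed to be 16 here.
    """
    largest = max(call_site_sizes, default=0)
    # Round up to the next multiple of the alignment.
    return (largest + alignment - 1) // alignment * alignment


# 24 matches the SUB RSP, 24 in the disassembly; the other sizes are invented.
print(frame_bytes_needed([24, 8, 40]))
```

The calls do not run concurrently, so the function only needs the
largest of the per-call sizes rather than their sum.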
> > You'd need several nops at the beginning of the function for
> > overwriting--on x86, you'd need five--
> There are multi-byte instruction sequences on x86 and x86-64 with no
> effects, which the optimization guides from AMD and Intel recommend
> for using instead of multiple nops. Though I don't think there's a 13
> byte nop, which would be needed for the x86-64 absolute call fixup :-)
Duh, of course. I was thinking about architectural nops, not
effectively nopful instructions.
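
For reference, a small sketch of how such padding could be assembled.
The table holds the long-NOP encodings listed in the Intel manual's
entry for NOP (lengths 1 to 9); whether SBCL would want these particular
sequences is an assumption on my part, and nop_pad is a hypothetical
helper. Five bytes is the size of a CALL rel32. The 13 bytes for the
x86-64 absolute fixup presumably come from a 10-byte MOV of a 64-bit
immediate into a register plus a 3-byte CALL through one of R8-R15.
Since the table stops at 9 bytes, 13 bytes of padding takes two
instructions.

```python
# Long-NOP encodings from the Intel manual's NOP entry, keyed by length.
LONG_NOPS = {
    1: bytes.fromhex("90"),
    2: bytes.fromhex("6690"),
    3: bytes.fromhex("0f1f00"),
    4: bytes.fromhex("0f1f4000"),
    5: bytes.fromhex("0f1f440000"),
    6: bytes.fromhex("660f1f440000"),
    7: bytes.fromhex("0f1f8000000000"),
    8: bytes.fromhex("0f1f840000000000"),
    9: bytes.fromhex("660f1f840000000000"),
}


def nop_pad(n):
    """Return a list of NOP encodings whose lengths sum to n bytes.

    Greedy: use the longest available NOP first, so the padding is
    made of as few instructions as possible.
    """
    pieces = []
    while n > 0:
        k = min(n, max(LONG_NOPS))
        pieces.append(LONG_NOPS[k])
        n -= k
    return pieces


for size in (5, 13):
    print(size, [p.hex() for p in nop_pad(size)])
```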
> > Direct calls on x86-64, in the general
> > case, would be pretty expensive: a 64-bit MOV followed by a
> > JMP-to-register. It's not obvious that this would be any more
> > efficient. It's possible I am misinterpreting what your suggested
> > implementation is, though.
> Right. This would pretty much require having a separate code object
> heap on x86-64, allocated in low memory.
You wouldn't need everything to be in low memory, would you? Low memory
would conflict with where executables get loaded. Plus, since CALL takes
signed 32-bit relative offsets, all you really need is for code to be in a
contiguous 2GB heap--doesn't matter much where this heap lives. Am I
missing something again?
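
A quick sanity check of the 2GB claim, as a sketch. CALL rel32 is
opcode E8 followed by a signed 32-bit displacement measured from the end
of the 5-byte instruction. The helper name and the heap base address
below are made up; the point is only that the base does not matter as
long as call site and target sit in the same 2GB region.

```python
import struct

CALL_REL32_LEN = 5  # opcode E8 plus a 4-byte displacement


def encode_call_rel32(site, target):
    """Encode CALL rel32 at address `site` to `target`.

    Returns the 5 instruction bytes, or None if the displacement does
    not fit in a signed 32-bit field.
    """
    disp = target - (site + CALL_REL32_LEN)
    if not -2**31 <= disp < 2**31:
        return None
    return b"\xe8" + struct.pack("<i", disp)


# Hypothetical heap placement, deliberately far from low memory.
HEAP_BASE = 0x7F0000000000
HEAP_SIZE = 2**31

first = HEAP_BASE
last_site = HEAP_BASE + HEAP_SIZE - CALL_REL32_LEN  # last address a CALL fits at
last_byte = HEAP_BASE + HEAP_SIZE - 1

print(encode_call_rel32(first, last_byte) is not None)   # forward, across the heap
print(encode_call_rel32(last_site, first) is not None)   # backward, across the heap
print(encode_call_rel32(first, HEAP_BASE + HEAP_SIZE + CALL_REL32_LEN))  # outside
```

Both in-heap calls encode, and the target just past the end of the
region does not, which is why calls out of the code heap would still
need the MOV plus indirect-call form.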