On 26-Aug-09, at 5:06 PM, Christophe Rhodes wrote:
> Something that I noticed when looking at the xc leak in %unary-
> there's something odd about the code generation for inline constants,
> or possibly the *elsewhere* segment. On x86-64, on my systems,
> disassembling sb-kernel:%unary-truncate/double-float shows that, after
> the error trapping code (invalid-arg-count-error,
> object-not-double-float-error) there are 64 nops before what I think
> are 4 bytes of zeros for alignment and then two lots of 8 bytes for the
> double floats. Is there an obvious reason for this?
I should have documented this better. There are two cases here:

If we optimize for speed > space: the goal is to make sure data and
instructions fall in different cache lines, thus the 64 NOPs. Split
L1I and L1D caches (found in nearly all x86 uarchs with caches) don't
react well to executing and loading from the same cache line (at least
we're not writing to the inline constant pool).

Otherwise: IIRC, some uarchs' instruction decoders can suffer from
reduced throughput if they look ahead and hit illegal instructions.
The 16 bytes of NOPs are there to avoid that scenario. Any padding
size would do as far as correctness goes, but I was afraid of a
performance regression in normal code without that small amount.
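
To make the two cases concrete, here is a rough sketch of the padding arithmetic in Python. It is not the actual SBCL code: the function names, the 64-byte cache line, and the 8-byte constant alignment are all assumptions made for illustration, chosen to match the layout described in the quoted disassembly.

```python
# Illustrative model only; none of these names exist in SBCL.
CACHE_LINE = 64    # assumed cache-line size in bytes
DECODE_GUARD = 16  # NOP bytes used when speed does not outrank space


def nop_padding(speed, space):
    """Number of NOP bytes emitted between the code and the constants."""
    return CACHE_LINE if speed > space else DECODE_GUARD


def constant_pool_layout(code_end, speed, space, alignment=8):
    """Return (nop_bytes, zero_bytes, pool_start) for code ending at code_end.

    code_end is the offset one past the last instruction byte; alignment is
    the assumed alignment of the first constant (8 for a double float).
    """
    nops = nop_padding(speed, space)
    unaligned = code_end + nops
    zeros = -unaligned % alignment  # zero fill up to the next aligned offset
    return nops, zeros, unaligned + zeros


def share_cache_line(code_end, pool_start, base=0):
    """True if the last code byte and first constant byte share a cache line."""
    last_code = base + code_end - 1
    return last_code // CACHE_LINE == (base + pool_start) // CACHE_LINE
```

With a full cache line of NOPs, the last instruction byte and the first constant are more than 63 bytes apart, so they cannot land in the same 64-byte line whatever the load address of the code is. The 16-byte case gives no such guarantee; it only keeps the constants out of the decoder's immediate look-ahead.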