(file is sbcl-optimizer.diff)
This new version fixes a problem in the 64 bit where it wasn't matching all
of the possible patterns.
Pretty much, 64 bit emits a lot of labels that are not used for jumps or
anything, so during peephole optimization
this new version ignores the extra labels for the purpose of pattern
matching, and leaves them in place.
On both platforms, the single optimization that I have is matched in about
2000 occurrences during compilation.
This version compiles the whole source, runs all of the tests properly.
I also compiled Maxima and CL-PCCRE with it.
It passes all of the tests (that are expected to pass) for both.
On Thu, Sep 9, 2010 at 12:58 AM, Jonathan Smith
> So, I've got another version of the a peephole optimizer.
> As requested, I have just done a diff.
> It can be found here:
> The filename is 'sbcl-peep.diff'.
> This diff is between my version of 1.0.42 with peephole optimization, and
> the 1.0.42 version on the website.
> This particular version only takes care of the circular move redundancy on
> x86 and x86_64. It also doesn't do any flow or deadness analysis, so you
> have, for example:
> mov [ebp], ebx
> mov ebx, [ebp]
> getting replaced with:
> mov [ebp], ebx
> If you examine the .diff, I actually made very few changes to the original
> source code, The main points are adding a 'buffering' mode to instruction
> emission, as well as an 'values' field to scheduler instructions (which is
> really the arguments the instruction was called for emission with).
> Then we are able to capture instructions as 'instruction' structs (as used
> by the scheduler), labels as 'label' structs, (as well as alignments as
> functions...) in a list.
> Also we capture TNs/EAs within each created instruction (that the
> instruction was to be emitted with).
> Because TNs are packed at emission time, they map directly onto machine
> registers, and are susceptible to comparison via sb, sc, offset, etc.
> Because it is all captured, it becomes easy to peephole optimize the code.
> There is also a basic declarative macro for defining optimizations, called
> 'define-pattern', and a macro called 'define-peephole-optimizer'.
> Define-pattern is based on the arguments made to the instruction via the
> 'define-instruction' macro (you will notice that x86 optimization is
> different from the x86_64 definition, because 'mov' is defined
> differently...), the syntax is:
> (define-pattern ...pattern... -> ...replacement... !!
> It does a very basic unification, variables in the unification are defined
> by symbols prefixed with ?. If the function passes unification, it returns a
> unified version of the replacement pattern.
> The replacement pattern is then passed to the 'peephole-optimizer' lambda,
> (defined by 'define-peephole-optimizer', which is defined to include a
> variable length 'window'.
> There is a bug in the peephole optimizer itself right now, in that it does
> not take care of the very important case, where the replacement pattern is
> longer than the optimizer pattern.
> Anyway, it seems that I have met with some success with this project, and I
> welcome comments/thoughts on this.
> I will note that (using a very rudimentary counter) compiling the system
> with this optimization on x86 seems to catch a couple of thousand
> redundancies, and compiling on x86_64 catches a few hundred... I'm not sure
> why there is the apparent difference (perhaps because there are fewer
> registers and memory locations on x86!). Once there are more peephole
> optimizations working there might even be some performance gains.