Here's a summary of my notes/thoughts on targeting by coalescing. I've
also added what I had on structured graph before I forget. (I'm CCing
-devel now that your work has gained contributors/testers.)
1. Targeting by explicit coalescing
We need a new vertex type, a meta-vertex if you will. It comprises
multiple vertices, inherits the intersection of their domain and the
union of their neighbourhoods.
At first, I thought we could grow these meta vertices as equivalence
classes. That's wrong, because interference edges may not fall the way
Instead, I think we should collect all the targeting desiderata and sort
them by importance (loop depth and perhaps read/write/both). Then,
traverse that sorted sequence and merge vertices/meta-vertices if
possible. At the end, we have a greedily coalesced interference graph.
Also, don't forget to coalesce with pre-allocated TNs, they're still
important. I'm not sure what should happen to spill costs; it's not a
real spill cost, but it strongly affects the rest of the code. I'm going
to guess taking the max spill cost isn't a bad idea… the sum might be a
worthy contender, especially given the problem outlined below.
We'll also have to update adjacency sets. I think it'll be simplest to
outright remove each meta-vertex's constituents from the graph and only
insert the meta-vertex. The sorted vertex list can be updated with cl:merge.
Problem: we'd really like to get targeting right to avoid moves to TNs
wired in place. That's a fairly common case: the wired TN represents
architectural constraints (rax/rdx in particular) or argument/result
passing conventions. But, in such cases, the domain for the TN is
exactly a single location, and the meta-vertex be pushed on the
pre-spilling stack unless all its neighbours have a lower spill cost
(have been assigned a stack already) :'( So, we'll probably want to work
on that criterion even more now… or just make sure meta-vertices tend to
be given a very high priority by summing their constituent spill costs.
The other common case is targeting to a TN restricted to a certain
storage class. That's not really a problem now that restricted TNs are
coloured normally, but with extra-high priority.
How are meta-vertices treated when choosing spill candidates? Initially,
I'd try to just let the current selection logic go normally; this
particularly makes sense if meta-vertices inherit the maximum spill cost
from their constituents. Especially if they instead take the sum of the
spill costs, it may also make sense to first try to "spill" (i.e.,
split) meta-vertices. That might call for a sorted meta-vertex list in
the interference graph.
When a meta vertex is selected for spilling, it can be split into all
its constituents (if we had extra information about the reason for the
spill, splitting into 2/3 meta-vertices might be interesting). The
incremental graph update logic shouldn't be too hard to extend: remove
the meta-vertex and re-insert its constituents, then merge the vertex
list back in sorted order.
2. Real-world interference graphs are structured
Folklore says that the quasi-totality of interference graphs in the wild
are perfect. I vaguely remember reading that for chordal graphs as well
(definitely true for in-SSA regalloc, but that's… different). I think
there's a stronger theorem for structured programs (and coarse live
ranges). If it turns out to be false for SBCL, that'll also be an
interesting negative result ;)
Apart from supposing that a greedy allocator will work decently well
once an elimination order is chosen, we don't really exploit this structure.
There's also the issue that perfect graphs are hard (currently O(n^8) or
some other crazy "polynomial" time complexity [but there's hope as this
is related to a recent breakthrough]) to recognize. Chordal graphs,
however, are much easier: we can recognize them, compute an optimal
elimination order, and enumerate maximal cliques in something like
linear time. On the other hand, I don't think chordal graphs are
hereditary, to there might be issues with the preprocessing of wired TNs.
The elimination order doesn't seem that useful to me: we have to handle
TNs wired to specific locations, different storage classes mapping to
the same storage base, etc. However, enumerating cliques does seem
practical (particularly given that I'm pretty sure the enumeration
algorithm also works heuristically on non-chordal graphs). Given a
clique, it is practical to detect when something in that clique *must*
be spilled; instead of only spilling a single TN per iteration, we could
instead take care of a lot of cliques in a single pass.
provide a solid overview for algorithms on subclasses of perfect graphs.
A simpler hack in the same vein would be to partitions TNs by storage
base before allocation. For example, float and integer registers are
completely independent on most architectures, so they can be spilled and
allocated in two steps. That ought to speedup the overall super-linear
For that angle, I think the first step would be to write something to
detect chordal graphs. Then, either download a lot of code, e.g. all of
quicklisp, and dump statistics (I'm told a repository of interference
graphs would be independently interesting, FWIW), or try and see if some
form of clique-based spilling can help on a few test cases, e.g.,
bootstrap and cl-bench.