From: Paul Khuong <pvk@pv...>  20131118 21:54:18

Hello! Here's a summary of my notes/thoughts on targeting by coalescing. I've also added what I had on structured graph before I forget. (I'm CCing devel now that your work has gained contributors/testers.) 1. Targeting by explicit coalescing We need a new vertex type, a metavertex if you will. It comprises multiple vertices, inherits the intersection of their domain and the union of their neighbourhoods. At first, I thought we could grow these meta vertices as equivalence classes. That's wrong, because interference edges may not fall the way we'd like. Instead, I think we should collect all the targeting desiderata and sort them by importance (loop depth and perhaps read/write/both). Then, traverse that sorted sequence and merge vertices/metavertices if possible. At the end, we have a greedily coalesced interference graph. Also, don't forget to coalesce with preallocated TNs, they're still important. I'm not sure what should happen to spill costs; it's not a real spill cost, but it strongly affects the rest of the code. I'm going to guess taking the max spill cost isn't a bad idea… the sum might be a worthy contender, especially given the problem outlined below. We'll also have to update adjacency sets. I think it'll be simplest to outright remove each metavertex's constituents from the graph and only insert the metavertex. The sorted vertex list can be updated with cl:merge. Problem: we'd really like to get targeting right to avoid moves to TNs wired in place. That's a fairly common case: the wired TN represents architectural constraints (rax/rdx in particular) or argument/result passing conventions. But, in such cases, the domain for the TN is exactly a single location, and the metavertex be pushed on the prespilling stack unless all its neighbours have a lower spill cost (have been assigned a stack already) :'( So, we'll probably want to work on that criterion even more now… or just make sure metavertices tend to be given a very high priority by summing their constituent spill costs. The other common case is targeting to a TN restricted to a certain storage class. That's not really a problem now that restricted TNs are coloured normally, but with extrahigh priority. Spilling: How are metavertices treated when choosing spill candidates? Initially, I'd try to just let the current selection logic go normally; this particularly makes sense if metavertices inherit the maximum spill cost from their constituents. Especially if they instead take the sum of the spill costs, it may also make sense to first try to "spill" (i.e., split) metavertices. That might call for a sorted metavertex list in the interference graph. When a meta vertex is selected for spilling, it can be split into all its constituents (if we had extra information about the reason for the spill, splitting into 2/3 metavertices might be interesting). The incremental graph update logic shouldn't be too hard to extend: remove the metavertex and reinsert its constituents, then merge the vertex list back in sorted order. 2. Realworld interference graphs are structured Folklore says that the quasitotality of interference graphs in the wild are perfect. I vaguely remember reading that for chordal graphs as well (definitely true for inSSA regalloc, but that's… different). I think there's a stronger theorem for structured programs (and coarse live ranges). If it turns out to be false for SBCL, that'll also be an interesting negative result ;) Apart from supposing that a greedy allocator will work decently well once an elimination order is chosen, we don't really exploit this structure. There's also the issue that perfect graphs are hard (currently O(n^8) or some other crazy "polynomial" time complexity [but there's hope as this is related to a recent breakthrough]) to recognize. Chordal graphs, however, are much easier: we can recognize them, compute an optimal elimination order, and enumerate maximal cliques in something like linear time. On the other hand, I don't think chordal graphs are hereditary, to there might be issues with the preprocessing of wired TNs. The elimination order doesn't seem that useful to me: we have to handle TNs wired to specific locations, different storage classes mapping to the same storage base, etc. However, enumerating cliques does seem practical (particularly given that I'm pretty sure the enumeration algorithm also works heuristically on nonchordal graphs). Given a clique, it is practical to detect when something in that clique *must* be spilled; instead of only spilling a single TN per iteration, we could instead take care of a lot of cliques in a single pass. I found <http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.43.4312>; to provide a solid overview for algorithms on subclasses of perfect graphs. A simpler hack in the same vein would be to partitions TNs by storage base before allocation. For example, float and integer registers are completely independent on most architectures, so they can be spilled and allocated in two steps. That ought to speedup the overall superlinear algorithm. For that angle, I think the first step would be to write something to detect chordal graphs. Then, either download a lot of code, e.g. all of quicklisp, and dump statistics (I'm told a repository of interference graphs would be independently interesting, FWIW), or try and see if some form of cliquebased spilling can help on a few test cases, e.g., bootstrap and clbench. Paul 