|
From: Kevin K. <kev...@gm...> - 2017-05-15 00:01:53
|
Today, I finished up a little piece of code that I had lying around
for optimizing 'moveFromCallFrame' and 'moveToCallFrame' instructions.
What it tries to do:
(1) When moving a value from a callframe, it checks for the case:
(a) the callframe resulted from an earlier 'invoke'
(b) the invoked command cannot have modified the variable
(c) the callframe that was input to the 'invoke' was primed with a
'moveToCallframe' for the same variable.
In this case, it eliminates the 'moveFromCallframe' operation, and
replaces the value that would have been moved from the callframe
with the one that was moved to the callframe. This is a unique
reaching definition, and is known to be available at the given
point in the code, so this transformation cannot damage SSA.
(2) When moving a value to a callframe, it checks for the case:
(a) the value emerged from an earlier 'moveFromCallFrame' operation
(b) the input callframe to the 'moveToCallFrame' is the output
callframe of the 'moveFromCallframe'.
In this case, we know that the value is already in the target
frame because we just took it from there. We eliminate the value
from the list of values to be moved in the 'moveToCallFrame'.
(3) When moving a value to a callframe, it checks for the case
(a) the callframe is about to be consumed by an 'invoke'
operation.
(b) the invoked command does not use the variable.
In this case, it is safe to remove the value from the list of values
in the 'moveToCallFrame'.
(4) If the above optimizations result in a 'moveToCallFrame' instruction
with an empty set of values to move, eliminate the 'moveToCallFrame'.
(5) The above optimizations will often result in 'moveFromCallFrame'
instructions whose results are unused because they extracted
values whose only use was in eliminated 'moveToCallFrame'
instructions. They may be eliminated as well.
It is possible for these optimizations to interact and get in the way
of one another. The order in which they are executed is postorder on
the dominator tree (that is, dominated blocks are examined before
their dominators) and in reverse order in the instructions within a
basic block. This ensures that:
(A) The 'moveFromCallFrame' optimization will run when the
'moveToCallFrame' operations are still present, so that the unique
reaching definition can be found when eliminating the move.
(B) The 'moveToCallFrame' optimization will run when any earlier,
dominating use of the call frame still has its 'moveFromCallFrame'
operations in place, maximizing the likelihood of being to
eliminate a move based on rule (2) above.
(C) Elimination of dead 'moveFromCallFrame' calls happens only after
all the other optimizations have run, so that premature removal
cannot keep other callframe optimizations from being discovered.
In an effort to keep sandboxes merging in an orderly fashion, I
committed this logic onto a new branch, 'kbk-callframe-motion'. I
notice that adding it successfully eliminates the data motion in
'lsorttest', adding it to the very small set tests that succeed in the
callframe implementation. The code doesn't appear to break anything
else, so I expect that it should be merged back into
'dkf-callframe-impl' and the leaf closed. (Donal, take this as
permission and encouragement to do so!)
I also notice that, because of the cost of callframes, [lsorttest] is
no faster when compiled than when interpreted. This is fairly
unsurprising at this stage, since the compiled code for [lsorttest]
needs to do practically everything that the interpreter would have
done. The fix for this will have to be that we detect that [lsorttest]
doesn't use its callframe at all, and don't bother to push one. This
should be feasible, but we'll need to discuss further how to represent
the case for the code issuer.
Kevin
|