From: Kevin K. <kev...@gm...> - 2019-01-02 02:15:42
Donal,
Happy 2019! Hope you're well. I've not been around on the mailing
lists or the chat the last couple of weeks - this has been a time for
spending with family and friends.
But in my odd moments, I have been looking at quadcode some. (In
particular, jump threading has a shiny new implementation that is many
times faster than the old, and in addition, generates a much smaller
explosion of the state space.) I've also done a new implementation of
Global Value Numbering/Partial Redundancy Elimination (GVN-PRE) that
subsumes common subexpression elimination and loop-invariant code
motion, and could (not yet implemented, but possible!) also include
constant folding (rather than having that in a separate pass),
expression simplification (things like 0+a=a) and reduction of
operator strength. Some of that I want to leave to LLVM, but I doubt
that it can do a very good job simplifying dictionary or list
operations!
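To give the flavor of it, here's a toy sketch (plain Python, every name invented for illustration - this is not the quadcode implementation) of how a single value-numbering table subsumes common subexpression elimination and can pick up identities like 0+a=a along the way:

```python
# Toy local value numbering over (dest, op, arg1, arg2) quads.  One
# hash table both detects repeated computations (CSE) and lets simple
# algebraic identities fall out before anything is emitted.

def value_number(block):
    table = {}   # (op, val1, val2) -> canonical name computing it
    vn = {}      # variable -> its value (a canonical name or constant)
    out = []     # surviving quads
    for dest, op, a, b in block:
        va = vn.get(a, a)            # look through earlier results
        vb = vn.get(b, b)
        if op == '+' and va == 0:    # simplification: 0 + a == a
            vn[dest] = vb
            continue
        if op == '+' and vb == 0:    # simplification: a + 0 == a
            vn[dest] = va
            continue
        key = (op, va, vb)
        if key in table:             # common subexpression: reuse it
            vn[dest] = table[key]
            continue
        table[key] = dest
        vn[dest] = dest
        out.append((dest, op, va, vb))
    return out, vn
```

Given t1 = 0 + a; t2 = t1 * b; t3 = a * b, the pass keeps only the one multiply and records that t1 is a and t3 is t2.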
The choice of these was guided by the issues surrounding poly1305. The
old jump threading was a huge drag on compile speed, and now that
particular issue is removed.
The next big issue in poly1305 is callframe management, so that's what
I've jumped into next. (I took a long false path into trying to
refactor varargs, and intend to get back to that, but it will get
easier once the callframe stuff is refactored.)
It's become fairly clear to me that the 'save to the callframe before
invoke, restore after' is the wrong approach. What's right is to do
what LLVM itself does: allocate slots in the callframe (analogous to
the 'alloca' operations that reserve variables in LLVM), and initially
generate code that loads ('moveFromCallFrame' in quadcode notation) and
stores ('moveToCallFrame') on every read and write of a variable. That
will start with all variables in sync, and I'm working on the
algorithms (very, very closely related to GVN-PRE) for determining
when it's safe to eliminate loads and stores.
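In toy form, the 'prune the loads you just generated' half of that looks something like the following (the two opcode names are the quadcode ones above; the scaffolding and operand orders are invented for illustration):

```python
# Toy forward pass over one basic block, eliminating a callframe load
# when the slot is already known to be in sync with a temp.  'sync'
# maps variable -> temp currently holding its up-to-date value.

def prune_loads(block):
    sync = {}
    out = []
    for quad in block:
        op = quad[0]
        if op == 'moveFromCallFrame':    # (op, dst, var)
            _, dst, var = quad
            if var in sync:
                # Load is redundant: the value is already in a temp.
                out.append(('copy', dst, sync[var]))
            else:
                out.append(quad)
                sync[var] = dst
        elif op == 'moveToCallFrame':    # (op, var, src)
            _, var, src = quad
            out.append(quad)
            sync[var] = src              # store leaves var in sync
        elif op == 'invoke':
            out.append(quad)
            sync.clear()                 # unknown callee may touch anything
        else:
            out.append(quad)
    return out
```

A reload of x straight after a load or store of x turns into a copy (and a copy from a temp to itself is exactly the kind of thing that later becomes zero-cost), while any invoke conservatively invalidates everything.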
The new algorithms will be fairly aggressive about moving code around
- so that, for instance, a read-only use of a namespace variable
within a loop can be hoisted out of the loop if nothing reads or
writes the variable or any alias. In designing this, I realize that
the logic is about to founder on directSet, directGet, directArraySet,
directArrayGet, etc. - fourteen operations in all. All of these at
least notionally modify or access the state of variables, and so need
to have the callframe among their parameters and results.
All of them need to get the callframe as input, and the ones that
modify variables need to return {CALLFRAME FAIL STRING} - which will
immediately fall into the conventional sequence of extractCallFrame,
retrieveResult, jumpMaybe.
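The legality side of that hoisting is simple enough to sketch in toy form (simplified, invented operand orders - in each writing quad here the variable is the first operand - and only a few of the fourteen operations named):

```python
# Toy legality check: a read of 'var' may be hoisted out of a loop only
# if no quad in the loop body can write var, and no quad could write it
# through an alias.  An invoke of an unknown command is treated as able
# to write anything (e.g. via upvar or a trace), so it blocks hoisting.

WRITERS = {'moveToCallFrame', 'directSet', 'directArraySet'}
MAY_WRITE_ANYTHING = {'invoke'}

def hoistable(loop_body, var):
    for quad in loop_body:
        op = quad[0]
        if op in MAY_WRITE_ANYTHING:
            return False
        if op in WRITERS and quad[1] == var:
            return False
    return True
```

So a loop that only reads ns::limit lets the read move to the preheader, while a directSet of the same variable, or any invoke, pins it inside.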
In the kbk-refactor-callframe branch, I've changed translate.tcl to
generate these operations (not really tested, I'm still working on the
rest of the code!) but I've not yet even started on making the
corresponding change on the codegen/ side. If you could possibly help
with that - I imagine that it's a pretty mechanical change - that'd be
a time-saver for me.
I imagine that it's a pretty mechanical change because at present the
CALLFRAME doesn't actually mean anything. It's a notional thing that
keeps me from reordering quadcode in a way that would hoist loads
above the corresponding stores, that sort of thing, but there is only
one CALLFRAME per proc, ever, so there doesn't necessarily need to be
anything in the CALLFRAME value to designate the callframe. (On the
other hand, I think I see that you have given an internal
representation to the CALLFRAME data type, so I'm not sure what's
going on there.)
There's no particular hurry with all of this. It'll take me a little
while to get my own stuff in order. If you'd rather wait until I have
something that you could test against, I'm fine with that, too.
As a side effect, when this subproject is done, we should get a
correct implementation of [set $a] and [set $a $b] - and a possibly
substantial performance gain in access to [upvar] and namespace
variables.
Beyond this, the next thing to think about will be breaking up direct*
into smaller pieces. If 'directLappend result callframe listvar
element' could turn into a sequence like
directRef listRef callframe listvar
refGet val listRef # don't know if I might
lappend val2 val element
refPut temp listRef val2
extractCallframe callframe2 temp
retrieveResult result temp
(plus whatever error checking goes there)
then I could start looking at aggregating all the directRefs to the
same variable, hoisting them out of loops, and so on. Many of the
instructions in straight-line sequences like this are optimized to
zero-cost operations.
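As a rewrite rule, the expansion might look roughly like this (toy Python; the quadcode names follow the sequence above, and I'm assuming refPut writes the new value back through the ref - the scaffolding and fresh-name scheme are invented):

```python
# Toy expansion of the directLappend macro-op into its smaller pieces,
# mirroring the sketched sequence.  Error checking is elided, just as
# in the prose.  'fresh' supplies new temp names on each call.

def expand_direct_lappend(result, callframe, listvar, element, fresh):
    listref, val, val2, temp, cf2 = [fresh() for _ in range(5)]
    return [
        ('directRef', listref, callframe, listvar),   # ref to the variable
        ('refGet', val, listref),                     # read current list
        ('lappend', val2, val, element),              # pure value operation
        ('refPut', temp, listref, val2),              # write back via the ref
        ('extractCallframe', cf2, temp),              # thread the callframe on
        ('retrieveResult', result, temp),             # unpack the result
    ]
```

Once the operation is in this shape, the directRef is an ordinary subexpression: two expansions against the same variable in straight-line code share one ref, and the value-numbering machinery gets to chew on the rest.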
Kevin