The second example code generator is an evolved version of the first code generator. We will review the new items.
print
This compiler's print
statement can print a sequence of integer expressions and string literals:
print "x = ", x, " and y = ", y, "\n";
In the front end grammar, this is simple: the print
statement rule accepts a sequence of expressions, and the primary
expression rule accepts a STRING_LITERAL token. The front end achieves this simplicity by ignoring the different type of a STRING_LITERAL and an integer expression.
The code generator, of course, cannot ignore this information. The hierarchy of rules for String expressions is similar to integer: stringConstant
can be converted into stringExpression
. stringConstant
simply yields a STRING_LITERAL
's text as a String at compile time. The reduction from stringConstant
to stringExpression
is a little more complex, since the actual string literal must be declared in the data
section:
stringExpression = stringConstant: 1 { String name = String.format("__literal_x%h", this.dataSegmentItems.size()); this.dataSegmentItems.add( templateManager.getTemplate( "declareStringConstant", "name", name, "initializer", stringConstant ) ); return allocateTemp(templateManager.getTemplate("la", "lvalue", name)); }
The print
statement is now overloaded and variadic: it accepts an arbitrary number of operands, which may be int or String expressions. A JBurg rule can be variadic in its final parameterized subtree; to "overload" the rule, define an overloaded nonterminal type and specify conversion rules from the normal expression nonterminals to the overloaded nonterminal:
statement = PRINT(printItem contents+): 1 { return templateManager.getTemplate("block", "contents", contents); } printItem = intExpression: 1 { return resetTemps(templateManager.getTemplate("printInteger", "expr", intExpression)); } printItem = stringExpression: 1 { return resetTemps(templateManager.getTemplate("printString", "expr", stringExpression)); }
The type of the actual contents
parameter, above, is Vector<Object>
, which works well with StringTemplate
container semantics: the print
statement is simply a container for a sequence of QtSpim
syscall operations. While this is a good design principle to follow, more complex variadic processing is often necessary, e.g., processing procedure arguments.
Constant folding in a BURM is straightforward, but it illustrates an important point about instruction selection by dynamic programming (the "least costly alternative wins" principle):
intConstant = PLUS(intConstant lhs, intConstant rhs):1 { return lhs + rhs; }
This rule looks very similar to the rule for adding two integer expressions:
intExpression = PLUS(intExpression lhs, intExpression rhs):1 { return allocateTemp(templateManager.getTemplate("add", "lhs", lhs, "rhs", rhs)); }
So why does the code generator prefer to do constant arithmetic? The answer lies in the rule that converts an intConstant
into an intExpression
:
intExpression = intConstant: 1 { return allocateTemp(templateManager.getTemplate("li", "expr", intConstant)); }
This rule has a non-zero cost, so the costs of applying the intConstant
-oriented rule to 1 + 1
is 3: 1 for each integer constant subtree, and 1 for the ADD
. The cost of applying the intExpression
-oriented rule, on the other hand, is 5: 1+1 to convert each subtree to intExpression
and 1 for the add. The BURM therefore prefers the intConstant
-oriented rule.
The compiler can select an addi
instruction to implement expressions such as x + 1
(and x - 1
), but it must first ensure that the constant is in the signed 16-bit range of a MIPS addi
immediate operand. It does this by calling a decision function to compute the rule's cost; the function returns 1 if the immediate operand is in range, and Integer.MAX_VALUE
(i.e., unfeasible) if not. As with the constant folding example, above, the cost to convert from intConstant
to intExpression
causes the the BURM to prefer to reduce a subtree with an appropriately scaled intConstant
using addi
.
// Prefer to use addi to add integer constants in range. intExpression = PLUS(intExpression lhs, intConstant rhs): constantInRange(getNode().getChild(1), PLUS) { return allocateTemp(templateManager.getTemplate("addi", "lhs", lhs, "rhs", rhs)); }
constantInRange
The first parameter to constantInRange
is the AST that represents the integer constant.
Q: Why only the AST node, and not the constant itself?
A: Because the BURM runs in a strict two-pass mode; the first pass finds the best combination of reductions, and the second pass performs the reductions. In the case of a leaf like INT_LITERAL
, the BURM could perform both the labeling and reduction in a single pass, but this enhancement has not yet been implemented.
The second parameter is just a flag that tells constantInRange
that the operation is PLUS. constantInRange
needs to know this because the range for a 2's complement signed integer is not symmetrical, so the acceptable values for the immediate field differ: x - 32768
is feasible using addi
, but x + 32768
is not.
The control-flow instructions illustrate two interesting features:
Our simple "X" language doesn't have an explicit boolean
type, but the result of a comparison expression is implicitly boolean. The language's control-flow instructions work on values of that implicit type.
MIPS has some specialized comparison instructions that make it worthwhile to write special-case rules for x == y
type comparisons, among others (the rest are left as an exercise). We could define a special nonterminal for this, a booleanEqualEqual
and go through some logic to package up the operands and then unpack them, but JBurg's pattern matcher can go more than one level deep into the subtrees before it starts parameterizing:
statement = IF(EQUAL_EQUAL(intExpression lhs, intExpression rhs), statement consequent): 1
This allows the rule to work directly with the applicable nodes.