JBurg Wiki

Brought to you by: tharwood

SecondCodeGenerator

Authors:

Second Example Code Generator

Second Example Code Generator

The second example code generator is an evolved version of the first code generator. We will review the new items.

String Literals and Varidaic, Overloaded `print`

This compiler's print statement can print a sequence of integer expressions and string literals:

print "x = ", x, " and y = ", y, "\n";

In the front end grammar, this is simple: the print statement rule accepts a sequence of expressions, and the primary expression rule accepts a STRING_LITERAL token. The front end achieves this simplicity by ignoring the different type of a STRING_LITERAL and an integer expression.

The code generator, of course, cannot ignore this information. The hierarchy of rules for String expressions is similar to integer: stringConstant can be converted into stringExpression. stringConstant simply yields a STRING_LITERAL's text as a String at compile time. The reduction from stringConstant to stringExpression is a little more complex, since the actual string literal must be declared in the data section:

stringExpression = stringConstant: 1
{
    String name = String.format("__literal_x%h", this.dataSegmentItems.size());
    this.dataSegmentItems.add(
        templateManager.getTemplate(
            "declareStringConstant",
            "name", name,
            "initializer", stringConstant
        )
    );

    return allocateTemp(templateManager.getTemplate("la", "lvalue", name));
}

Overloaded Rules with Variadic Subtrees

The print statement is now overloaded and variadic: it accepts an arbitrary number of operands, which may be int or String expressions. A JBurg rule can be variadic in its final parameterized subtree; to "overload" the rule, define an overloaded nonterminal type and specify conversion rules from the normal expression nonterminals to the overloaded nonterminal:

statement = PRINT(printItem contents+): 1
{
    return templateManager.getTemplate("block", "contents", contents);
}

printItem = intExpression: 1
{
    return resetTemps(templateManager.getTemplate("printInteger", "expr", intExpression));
}

printItem = stringExpression: 1
{
    return resetTemps(templateManager.getTemplate("printString", "expr", stringExpression));
}

The type of the actual contents parameter, above, is Vector<Object>, which works well with StringTemplate container semantics: the print statement is simply a container for a sequence of QtSpim syscall operations. While this is a good design principle to follow, more complex variadic processing is often necessary, e.g., processing procedure arguments.

Constant Folding

Constant folding in a BURM is straightforward, but it illustrates an important point about instruction selection by dynamic programming (the "least costly alternative wins" principle):

intConstant = PLUS(intConstant lhs, intConstant rhs):1
{
    return lhs + rhs;
}

This rule looks very similar to the rule for adding two integer expressions:

intExpression = PLUS(intExpression lhs, intExpression rhs):1
{
    return allocateTemp(templateManager.getTemplate("add", "lhs", lhs, "rhs", rhs));
}

So why does the code generator prefer to do constant arithmetic? The answer lies in the rule that converts an intConstant into an intExpression:

intExpression = intConstant: 1
{
    return allocateTemp(templateManager.getTemplate("li", "expr", intConstant));
}

This rule has a non-zero cost, so the costs of applying the intConstant-oriented rule to 1 + 1 is 3: 1 for each integer constant subtree, and 1 for the ADD. The cost of applying the intExpression-oriented rule, on the other hand, is 5: 1+1 to convert each subtree to intExpression and 1 for the add. The BURM therefore prefers the intConstant-oriented rule.

Add Immediate

The compiler can select an addi instruction to implement expressions such as x + 1 (and x - 1), but it must first ensure that the constant is in the signed 16-bit range of a MIPS addi immediate operand. It does this by calling a decision function to compute the rule's cost; the function returns 1 if the immediate operand is in range, and Integer.MAX_VALUE (i.e., unfeasible) if not. As with the constant folding example, above, the cost to convert from intConstant to intExpression causes the the BURM to prefer to reduce a subtree with an appropriately scaled intConstant using addi.

// Prefer to use addi to add integer constants in range.
intExpression = PLUS(intExpression lhs, intConstant rhs): constantInRange(getNode().getChild(1), PLUS)
{
    return allocateTemp(templateManager.getTemplate("addi", "lhs", lhs, "rhs", rhs));
}

Parameters to `constantInRange`

The first parameter to constantInRange is the AST that represents the integer constant.
Q: Why only the AST node, and not the constant itself?
A: Because the BURM runs in a strict two-pass mode; the first pass finds the best combination of reductions, and the second pass performs the reductions. In the case of a leaf like INT_LITERAL, the BURM could perform both the labeling and reduction in a single pass, but this enhancement has not yet been implemented.

The second parameter is just a flag that tells constantInRange that the operation is PLUS. constantInRange needs to know this because the range for a 2's complement signed integer is not symmetrical, so the acceptable values for the immediate field differ: x - 32768 is feasible using addi, but x + 32768 is not.

Control Flow

The control-flow instructions illustrate two interesting features:

The synthetic 'boolean' type and the corresponding nonterminal.
Parmeterized sub-subtrees, where the rule's pattern-matcher looks deeper into the structure of the subtree.

The 'boolean' type

Our simple "X" language doesn't have an explicit boolean type, but the result of a comparison expression is implicitly boolean. The language's control-flow instructions work on values of that implicit type.

Parameterized Sub-Subtrees

MIPS has some specialized comparison instructions that make it worthwhile to write special-case rules for x == y type comparisons, among others (the rest are left as an exercise). We could define a special nonterminal for this, a booleanEqualEqual and go through some logic to package up the operands and then unpack them, but JBurg's pattern matcher can go more than one level deep into the subtrees before it starts parameterizing:

statement = IF(EQUAL_EQUAL(intExpression lhs, intExpression rhs), statement consequent): 1

This allows the rule to work directly with the applicable nodes.

Wiki: FirstCodeGenerator