From: Ian R. <ian...@ma...> - 2006-01-30 17:34:20
|
Hi, the following BURS rule seems neat: # Are shifts being used to create a mask of lower bits? szpr: INT_SHL_ACC(INT_SHR_ACC(r, INT_CONSTANT), INT_CONSTANT) (VR(p) == VLR(p)) ? 13 : INFINITE EMIT_INSTRUCTION EMIT(MIR_BinaryAcc.mutate(P(p), IA32_AND, BinaryAcc.getResult(P(p)), new OPT_IntConstantOperand(0xffffffff << VR(p)))); however it won't work for the following test: 34 int_shr t32si(I) = t11si(I), 12 37 int_shl t33si(I) = t32si(I), 12 the reason being that the OPT_ConvertALUOperators phase. This phase first turns the instruction at address 37 into: 37 int_shl_acc t32i(I) <-- 12 and renames later uses of t33 to t32. The instrucution at address 32 is then processed. It fits the last case of the following comment taken from OPT_ConvertALUOperators: * We also keep our eyes open for the special (somewhat common) case * of one of the uses being the last use of a temporary. When this happens * we can sometimes avoid inserting a move at all. When this happens, we * rewrite: * * op r100 = r200, r300 =====> op r200 <-- r300 * and replace all uses of r100 with r200. * * We aren't doing a full live analysis, but the following conditions * covers the cases where it is critical to get this right: * (a) r100 is ssa * (b) r100 does not span a basic block * (c) r200 does not span a basic block * (d) this instruction is the last use of r200 * (e) r200 is ssa * These conditions are designed to be cheap to verify and * cover those cases where it is advantegous from BURS's perspective to * coalesce the registers to avoid the move instruction. * * If we are in the following very similar case: * op r100 = r200, r300 =====> op r200 <-- r300 * move r100 = r200 * (1) r200 does not span a basic block * (2) this instruction is the last use of r200 * then we want the move instruction here (but after op), because * merging registers r100 and r200 would force BURS to break its * exprssion tree _before_ op since r200 would now span a basic block * (since r100 spans a basic block). * We depend on the register allocator to later coalesce r100 and r200, * since they are not simultaneously live. Now, my test doesn't have the case where r100 spans a basic block (the program has just 1 basic block). So it seems to me the code (at line 338 in OPT_ConvertALUOperators.java) doesn't agree with the comment either as the move after case is part of an if-then where "!rop1.register.spansBasicBlock()" ie r100 can't span a basic block. So is this a bug in OPT_ConvertALUOperators? The code I hope to generate in OPT_ConvertALUOperators is: 34 int_move t33i(I) = t11si(I) 34 int_shr_acc t33si(I) <-- 12 37 int_shl_acc t33i(I) <-- 12 and I believe this is what the comments imply should be generated. Thanks, Ian |
From: Ian R. <ian...@ma...> - 2006-01-30 17:42:00
|
Ian Rogers wrote: > <snip> > Now, my test doesn't have the case where r100 spans a basic block (the > program has just 1 basic block). So it seems to me the code (at line > 338 in OPT_ConvertALUOperators.java) doesn't agree with the comment > either as the move after case is part of an if-then where > "!rop1.register.spansBasicBlock()" ie r100 can't span a basic block. > So is this a bug in OPT_ConvertALUOperators? Sorry what I've written here is incorrect r100 is result and rop1 is r200. What I think needs to happen is the else on line 338 should instead be: else if(result.register.spansBasicBlock()) as I think we're catching cases with this "optimization" where the real cause isn't BB spanning but instead the SSA status of a register. In my case the registers are all SSA initially but trying to reduce the number of moves stops this. Ian |
From: Ian R. <ian...@ma...> - 2006-01-31 15:39:37
Attachments:
OPT_ConvertALUOperators.java
|
Hi, I've made some progress with things. I understand that we need to do convert ALU operators prior to BURS as otherwise the BURS rules would be on some pseudo Intel 3 address instruction set. The draw back to this has been that moves need injecting and these have various properties when going through BURS. Its a shame the BURS can't just ignore move edges (when moving a local temporary to another). I think the "optimize" part of the convert ALU operators may not always be optimizing, as it's quite hit and miss what reductions are in the BURS when some of the operands may have had a move injected. Anyway, my questions are: 1) why "(e) r200 is ssa" is this necessary (lines 248, 273, 330, 402)? if this is the last use of r200 and we know r200 is only defined in this basic block, why do we need to ensure its also SSA? ie op r200 = r1 op r200 = r200, r2 op r100 = r200, r300 // (NB r100 is SSA) can become: op r200 = r1 op r200 = r200, r2 op r200 = r200, r300 (rename later uses of r100 to r200) 2) for the other "optimize" case the move is inserted after rather than before the operation if this will stop the register holding the result temporarily from having the property of spanning basic blocks. Currently, however, the move is also placed after if r100 or r200 isn't SSA. This doesn't seem to match the javadoc, what should be happening? 3) is it a big thing to make the BURS ignore moves on Intel? Another possible alternative is to have a bigger rules file and a richer set of non-terminals. I have rewritten the convert ALU operators the way that I think makes sense and attached it to this e-mail. However, if you try it then you'll find that it breaks things. So I'm being naive somewhere. Any pointers are appreciated. Thanks, Ian |